Postgres App not working anymore

Gwaggli · April 22, 2024, 11:14am

My Postgres app stopped working out of nowhere.

2024-04-22T11:09:23.341 proxy[5683d922f4dd28] ams [error] [PP02] could not proxy TCP data to/from instance: failed to copy (direction=client->server, error=Transport endpoint is not connected (os error 107))

2024-04-22T11:09:48.233 app[5683d922f4dd28] ams [info] monitor | [WARN] Failed to restart haproxy on member fdaa:2:1180:a7b:39:8e69:7332:2: Get "http://[fdaa:2:1180:a7b:39:8e69:7332:2]:5500/commands/admin/haproxy/restart": dial tcp [fdaa:2:1180:a7b:39:8e69:7332:2]:5500: i/o timeout

2024-04-22T11:09:48.233 app[5683d922f4dd28] ams [info] monitor | clusterStateMonitorTick failed with: primary has been quarantined: unable to confirm we are the true primary

2024-04-22T11:09:49.151 app[5683d922f4dd28] ams [info] failed post-init: unrecoverable zombie. Retrying...

2024-04-22T11:09:49.151 app[5683d922f4dd28] ams [info] [ERROR] Manual intervention required.

2024-04-22T11:09:49.151 app[5683d922f4dd28] ams [info] [ERROR] If a new primary has been established, consider adding a new replica with `fly machines clone <primary-machine-id>` and then remove this member.

When I connect to the app via SSH I can see a zombie.lock, which resolves the issue for a couple of seconds. Afterwards it ends up in the same state again.
I think my DB app crashed because it ran out of memory, but even though I raised the memory limit (which was never full anyway) it is still not working. Any help is much appreciated.

shaun · April 22, 2024, 7:25pm

Hey there,

I just took a look at your app and it looks like it’s currently in good shape. Do you still need assistance here?

As a side note, it looks like you have quite a few unused volumes tied to your app.

adrianoc · April 22, 2024, 10:53pm

I’m also receiving the same proxy error from my Fly cluster. I noticed that PP02 is not documented here: Fly.io Error Codes · Fly Docs

My own investigation suggests that PP02 occurs when pg clients connected to a Postgres cluster are forcefully disconnected by Fly’s Postgres proxy after 30 minutes of idle time.

Clients that expect > 30 minutes of idle time should be resilient to server-side disconnects.
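
To illustrate what that resilience could look like, here is a minimal Python sketch that retries an operation on a fresh connection when the old one turns out to be dead. It is not tied to any particular driver: `run_with_reconnect`, `connect` and `operation` are hypothetical names, and `ConnectionError` is only a stand-in for whatever disconnect exception your Postgres client actually raises.

```python
import time


def run_with_reconnect(connect, operation, retries=3, base_delay=0.5, sleep=time.sleep):
    """Run operation(conn), reconnecting if the connection was dropped.

    `connect` and `operation` are hypothetical callables supplied by the
    caller. ConnectionError is an assumption: swap in the disconnect
    exception of the driver you actually use.
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return operation(conn)
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries, surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # exponential backoff
            conn = connect()  # open a fresh connection and try again
```

Only wrap operations that are safe to run twice (plain reads, or writes inside a transaction that was never committed), since a retry after a mid-operation disconnect may repeat work.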

dist · April 22, 2024, 11:44pm

I am having a similar issue today, not entirely sure what’s causing this.

dist · April 23, 2024, 12:17am

When I clone the machine and have two instances, it works, but as soon as I destroy one of them, it stops working again. I tried destroying the new replica as well as destroying the original master instance; in both cases, as soon as the number of machines goes down to one, it doesn’t work anymore. I’ve been running a single machine for 4 months now, and this is the first time this has happened.

Gwaggli · April 24, 2024, 3:55am

Hey Shaun,
It suddenly started working again, so there is no problem anymore. Still, our app was down for 3 hours, and I have no idea what caused it, what I could do to prevent it from happening again, or at least how I would fix it. Any ideas?

PS: Thanks, I cleaned them up.

wobbleburger · April 24, 2024, 9:49pm

I am also seeing these errors:

2024-04-24T18:56:30.049 proxy[4d891972f279d8] sjc [error] [PP02] could not proxy TCP data to/from instance: failed to copy (direction=client->server, error=Connection reset by peer (os error 104))

This is occurring much more frequently today than in the past, and these errors are not happening on idle clients.

wobbleburger · April 25, 2024, 6:17am

Does anyone from fly.io have an explanation for these errors?

pavel · April 25, 2024, 9:19am

Hey @wobbleburger

We had an incident around that time which caused connectivity problems in multiple regions, including sjc: Fly.io Status - Elevated errors and connectivity problems

Looking at your app’s logs, the errors happened during that incident.

system · May 2, 2024, 9:20am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
