Postgres cluster broken since last Fly migration

zaf · June 25, 2024, 11:52am

We have a Fly Postgres cluster with 3 nodes running in production. Our app has been down today because it can no longer connect to a primary instance: all nodes have become replicas. Resource usage (CPU load, memory, disk) has stayed well below the limits.

I’m guessing this has to do with the automated PG migration that was run on Jun 25 2024 at 22:22 UTC?

VERSION STATUS          DESCRIPTION     USER            DATE (UTC)              DOCKER IMAGE
v3      complete        Release         john@fly.io     Jun 25 2024 22:26       docker-hub-mirror.fly.io/flyio/postgres-flex:15.3
v2      failed          Release         john@fly.io     Jun 25 2024 22:22       docker-hub-mirror.fly.io/flyio/postgres-flex:15.3
v1      complete        Release         zafer@algora.io Mar 10 2024 17:37       registry-1.docker.io/flyio/postgres-flex:15.3
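For anyone who wants to check their own release history for the same pattern, here is a minimal Python sketch that parses rows like the ones above and picks out the failed releases. The `parse_releases` and `failed_releases` helpers and the regular expression are my own assumptions about the column layout, not anything provided by Fly.

```python
import re
from datetime import datetime

# Release rows copied from the listing above (header line omitted).
RELEASES = """\
v3      complete        Release         john@fly.io     Jun 25 2024 22:26       docker-hub-mirror.fly.io/flyio/postgres-flex:15.3
v2      failed          Release         john@fly.io     Jun 25 2024 22:22       docker-hub-mirror.fly.io/flyio/postgres-flex:15.3
v1      complete        Release         zafer@algora.io Mar 10 2024 17:37       registry-1.docker.io/flyio/postgres-flex:15.3
"""

# Assumed layout: version, status, description, user, date (UTC), image.
# The date contains single spaces, so it is matched explicitly.
ROW = re.compile(
    r"^(?P<version>v\d+)\s+(?P<status>\S+)\s+(?P<description>\S+)\s+"
    r"(?P<user>\S+)\s+(?P<date>[A-Z][a-z]{2} \d{1,2} \d{4} \d{2}:\d{2})\s+"
    r"(?P<image>\S+)$"
)


def parse_releases(text):
    """Return one dict per release row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        row = match.groupdict()
        row["date"] = datetime.strptime(row["date"], "%b %d %Y %H:%M")
        rows.append(row)
    return rows


def failed_releases(text):
    """Return only the rows whose status column is 'failed'."""
    return [row for row in parse_releases(text) if row["status"] == "failed"]


for row in failed_releases(RELEASES):
    print(row["version"], row["user"], row["date"].isoformat())
```

On the rows above, the only release this flags is v2, the one created at 22:22.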

I have attempted to force a failover, but Fly rejects it with "no active leader found". I’m also no longer able to connect to the database with the CLI, or to run pg_dump through a Fly proxy. I even created a completely new Postgres cluster with 1 primary node by forking one of the existing volumes, but that didn’t work either.

This looks like a repmgr connection issue (log lines are shown newest first):

2024-06-25 10:53:59.358 repmgrd  | Is the server running on that host and accepting TCP/IP connections?
2024-06-25 10:53:59.358 repmgrd  | connection to server at "****:*:****:***:***:****:****:*", port 5433 failed: Connection refused
2024-06-25 10:53:59.358 repmgrd  | [2024-06-25 10:53:59] [DETAIL]
2024-06-25 10:53:59.358 repmgrd  | [2024-06-25 10:53:59] [ERROR] connection to database failed
2024-06-25 10:53:59.358 repmgrd  | [2024-06-25 10:53:59] [INFO] connecting to database "host=****:*:****:***:***:****:****:* port=5433 user=repmgr dbname=repmgr connect_timeout=5"
2024-06-25 10:53:59.358 repmgrd  | [2024-06-25 10:53:59] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
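Since the error is "Connection refused" on port 5433, one way to narrow this down is to test plain TCP reachability of that port from a machine on the same private network. Below is a rough Python sketch of such a check. The `check_port` helper and the host name in the commented usage are placeholders of mine, while the port and the 5-second timeout mirror the `port=5433` and `connect_timeout=5` values in the log.

```python
import socket


def check_port(host, port, timeout=5.0):
    """Try a plain TCP connect and classify the outcome.

    Returns "open" if the connect succeeds, "refused" if the host actively
    rejects it (nothing is listening on that port), or "error: ..." for any
    other failure such as a timeout or a DNS lookup problem.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


# Hypothetical usage; replace the host with the address from your own
# repmgr conninfo string:
# print(check_port("my-pg-node.example", 5433))
```

A "refused" result would suggest that nothing is listening on that port on the node, whereas a timeout would point more towards a network or routing problem. This only tests the TCP layer, so an "open" result does not prove that Postgres itself is healthy.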

system · July 2, 2024, 11:52am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
