Postgres app connection issue with website app

Hm… If the older Machine retains its replica status, then I’d suggest a verification step next:

$ fly m start 178103db9120d8 -a db-app-name  # ensure running.
$ fly ssh console --machine 178103db9120d8 -a db-app-name
# su postgres
% psql -p 5433  # note that this is 5433, not the usual 5432.
postgres=#

Then poke around a little, like before, to examine the tables…
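
For instance, the poking around could look something like the sketch below, run as the postgres user on the Machine (at the % prompt above, rather than inside psql). The helper name inspect_db, the database name my_app_db, and the table name users are all placeholders, not anything taken from your setup; the function wrapper is only there so that nothing runs until you call it.

```shell
# Hypothetical helper: list databases, list tables, then count rows in one table.
# Run as the postgres user on the DB Machine; port 5433 as in the session above.
inspect_db() {
  db="$1"     # database name (placeholder; pick one from the \l output)
  table="$2"  # table name (placeholder; pick one from the \dt output)
  psql -p 5433 -c '\l'            # list all databases
  psql -p 5433 -d "$db" -c '\dt'  # list tables in that database
  # rough sanity check that the rows are still there
  psql -p 5433 -d "$db" -c "SELECT count(*) FROM \"$table\";"
}
```

Usage would be along the lines of inspect_db my_app_db users, substituting names that actually show up in the listings.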

(This bypasses all the watchdogs, etc., which is why it can succeed where fly pg connect would fail.)

If your data is there, then it’s mainly a matter of how inconvenient it would be to turn things back into a repaired and/or fresh PG cluster. I don’t know the PG Flex mechanisms in enough low-level detail to advise from afar on how to perform surgery on one, but there is likely some scalpel slice with which you can inform repmgr that there is no longer a primary that it should be trying to contact. The following classic post by @uncvrd might have the clues that a sufficiently admin-minded person would need to piece that together, even though it’s not precisely the same situation:

https://community.fly.io/t/heres-how-to-fix-an-unreachable-2-zombie-1-replica-ha-postgres-cluster/19503

Other than such measures, there’s the inelegant but (typically) rather effective technique of fly pg create --fork-from:

https://community.fly.io/t/urgency-problems-with-postgres-the-database-is-not-responding/19926/2

(You will need the explicit volume ID here, since there is no primary.)
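
To make that concrete, here is a rough sketch of the fork. db-app-name is the broken cluster as above, while fork_pg, db-app-name-fork, and the volume ID vol_xxxxxxxxxxxx are placeholders. The app:volume form of --fork-from is stated here as an assumption, so it is worth checking against fly pg create --help before relying on it.

```shell
# Step 1 (by hand): find the volume attached to the surviving Machine.
#   fly volumes list -a db-app-name
#
# Step 2: fork a fresh cluster from that volume. Wrapped in a function so that
# nothing is created until you call it with real values.
fork_pg() {
  src_app="$1"  # the broken PG app, e.g. db-app-name
  vol_id="$2"   # volume ID from step 1 (placeholder: vol_xxxxxxxxxxxx)
  new_app="$3"  # name for the new PG app (placeholder)
  fly pg create --name "${new_app}" --fork-from "${src_app}:${vol_id}"
}
```

Usage would be something like fork_pg db-app-name vol_xxxxxxxxxxxx db-app-name-fork, with the real volume ID filled in.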

This one would require you to go through the DATABASE_URL or DB_URL_ENV dance again, and of course this doesn’t resolve the question of why the cluster failed in the first place. 💭
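
One possible shape for that dance, sketched under the assumption that the fork carries over the same database users and passwords (worth verifying), so that only the hostname in the connection string changes. The helper swap_db_host and the app names db-app-name-fork and website-app-name are hypothetical.

```shell
# Hypothetical helper: rewrite the host part of an existing connection string
# so that it points at the forked PG app, leaving the user, password, port,
# and database name untouched.
swap_db_host() {
  old_url="$1"  # current DATABASE_URL value
  old_app="$2"  # e.g. db-app-name
  new_app="$3"  # e.g. db-app-name-fork (placeholder)
  # only touch the app name that appears after the "@", i.e. in the hostname
  printf '%s\n' "$old_url" | sed "s/@\(.*\)${old_app}\./@\1${new_app}./"
}
```

The result could then be handed to the website app with something like fly secrets set DATABASE_URL="$(swap_db_host "$OLD_URL" db-app-name db-app-name-fork)" -a website-app-name, where OLD_URL holds the current value.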

Hope this helps a little!