Postgres database primary region node (failed to connect) pg check failing

How my problem was resolved.

The trick was to look at the volumes (which I did):

> fly volumes list -a my-postgres-cluster

```
ID          STATE     NAME   SIZE   REGION   ZONE   ENCRYPTED   ATTACHED VM   CREATED AT
vol_****    created   ***    xGB    mia      **     true        *****         2 months ago
vol_****    created   ***    xGB    dfw      **     true        *****         3 days ago
vol_****    created   ***    xGB    ewr      **     true        *****         1 week ago
vol_****    created   ***    xGB    mia      **     true                      2 months ago
vol_****    created   ***    xGB    lax      **     true        *****         2 months ago
```

and to realize that one of the volumes was not attached to a VM.

Even though I saw that a volume was unattached, I didn’t think to simply scale up so an instance could be created for it. It seems obvious now, right?
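If you’d rather not eyeball the table, you can script the check. This is only a sketch: the sample output below is hypothetical and simplified, and the real `fly volumes list` column layout may differ between flyctl versions, so adjust the field count to match your output.

```shell
#!/bin/sh
# Hypothetical, simplified sample of `fly volumes list` output.
# In practice you would pipe the real command instead:
#   fly volumes list -a my-postgres-cluster
sample='ID        STATE    NAME  SIZE  REGION  ATTACHED_VM
vol_aaaa  created  pg    10GB  mia     vm_1234
vol_bbbb  created  pg    10GB  dfw     vm_5678
vol_cccc  created  pg    10GB  mia'

# Rows with fewer whitespace-separated fields than the header (6 here)
# are missing the ATTACHED_VM value, i.e. the volume is unattached.
echo "$sample" | awk 'NR > 1 && NF < 6 {print $1 " (" $5 ") has no attached VM"}'
```

For the sample above this prints `vol_cccc (mia) has no attached VM`, which is exactly the situation I had: one mia volume with nothing running on it.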

So to fix it, I had to change my scale. Originally, I was using this:

```
fly scale count 4 --max-per-region=1 -a my-postgres-cluster
```

and I switched it to this:

```
fly scale count 5 --max-per-region=2 -a my-postgres-cluster
```

Raising `--max-per-region` to 2 allowed the duplicate-region volume to be accounted for: a new instance was created in mia alongside the existing mia instance, and the cluster could then heal itself.

I had to email fly support to get this understanding. Lessons learned the hard way.

PS: the reason I changed the RAM so rapidly was that I misread a graph in Grafana. I thought RAM was full, but used RAM was actually close to 0 and the graph was showing total RAM. Another lesson learned, the very hard way.
