Unable to restart Fly Postgres cluster

2022-10-31T20:14:59Z app[148e374c306389] iad [info]sentinel | 2022-10-31T20:14:59.417Z	ERROR	cmd/sentinel.go:1852	error retrieving cluster data	{"error": "Unexpected response code: 502"}
2022-11-01T15:03:48Z app[4d89604a410287] iad [info]proxy    | [WARNING] 304/150348 (580) : Server bk_db/pg1 is going DOWN for maintenance (No IP for server ). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-11-01T15:03:48Z app[4d89604a410287] iad [info]proxy    | [NOTICE] 304/150348 (580) : haproxy version is 2.2.9-2+deb11u3
2022-11-01T15:03:48Z app[4d89604a410287] iad [info]proxy    | [NOTICE] 304/150348 (580) : path to executable is /usr/sbin/haproxy
2022-11-01T15:03:48Z app[4d89604a410287] iad [info]proxy    | [ALERT] 304/150348 (580) : backend 'bk_db' has no server available!
2022-11-01T15:03:48Z app[148e377ad32789] iad [info]proxy    | [WARNING] 304/150348 (578) : Server bk_db/pg3 is going DOWN for maintenance (No IP for server ). 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-11-01T15:03:49Z app[148e374c306389] iad [info]proxy    | [WARNING] 304/150349 (577) : Server bk_db/pg1 is going DOWN for maintenance (No IP for server ). 0 active and 0 backup servers left. 33 sessions active, 0 requeued, 0 remaining in queue.
2022-11-01T15:03:49Z app[148e374c306389] iad [info]proxy    | [NOTICE] 304/150349 (577) : haproxy version is 2.2.9-2+deb11u3
2022-11-01T15:03:49Z app[148e374c306389] iad [info]proxy    | [NOTICE] 304/150349 (577) : path to executable is /usr/sbin/haproxy
2022-11-01T15:03:49Z app[148e374c306389] iad [info]proxy    | [ALERT] 304/150349 (577) : backend 'bk_db' has no server available!
2022-11-01T15:03:49Z app[4d89604a410287] iad [info]proxy    | [WARNING] 304/150349 (580) : Server bk_db/pg1 ('iad.karambit-ai-production-db.internal') is UP/READY (resolves again).
2022-11-01T15:03:52Z app[4d89604a410287] iad [info]proxy    | [WARNING] 304/150352 (580) : Server bk_db/pg1 is going DOWN for maintenance (No IP for server ). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-11-01T15:03:52Z app[4d89604a410287] iad [info]proxy    | [ALERT] 304/150352 (580) : backend 'bk_db' has no server available!
2022-11-01T15:04:18Z app[148e377ad32789] iad [info]proxy    | [WARNING] 304/150418 (578) : Server bk_db/pg2 was DOWN and now enters maintenance (No IP for server ).
2022-11-01T15:04:18Z app[148e374c306389] iad [info]proxy    | [WARNING] 304/150418 (577) : Server bk_db/pg2 was DOWN and now enters maintenance (No IP for server ).
2022-11-01T15:04:24Z app[4d89604a410287] iad [info]proxy    | [WARNING] 304/150424 (580) : Server bk_db/pg3 was DOWN and now enters maintenance (No IP for server ).
2022-11-01T15:09:08Z app[148e374c306389] iad [info]proxy    | [WARNING] 304/150908 (577) : Server bk_db/pg3 was DOWN and now enters maintenance (unspecified DNS error).
2022-11-01T15:09:09Z app[4d89604a410287] iad [info]proxy    | [WARNING] 304/150909 (580) : Server bk_db/pg2 was DOWN and now enters maintenance (unspecified DNS error).
2022-11-01T15:09:10Z app[148e377ad32789] iad [info]proxy    | [WARNING] 304/150910 (578) : Server bk_db/pg1 was DOWN and now enters maintenance (unspecified DNS error).
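
To make the proxy lines above easier to scan, here is a small Python sketch that reduces them to the last reported state per machine and backend server. The regex and the `last_states` helper are my own assumptions based on the log format pasted here; they are not part of flyctl or HAProxy.

```python
import re
import sys

# Assumed format: "<timestamp> app[<machine id>] ... Server <backend/server> ... <event>"
# This pattern is inferred from the pasted log lines, not from any official spec.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) app\[(?P<machine>[0-9a-f]+)\].*?"
    r"Server (?P<server>\S+) .*?(?P<event>is going DOWN|was DOWN|is UP/READY)"
)


def last_states(log_text):
    """Return {(machine_id, server): (timestamp, event)} for the latest event seen.

    Lines are processed in the order given, so "latest" means last in the input.
    """
    states = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            states[(m["machine"], m["server"])] = (m["ts"], m["event"])
    return states


if __name__ == "__main__":
    # Read log text from stdin and print one summary row per machine/server pair.
    for (machine, server), (ts, event) in sorted(last_states(sys.stdin.read()).items()):
        print(machine, server, ts, event)
```

Saving the logs to a file and piping it into the script prints one row per machine and server, which makes it easier to see whether any backend ever came back up.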

I have a Postgres cluster that was recently created on Fly Machines.
It went into a suspended state and I’ve been unable to revive it.

I’ve tried (unsuccessfully):

  • fly apps restart
  • fly pg restart
  • updating the image (fly image update), but the image is already the latest

My last attempt was running fly machine update against each machine. The machine state moves to "replacing" with the following output, then hangs while trying to open the encrypted volume.

The following config has been updated
  api.MachineConfig{
  	Env:       {"PRIMARY_REGION": "iad"}
  	Init:      {}
  	Processes: nil
- 	Image:     "flyio/postgres:14.4"
+ 	Image:     "registry-1.docker.io/flyio/postgres:14.4"
  	Metadata:  {"managed-by-fly-deploy": "true"}