@shaun I was in the same situation, and your command resolved it.
However, it is unclear why my Postgres instance went from running to suspended. How can I prevent this from happening? Is there a way to find out why the database failed?
2023-01-16T08:25:51Z app[6e82932add0787] fra [warn]Virtual machine exited abruptly
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]cluster spec filename /fly/cluster-spec.json
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]system memory: 256mb vcpu count: 1
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]{
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "initMode": "existing",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "existingConfig": {
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "keeperUID": "c07e5aa2a1692"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] },
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "pgParameters": {
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "archive_command": "if [ $ENABLE_WALG ]; then /usr/local/bin/wal-g wal-push \"%p\"; fi",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "archive_mode": "on",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "archive_timeout": "60",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "effective_cache_size": "192MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "effective_io_concurrency": "200",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "maintenance_work_mem": "64MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "max_connections": "300",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "max_parallel_workers": "8",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "max_parallel_workers_per_gather": "2",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "max_worker_processes": "8",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "random_page_cost": "1.1",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "shared_buffers": "64MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "wal_compression": "on",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "work_mem": "4MB"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] },
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "maxStandbysPerSender": 50,
2023-01-16T08:25:54Z app[6e82932add0787] fra [info] "deadKeeperRemovalInterval": "1h"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]}
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]generated new config
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]keeper | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]sentinel | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]proxy | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]exporter | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]exporter | INFO[0000] Starting Server: :9187 source="postgres_exporter.go:1837"
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy | [WARNING] 015/082555 (538) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]exporter | INFO[0000] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433". source="postgres_exporter.go:970"
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy | [NOTICE] 015/082555 (538) : New worker #1 (563) forked
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy | [WARNING] 015/082555 (563) : bk_db/pg1 changed its IP from (none) to fdaa:0:7b2e:a7b:c07e:5aa2:a169:2 by flydns/dns1.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy | [WARNING] 015/082555 (563) : Server bk_db/pg1 ('fra.monito-staging-db.internal') is UP/READY (resolves again).
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy | [WARNING] 015/082555 (563) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]keeper | 2023-01-16T08:25:55.500Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]exporter | ERRO[0001] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused source="postgres_exporter.go:1658"
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]exporter | INFO[0001] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433". source="postgres_exporter.go:970"
2023-01-16T08:25:57Z app[6e82932add0787] fra [info]exporter | ERRO[0002] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused source="postgres_exporter.go:1658"
2023-01-16T08:25:58Z app[6e82932add0787] fra [info]keeper | 2023-01-16T08:25:58.003Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:00Z app[6e82932add0787] fra [info]keeper | 2023-01-16T08:26:00.503Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]proxy | [WARNING] 015/082601 (563) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]sentinel | 2023-01-16T08:26:01.882Z WARN cmd/sentinel.go:276 no keeper info available {"db": "23360d86", "keeper": "c07e5aa2a1692"}
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy | [WARNING] 015/082602 (563) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy | [NOTICE] 015/082602 (563) : haproxy version is 2.2.9-2+deb11u3
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy | [NOTICE] 015/082602 (563) : path to executable is /usr/sbin/haproxy
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy | [ALERT] 015/082602 (563) : backend 'bk_db' has no server available!
2023-01-16T08:26:03Z app[6e82932add0787] fra [info]keeper | 2023-01-16T08:26:03.004Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:05Z app[6e82932add0787] fra [info]keeper | 2023-01-16T08:26:05.505Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:07Z app[6e82932add0787] fra [info]exporter | INFO[0012] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433". source="postgres_exporter.go:970"
2023-01-16T08:26:08Z app[6e82932add0787] fra [info]keeper | 2023-01-16T08:26:08.006Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:08Z app[6e82932add0787] fra [info]exporter | ERRO[0013] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused source="postgres_exporter.go:1658"
2023-01-16T08:26:09Z app[6e82932add0787] fra [info]exporter | INFO[0014] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433". source="postgres_exporter.go:970"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]exporter | ERRO[0015] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused source="postgres_exporter.go:1658"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16T08:26:10.508Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16 08:26:10.863 UTC [598] LOG: starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16 08:26:10.863 UTC [598] LOG: listening on IPv6 address "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2", port 5433
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16 08:26:10.870 UTC [598] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16 08:26:10.874 UTC [599] LOG: database system was interrupted; last known up at 2023-01-11 15:27:54 UTC
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16 08:26:10.938 UTC [599] LOG: database system was not properly shut down; automatic recovery in progress
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16 08:26:10.941 UTC [599] LOG: redo starts at 0/31000028
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16 08:26:10.942 UTC [599] LOG: redo done at 0/31000110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper | 2023-01-16 08:26:10.951 UTC [598] LOG: database system is ready to accept connections
2023-01-16T08:26:11Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:11Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]configuring operator
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]operator password does not match config, changing
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]operator ready!
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]configuring repluser
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]repluser password does not match config, changing
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]replication ready!
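
In case it helps anyone reading a dump like this, here is a minimal sketch for pulling out the lines that look related to the crash. It assumes every line has the shape `<timestamp> app[<instance>] <region> [<level>]<message>`, as in the excerpt above. The function name and the keyword list are my own choices for illustration, not part of any Fly.io tooling.

```python
import re

# Assumed line shape, taken from the log excerpt above:
#   <timestamp> app[<instance>] <region> [<level>]<message>
LINE_RE = re.compile(r"^(\S+)\s+app\[(\w+)\]\s+(\w+)\s+\[(\w+)\](.*)$")

# Hypothetical keyword list, chosen from phrases that appear in the excerpt.
KEYWORDS = ("exited abruptly", "was interrupted", "not properly shut down")


def suspicious_lines(lines):
    """Return (timestamp, level, message) for warn-level or keyword-matching lines."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not follow the assumed shape
        timestamp, _instance, _region, level, message = match.groups()
        text = message.lower()
        if level == "warn" or any(keyword in text for keyword in KEYWORDS):
            hits.append((timestamp, level, message.strip()))
    return hits
```

With the logs saved to a file (say `fly-logs.txt`, a hypothetical name), calling `suspicious_lines(open("fly-logs.txt"))` on the excerpt above should surface the `Virtual machine exited abruptly` warning and the two Postgres recovery messages. That shows when the VM went down, but not why.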