postgres-db suspended

My Postgres database changed its status to suspended, but I don’t know why.

Waiting for logs...

2023-01-09T19:33:44.739 app[3287111b656985] gru [info] sentinel | 2023-01-09T19:33:44.738Z WARN cmd/sentinel.go:276 no keeper info available {"db": "cee06ded", "keeper": "1f63b8f3709e2"}

2023-01-09T22:40:55.777 app[3287111b656985] gru [info] sentinel | 2023-01-09T22:40:55.769Z WARN cmd/sentinel.go:276 no keeper info available {"db": "cee06ded", "keeper": "1f63b8f3709e2"}

2023-01-09T22:50:25.450 app[3287111b656985] gru [info] sentinel | 2023-01-09T22:50:25.443Z WARN cmd/sentinel.go:276 no keeper info available {"db": "cee06ded", "keeper": "1f63b8f3709e2"}

2023-01-09T22:50:44.450 app[3287111b656985] gru [info] sentinel | 2023-01-09T22:50:44.450Z WARN cmd/sentinel.go:276 no keeper info available {"db": "cee06ded", "keeper": "1f63b8f3709e2"}

2023-01-09T22:51:18.970 app[3287111b656985] gru [info] sentinel | 2023-01-09T22:51:18.969Z ERROR cmd/sentinel.go:1895 cannot get keepers info {"error": "Unexpected response code: 500"}

2023-01-09T22:53:03.257 app[3287111b656985] gru [info] keeper | 2023-01-09T22:53:03.249Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2023-01-10T13:47:00.303 app[3287111b656985] gru [info] sentinel | 2023-01-10T13:47:00.270Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 502"}

How can I bring it back up again?

I’ve tried restarting with fly pg restart -a app-name, but it returns: Error no active leader found


I am observing the same.

Same here.

Go ahead and list your machines using:

fly machines list --app <app-name>

If any of your machines are in a stopped state, you can start them by running:

fly machines start <machine-id> --app <app-name>
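
If more than one machine is stopped, something like the sketch below can start them all at once. It is only a sketch and makes a few assumptions: that your flyctl version supports --json output on machines list, that each machine object in that output exposes id and state fields, and that jq is installed. <app-name> is a placeholder, as above.

# list machines as JSON, pick the ids of any that are stopped, and start each one
fly machines list --app <app-name> --json \
  | jq -r '.[] | select(.state == "stopped") | .id' \
  | xargs -n1 -I{} fly machines start {} --app <app-name>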

It worked like a charm!
Thank you very much!


@shaun I was in the same situation, and your command helped resolve it.

However, it is unclear why my Postgres instance went from running to suspended. How can I prevent this from happening? Is there a way to understand the reason for the DB failure?

2023-01-16T08:25:51Z app[6e82932add0787] fra [warn]Virtual machine exited abruptly
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]cluster spec filename /fly/cluster-spec.json
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]system memory: 256mb vcpu count: 1
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]{
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "initMode": "existing",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "existingConfig": {
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "keeperUID": "c07e5aa2a1692"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    },
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "pgParameters": {
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "archive_command": "if [ $ENABLE_WALG ]; then /usr/local/bin/wal-g wal-push \"%p\"; fi",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "archive_mode": "on",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "archive_timeout": "60",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "effective_cache_size": "192MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "effective_io_concurrency": "200",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "maintenance_work_mem": "64MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "max_connections": "300",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "max_parallel_workers": "8",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "max_parallel_workers_per_gather": "2",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "max_worker_processes": "8",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "random_page_cost": "1.1",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "shared_buffers": "64MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "wal_compression": "on",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "work_mem": "4MB"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    },
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "maxStandbysPerSender": 50,
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "deadKeeperRemovalInterval": "1h"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]}
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]generated new config
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]keeper   | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]sentinel | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]proxy    | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]exporter | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]exporter | INFO[0000] Starting Server: :9187                        source="postgres_exporter.go:1837"
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082555 (538) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]exporter | INFO[0000] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433".  source="postgres_exporter.go:970"
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [NOTICE] 015/082555 (538) : New worker #1 (563) forked
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082555 (563) : bk_db/pg1 changed its IP from (none) to fdaa:0:7b2e:a7b:c07e:5aa2:a169:2 by flydns/dns1.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082555 (563) : Server bk_db/pg1 ('fra.monito-staging-db.internal') is UP/READY (resolves again).
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082555 (563) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:25:55.500Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]exporter | ERRO[0001] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]exporter | INFO[0001] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433".  source="postgres_exporter.go:970"
2023-01-16T08:25:57Z app[6e82932add0787] fra [info]exporter | ERRO[0002] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2023-01-16T08:25:58Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:25:58.003Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:00Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:00.503Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082601 (563) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]sentinel | 2023-01-16T08:26:01.882Z  WARN    cmd/sentinel.go:276     no keeper info available       {"db": "23360d86", "keeper": "c07e5aa2a1692"}
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082602 (563) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy    | [NOTICE] 015/082602 (563) : haproxy version is 2.2.9-2+deb11u3
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy    | [NOTICE] 015/082602 (563) : path to executable is /usr/sbin/haproxy
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy    | [ALERT] 015/082602 (563) : backend 'bk_db' has no server available!
2023-01-16T08:26:03Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:03.004Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:05Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:05.505Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:07Z app[6e82932add0787] fra [info]exporter | INFO[0012] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433".  source="postgres_exporter.go:970"
2023-01-16T08:26:08Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:08.006Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:08Z app[6e82932add0787] fra [info]exporter | ERRO[0013] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2023-01-16T08:26:09Z app[6e82932add0787] fra [info]exporter | INFO[0014] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433".  source="postgres_exporter.go:970"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]exporter | ERRO[0015] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:10.508Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.863 UTC [598] LOG:  starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.863 UTC [598] LOG:  listening on IPv6 address "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2", port 5433
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.870 UTC [598] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.874 UTC [599] LOG:  database system was interrupted; last known up at 2023-01-11 15:27:54 UTC
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.938 UTC [599] LOG:  database system was not properly shut down; automatic recovery in progress
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.941 UTC [599] LOG:  redo starts at 0/31000028
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.942 UTC [599] LOG:  redo done at 0/31000110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.951 UTC [598] LOG:  database system is ready to accept connections
2023-01-16T08:26:11Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:11Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]configuring operator
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]operator password does not match config, changing
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]operator ready!
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]configuring repluser
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]repluser password does not match config, changing
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]replication ready!

The exact same thing happened to our Postgres staging app. This is really lame, because no reason was specified in the logs and no notification was issued. I found out from CI tasks failing.

I’m having the same problem; I’ll try to restart it.
It’s been happening since 10 June.

Mine was also suspended suddenly, and bringing the machine up doesn’t work either; I just get:

Error: could not start machine <id>: failed to start VM <id>: aborted: machine exited abruptly

Looks like it started anyway now, but I’m still clueless as to why it stopped.

New Postgres gives you the option to scale it to zero when you launch: Scale to zero for Postgres Development projects · Fly Docs

The earlier posts in this thread look like Postgres crashed, most likely due to memory issues.
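
If it keeps happening, it may be worth checking whether the VM simply ran out of memory: the log above shows a 256mb machine running Postgres with max_connections set to 300, which is tight. The commands below are only a sketch, assuming current flyctl flag names; the 512 value is just an example target, not a recommendation.

# look for out-of-memory / "exited abruptly" messages around the time of the crash
fly logs --app <app-name>

# give the machine more RAM (512 MB here is only an example value)
fly machine update <machine-id> --vm-memory 512 --app <app-name>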

I had the same issue today. I tried restarting the app and the machine, but only after upgrading flyctl did I get Postgres restarted successfully.