postgres-db suspended

My Postgres database's status changed to suspended, but I don't know why.

Waiting for logs...

2023-01-09T19:33:44.739 app[3287111b656985] gru [info] sentinel | 2023-01-09T19:33:44.738Z WARN cmd/sentinel.go:276 no keeper info available {"db": "cee06ded", "keeper": "1f63b8f3709e2"}

2023-01-09T22:40:55.777 app[3287111b656985] gru [info] sentinel | 2023-01-09T22:40:55.769Z WARN cmd/sentinel.go:276 no keeper info available {"db": "cee06ded", "keeper": "1f63b8f3709e2"}

2023-01-09T22:50:25.450 app[3287111b656985] gru [info] sentinel | 2023-01-09T22:50:25.443Z WARN cmd/sentinel.go:276 no keeper info available {"db": "cee06ded", "keeper": "1f63b8f3709e2"}

2023-01-09T22:50:44.450 app[3287111b656985] gru [info] sentinel | 2023-01-09T22:50:44.450Z WARN cmd/sentinel.go:276 no keeper info available {"db": "cee06ded", "keeper": "1f63b8f3709e2"}

2023-01-09T22:51:18.970 app[3287111b656985] gru [info] sentinel | 2023-01-09T22:51:18.969Z ERROR cmd/sentinel.go:1895 cannot get keepers info {"error": "Unexpected response code: 500"}

2023-01-09T22:53:03.257 app[3287111b656985] gru [info] keeper | 2023-01-09T22:53:03.249Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2023-01-10T13:47:00.303 app[3287111b656985] gru [info] sentinel | 2023-01-10T13:47:00.270Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 502"}

How can I bring it back up?

I've tried restarting it with fly pg restart -a app-name, but it returns: Error no active leader found

I am observing the same.

Same here.

Go ahead and list your machines using:

fly machines list --app <app-name>

If any of your machines are in a stopped state, you can start each one by running:

fly machines start <machine-id> --app <app-name>
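
If the app has several machines, the two commands above can be scripted. The following is only a rough sketch, not an official tool: it assumes that `fly machines list` accepts a `--json` flag and prints a JSON array whose entries carry `id` and `state` fields (check `fly machines list --help` for your flyctl version). The helper names `stopped_machine_ids` and `start_stopped_machines` are made up for illustration.

```python
import json
import subprocess


def stopped_machine_ids(machines_json: str) -> list[str]:
    """Return the IDs of machines whose state is 'stopped'.

    Assumes the input is a JSON array of objects with 'id' and 'state'
    keys, which is what `fly machines list --json` is assumed to print.
    """
    machines = json.loads(machines_json)
    return [m["id"] for m in machines if m.get("state") == "stopped"]


def start_stopped_machines(app_name: str) -> list[str]:
    """List the app's machines and start every stopped one."""
    listing = subprocess.run(
        ["fly", "machines", "list", "--app", app_name, "--json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    ids = stopped_machine_ids(listing)
    for machine_id in ids:
        # Same command as above, run once per stopped machine.
        subprocess.run(
            ["fly", "machines", "start", machine_id, "--app", app_name],
            check=True,
        )
    return ids
```

Calling `start_stopped_machines("<app-name>")` would return the IDs it tried to start. The parsing step is kept in its own function so it can be checked without touching a real app.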

It worked like a charm!
Thank you very much!

@shaun I was in the same situation, and your command solved it.

However, it is unclear why my Postgres instance went from running to suspended. How can I prevent this from happening? Is there a way to find out why the database failed?

2023-01-16T08:25:51Z app[6e82932add0787] fra [warn]Virtual machine exited abruptly
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]cluster spec filename /fly/cluster-spec.json
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]system memory: 256mb vcpu count: 1
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]{
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "initMode": "existing",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "existingConfig": {
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "keeperUID": "c07e5aa2a1692"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    },
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "pgParameters": {
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "archive_command": "if [ $ENABLE_WALG ]; then /usr/local/bin/wal-g wal-push \"%p\"; fi",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "archive_mode": "on",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "archive_timeout": "60",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "effective_cache_size": "192MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "effective_io_concurrency": "200",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "maintenance_work_mem": "64MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "max_connections": "300",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "max_parallel_workers": "8",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "max_parallel_workers_per_gather": "2",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "max_worker_processes": "8",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "random_page_cost": "1.1",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "shared_buffers": "64MB",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "wal_compression": "on",
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]        "work_mem": "4MB"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    },
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "maxStandbysPerSender": 50,
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]    "deadKeeperRemovalInterval": "1h"
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]}
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]generated new config
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]keeper   | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]sentinel | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]proxy    | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]exporter | Running...
2023-01-16T08:25:54Z app[6e82932add0787] fra [info]exporter | INFO[0000] Starting Server: :9187                        source="postgres_exporter.go:1837"
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082555 (538) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]exporter | INFO[0000] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433".  source="postgres_exporter.go:970"
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [NOTICE] 015/082555 (538) : New worker #1 (563) forked
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082555 (563) : bk_db/pg1 changed its IP from (none) to fdaa:0:7b2e:a7b:c07e:5aa2:a169:2 by flydns/dns1.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082555 (563) : Server bk_db/pg1 ('fra.monito-staging-db.internal') is UP/READY (resolves again).
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082555 (563) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:25:55.500Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:25:55Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]exporter | ERRO[0001] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2023-01-16T08:25:56Z app[6e82932add0787] fra [info]exporter | INFO[0001] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433".  source="postgres_exporter.go:970"
2023-01-16T08:25:57Z app[6e82932add0787] fra [info]exporter | ERRO[0002] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2023-01-16T08:25:58Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:25:58.003Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:00Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:00.503Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082601 (563) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-01-16T08:26:01Z app[6e82932add0787] fra [info]sentinel | 2023-01-16T08:26:01.882Z  WARN    cmd/sentinel.go:276     no keeper info available       {"db": "23360d86", "keeper": "c07e5aa2a1692"}
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy    | [WARNING] 015/082602 (563) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy    | [NOTICE] 015/082602 (563) : haproxy version is 2.2.9-2+deb11u3
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy    | [NOTICE] 015/082602 (563) : path to executable is /usr/sbin/haproxy
2023-01-16T08:26:02Z app[6e82932add0787] fra [info]proxy    | [ALERT] 015/082602 (563) : backend 'bk_db' has no server available!
2023-01-16T08:26:03Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:03.004Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:05Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:05.505Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:06Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:07Z app[6e82932add0787] fra [info]exporter | INFO[0012] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433".  source="postgres_exporter.go:970"
2023-01-16T08:26:08Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:08.006Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:08Z app[6e82932add0787] fra [info]exporter | ERRO[0013] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2023-01-16T08:26:09Z app[6e82932add0787] fra [info]exporter | INFO[0014] Established new database connection to "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2:5433".  source="postgres_exporter.go:970"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]exporter | ERRO[0015] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:7b2e:a7b:c07e:5aa2:a169:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16T08:26:10.508Z  ERROR   cmd/keeper.go:719       cannot get configured pg parameters    {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.863 UTC [598] LOG:  starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.863 UTC [598] LOG:  listening on IPv6 address "fdaa:0:7b2e:a7b:c07e:5aa2:a169:2", port 5433
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.870 UTC [598] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.874 UTC [599] LOG:  database system was interrupted; last known up at 2023-01-11 15:27:54 UTC
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.938 UTC [599] LOG:  database system was not properly shut down; automatic recovery in progress
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.941 UTC [599] LOG:  redo starts at 0/31000028
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.942 UTC [599] LOG:  redo done at 0/31000110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-01-16T08:26:10Z app[6e82932add0787] fra [info]keeper   | 2023-01-16 08:26:10.951 UTC [598] LOG:  database system is ready to accept connections
2023-01-16T08:26:11Z app[6e82932add0787] fra [info]error connecting to local postgres context deadline exceeded
2023-01-16T08:26:11Z app[6e82932add0787] fra [info]checking stolon status
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]keeper is healthy, db is healthy, role: master
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]configuring operator
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]operator password does not match config, changing
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]operator ready!
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]configuring repluser
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]repluser password does not match config, changing
2023-01-16T08:26:12Z app[6e82932add0787] fra [info]replication ready!
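
A long log like the one above is easier to read once the routine startup lines are filtered out. The sketch below is one possible way to do that, not a diagnosis tool: it assumes each line follows the `timestamp app[machine-id] region [level]message` layout seen in this thread, and the `MARKERS` tuple is a hand-picked, hypothetical list of phrases taken from these logs. It only surfaces candidate lines such as "Virtual machine exited abruptly"; it does not explain why the machine stopped.

```python
import re

# Layout assumed from the log lines in this thread:
#   <timestamp> app[<machine-id>] <region> [<level>]<message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+app\[(?P<machine>[0-9a-f]+)\]\s+(?P<region>\w+)\s+"
    r"\[(?P<level>\w+)\]\s*(?P<msg>.*)$"
)

# Hand-picked phrases from the logs above; extend as needed.
MARKERS = (
    "exited abruptly",
    "not properly shut down",
    "Unexpected response code",
)


def notable_events(lines):
    """Return (timestamp, machine_id, message) for lines worth a closer look.

    A line is kept if its level is anything other than 'info', or if its
    message contains one of the MARKERS phrases.
    """
    events = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not a log line in the assumed layout
        msg = match.group("msg")
        if match.group("level") != "info" or any(k in msg for k in MARKERS):
            events.append((match.group("ts"), match.group("machine"), msg))
    return events
```

For example, saving the output of `fly logs` to a file and passing its lines to `notable_events` would keep the `[warn]` line and the PostgreSQL recovery message while dropping the configuration dump.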