My Postgres database suddenly stopped working, and I can’t restart it no matter what I try.
h1bjobs/db $ fly status
ID              STATE    ROLE   REGION  CHECKS               IMAGE                        CREATED               UPDATED
287444da0d7738  started  error  ewr     3 total, 3 critical  flyio/postgres:14 (v0.0.41)  2024-01-16T22:51:24Z  2024-01-21T00:37:36Z
Restarting the machine does not work:
h1bjobs/db $ flyctl machine restart 287444da0d7738
Restarting machine 287444da0d7738
Waiting for 287444da0d7738 to become healthy (started, 1/3)
Error: failed to restart machine 287444da0d7738: failed to wait for health checks to pass: context deadline exceeded
h1bjobs/db $ fly checks list -a h1bjobs-db
Health Checks for h1bjobs-db
NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT
-------*----------*----------------*--------------*--------------------------------------------------------------------------
pg | critical | 287444da0d7738 | 2h0m ago | 500 Internal Server Error
| | | | failed to connect to proxy: context deadline exceeded
-------*----------*----------------*--------------*--------------------------------------------------------------------------
role | critical | 287444da0d7738 | 1h29m ago | 500 Internal Server Error
| | | | failed to connect to local node: context deadline exceeded
-------*----------*----------------*--------------*--------------------------------------------------------------------------
vm | critical | 287444da0d7738 | 2h0m ago | 500 Internal Server Error
| | | | [✗] checkDisk: 53.82 MB (5.5%) free space on /data/ (36.3µs)
| | | | [✓] checkLoad: load averages: 0.00 0.00 0.00 (82.1µs)
| | | | [✓] memory: system spent 0s of the last 60s waiting on memory (38.99µs)
| | | | [✓] cpu: system spent 360ms of the last 60s waiting on cpu (17.74µs)
| | | | [✓] io: system spent 360ms of the last 60s waiting on io (48.57µs)
-------*----------*----------------*--------------*--------------------------------------------------------------------------
Output of fly logs:
2024-01-21T02:16:40Z app[287444da0d7738] ewr [info]exporter | ERRO[0024] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:5:6ce7:a7b:ce:fd17:60d7:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:5:6ce7:a7b:ce:fd17:60d7:2]:5433: connect: connection refused source="postgres_exporter.go:1658"
2024-01-21T02:16:41Z app[287444da0d7738] ewr [info]keeper | 2024-01-21T02:16:41.956Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]sentinel | 2024-01-21T02:16:42.575Z ERROR cmd/sentinel.go:1018 no eligible masters
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.800 UTC [439] LOG: starting PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.801 UTC [439] LOG: listening on IPv6 address "fdaa:5:6ce7:a7b:ce:fd17:60d7:2", port 5433
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.801 UTC [439] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.803 UTC [440] LOG: database system shutdown was interrupted; last known up at 2024-01-21 02:16:37 UTC
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.874 UTC [440] LOG: database system was not properly shut down; automatic recovery in progress
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.876 UTC [440] LOG: redo starts at 15/D7000028
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.876 UTC [440] LOG: redo done at 15/D7000110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.884 UTC [440] PANIC: could not write to file "pg_wal/xlogtemp.440": No space left on device
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.885 UTC [439] LOG: startup process (PID 440) was terminated by signal 6: Aborted
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.885 UTC [439] LOG: aborting startup due to startup process failure
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:42.887 UTC [439] LOG: database system is shut down
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]error connecting to local postgres context deadline exceeded
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:42Z app[287444da0d7738] ewr [info]keeper | 2024-01-21T02:16:42.987Z ERROR cmd/keeper.go:1526 failed to start postgres {"error": "postgres exited unexpectedly"}
2024-01-21T02:16:43Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:44Z app[287444da0d7738] ewr [info]keeper | 2024-01-21T02:16:44.457Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2024-01-21T02:16:44Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:45Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:46Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:46Z app[287444da0d7738] ewr [info]keeper | 2024-01-21T02:16:46.958Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2024-01-21T02:16:47Z proxy[287444da0d7738] ewr [error]could not proxy TCP data to/from instance: failed to copy (direction=client->server, error=Transport endpoint is not connected (os error 107))
2024-01-21T02:16:47Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:47Z app[287444da0d7738] ewr [info]sentinel | 2024-01-21T02:16:47.663Z ERROR cmd/sentinel.go:1018 no eligible masters
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.049 UTC [472] LOG: starting PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.049 UTC [472] LOG: listening on IPv6 address "fdaa:5:6ce7:a7b:ce:fd17:60d7:2", port 5433
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.050 UTC [472] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.052 UTC [473] LOG: database system shutdown was interrupted; last known up at 2024-01-21 02:16:42 UTC
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.111 UTC [473] LOG: database system was not properly shut down; automatic recovery in progress
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.112 UTC [473] LOG: redo starts at 15/D7000028
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.112 UTC [473] LOG: redo done at 15/D7000110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.119 UTC [473] PANIC: could not write to file "pg_wal/xlogtemp.473": No space left on device
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.119 UTC [472] LOG: startup process (PID 473) was terminated by signal 6: Aborted
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.119 UTC [472] LOG: aborting startup due to startup process failure
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21 02:16:48.121 UTC [472] LOG: database system is shut down
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]keeper | 2024-01-21T02:16:48.234Z ERROR cmd/keeper.go:1526 failed to start postgres {"error": "postgres exited unexpectedly"}
2024-01-21T02:16:48Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:49Z proxy[287444da0d7738] ewr [error]could not proxy TCP data to/from instance: failed to copy (direction=client->server, error=Transport endpoint is not connected (os error 107))
2024-01-21T02:16:49Z app[287444da0d7738] ewr [info]keeper | 2024-01-21T02:16:49.459Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2024-01-21T02:16:49Z proxy[287444da0d7738] ewr [error]could not proxy TCP data to/from instance: failed to copy (direction=client->server, error=Transport endpoint is not connected (os error 107))
2024-01-21T02:16:49Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:50Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:16:51Z proxy[287444da0d7738] ewr [error]could not proxy TCP data to/from instance: failed to copy (direction=client->server, error=Transport endpoint is not connected (os error 107))
2024-01-21T02:16:59Z app[287444da0d7738] ewr [info]keeper | 2024-01-21T02:16:59.463Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2024-01-21T02:16:59Z app[287444da0d7738] ewr [info]checking stolon status
2024-01-21T02:17:00Z proxy[287444da0d7738] ewr [error]could not proxy TCP data to/from instance: failed to copy (direction=client->server, error=Transport endpoint is not connected (os error 107))
2024-01-21T02:17:00Z proxy[287444da0d7738] ewr [error]could not proxy TCP data to/from instance: failed to copy (direction=client->server, error=Transport endpoint is not connected (os error 107))
2024-01-21T02:17:00Z app[287444da0d7738] ewr [info]checking stolon status
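Most of the log is the same restart loop repeating, so it may help to pull out only the lines PostgreSQL itself tags as PANIC or FATAL. The snippet below is a rough sketch of that filter, not anything Fly provides: the function name `severe_messages` and the idea of collapsing the PID in `xlogtemp.<pid>` are my own assumptions, and the sample text is copied from the keeper lines above.

```python
import re

# Severity tags PostgreSQL uses for errors that abort a process or session.
SEVERE = re.compile(r"\b(PANIC|FATAL):\s*(.*)$")


def severe_messages(log_text: str) -> list[str]:
    """Return the distinct PANIC/FATAL messages, in first-seen order.

    Hypothetical helper for illustration; not part of flyctl or PostgreSQL.
    """
    seen: dict[str, None] = {}
    for line in log_text.splitlines():
        match = SEVERE.search(line)
        if match:
            # Replace the PID in the temp WAL file name so repeats collapse.
            message = re.sub(r"xlogtemp\.\d+", "xlogtemp.<pid>", match.group(2))
            seen.setdefault(f"{match.group(1)}: {message}", None)
    return list(seen)


# Sample lines taken from the keeper output in the logs above.
sample = """\
2024-01-21 02:16:42.876 UTC [440] LOG: redo starts at 15/D7000028
2024-01-21 02:16:42.884 UTC [440] PANIC: could not write to file "pg_wal/xlogtemp.440": No space left on device
2024-01-21 02:16:48.119 UTC [473] PANIC: could not write to file "pg_wal/xlogtemp.473": No space left on device
"""

for message in severe_messages(sample):
    print(message)
```

On the sample this leaves a single distinct message, the "No space left on device" PANIC, which lines up with the failing checkDisk health check on /data/.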