A few hours ago my database app on Fly started failing again, but this time with a different error.
Here’s the output of fly status:
App
Name = urbanave-db
Owner = urban-avenue
Version = 10
Status = running
Hostname = urbanave-db.fly.dev
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
308b15e8 app 10 dfw run running (readonly) 3 total, 1 passing, 2 critical 0 2022-06-30T21:04:47Z
I tried restarting and scaling down and back up, as mentioned in my previous post, but that did not work this time; the VM status is always running (readonly) with 2 critical health checks.
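For reference, the restart and scale-down/up attempt was roughly the following flyctl commands (the instance ID and scale counts here are illustrative, not necessarily the exact values I used):

# restart the instance listed by fly status
fly vm restart 308b15e8 -a urbanave-db

# scale the app down and back up
fly scale count 0 -a urbanave-db
fly scale count 1 -a urbanave-db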
Here’s the output of fly vm status <instance> after restarting:
▲ fly vm status 2751e7ee -a urbanave-db
Instance
ID = 2751e7ee
Process =
Version = 15
Region = dfw
Desired = run
Status = running (replica)
Health Checks = 3 total, 1 passing, 2 critical
Restarts = 0
Created = 2m43s ago
Recent Events
TIMESTAMP TYPE MESSAGE
2022-07-18T21:37:36Z Received Task received by client
2022-07-18T21:37:36Z Task Setup Building Task Directory
2022-07-18T21:37:40Z Started Task started by client
Checks
ID SERVICE STATE OUTPUT
vm app critical HTTP GET http://172.19.20.130:5500/flycheck/vm: 500 Internal Server Error Output: "[✓] checkDisk: 7.95 GB (81.3%!)(MISSING) free space on /data/ (49.24µs)\n[✓] checkLoad: load averages: 0.02 0.07 0.28 (135.87µs)\n[✓] memory: system spent 0s of the last 60s waiting on memory (165.06µs)\n[✗] cpu: system spent 1.54s of the last 10 seconds waiting on cpu (74.61µs)\n[✓] io: system spent 1.08s of the last 60s waiting on io (29.47µs)"
role app passing replica
pg app critical HTTP GET http://172.19.20.130:5500/flycheck/pg: 500 Internal Server Error Output: "failed to connect to proxy: context deadline exceeded"
Recent Logs
2022-07-18T21:37:46.000 [info] proxy | [WARNING] 198/213746 (566) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-07-18T21:37:47.000 [info] checking stolon status
2022-07-18T21:37:47.000 [info] proxy | [WARNING] 198/213747 (566) : Server bk_db/pg2 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-07-18T21:37:47.000 [info] keeper | 2022-07-18T21:37:47.444Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-07-18T21:37:48.000 [info] checking stolon status
2022-07-18T21:37:48.000 [info] keeper is healthy, db is healthy, role: standby
2022-07-18T21:37:49.000 [info] proxy | [WARNING] 198/213749 (566) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-07-18T21:37:49.000 [info] proxy | [NOTICE] 198/213749 (566) : haproxy version is 2.2.9-2+deb11u3
2022-07-18T21:37:49.000 [info] proxy | [NOTICE] 198/213749 (566) : path to executable is /usr/sbin/haproxy
2022-07-18T21:37:49.000 [info] proxy | [ALERT] 198/213749 (566) : backend 'bk_db' has no server available!
2022-07-18T21:37:49.000 [info] keeper | 2022-07-18T21:37:49.944Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18T21:37:52.448Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18 21:37:52.753 UTC [619] LOG: starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18 21:37:52.754 UTC [619] LOG: listening on IPv6 address "fdaa:0:544a:a7b:1a:1:1638:2", port 5433
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18 21:37:52.754 UTC [619] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18 21:37:52.757 UTC [620] LOG: database system was shut down in recovery at 2022-06-30 21:03:52 UTC
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18 21:37:52.758 UTC [620] LOG: entering standby mode
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18 21:37:52.761 UTC [620] LOG: redo starts at 1/A1254378
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18 21:37:52.827 UTC [620] LOG: consistent recovery state reached at 1/A1947F58
2022-07-18T21:37:52.000 [info] keeper | 2022-07-18 21:37:52.828 UTC [619] LOG: database system is ready to accept read-only connections
2022-07-18T21:38:02.000 [info] exporter | INFO[0022] Established new database connection to "fdaa:0:544a:a7b:1a:1:1638:2:5433". source="postgres_exporter.go:970"
2022-07-18T21:38:02.000 [info] exporter | INFO[0022] Semantic Version Changed on "fdaa:0:544a:a7b:1a:1:1638:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"
2022-07-18T21:38:02.000 [info] exporter | INFO[0022] Established new database connection to "fdaa:0:544a:a7b:1a:1:1638:2:5433". source="postgres_exporter.go:970"
2022-07-18T21:38:02.000 [info] exporter | INFO[0022] Semantic Version Changed on "fdaa:0:544a:a7b:1a:1:1638:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"
2022-07-18T21:38:10.000 [info] keeper | 2022-07-18 21:38:10.571 UTC [745] LOG: started streaming WAL from primary at 2/5A000000 on timeline 29