Last night our Fly Postgres instance stopped responding to our backend and began failing its health checks.
I have tried:
- restarting the instance
- scaling the instance's CPU
- scaling the instance's memory

Any suggestions on how to resolve this without recreating the instance?
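For context, the attempts above were made with commands along these lines (a sketch from memory; the exact VM size and memory values are illustrative, and "<app-name>-dev" stands in for the redacted app name):

```shell
# Sketch of the remediation attempts; values are illustrative, not exact.
APP="<app-name>-dev"

# Restart the Postgres instance
fly postgres restart -a "$APP"

# Scale up the VM CPU (size name is an example)
fly scale vm dedicated-cpu-1x -a "$APP"

# Scale up memory (MB; target value is an example)
fly scale memory 4096 -a "$APP"
```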
Attempting to connect from my local machine:
fly pg connect -a <app-name>-dev
Connecting to <app-name>.internal... complete
psql: error: connection to server at "<app-name>.internal" (fdaa:0:77a4:a7b:2c00:1:6e47:2), port 5432 failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
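In case it helps with triage, the health-check state and the logs below can be pulled with the standard flyctl commands (a usage sketch, nothing app-specific assumed beyond the placeholder app name):

```shell
APP="<app-name>-dev"

# Overall instance and allocation state
fly status -a "$APP"

# Health-check results for each VM
fly checks list -a "$APP"

# Stream the instance logs (source of the output below)
fly logs -a "$APP"
```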
Logs:
2022-08-11T17:08:36.216 runner[0dc3f49b] mia [info] Starting instance
2022-08-11T17:08:36.398 runner[0dc3f49b] mia [info] Configuring virtual machine
2022-08-11T17:08:36.400 runner[0dc3f49b] mia [info] Pulling container image
2022-08-11T17:08:46.814 runner[0dc3f49b] mia [info] Unpacking image
2022-08-11T17:08:46.829 runner[0dc3f49b] mia [info] Preparing kernel init
2022-08-11T17:08:46.968 runner[0dc3f49b] mia [info] Setting up volume 'pg_data'
2022-08-11T17:08:47.095 runner[0dc3f49b] mia [info] Configuring firecracker
2022-08-11T17:08:47.132 runner[0dc3f49b] mia [info] Starting virtual machine
2022-08-11T17:08:47.353 app[0dc3f49b] mia [info] Starting init (commit: c86b3dc)...
2022-08-11T17:08:47.373 app[0dc3f49b] mia [info] Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-08-11T17:08:47.388 app[0dc3f49b] mia [info] Preparing to run: `docker-entrypoint.sh start` as root
2022-08-11T17:08:47.417 app[0dc3f49b] mia [info] 2022/08/11 17:08:47 listening on [fdaa:0:77a4:a7b:2c00:1:6e47:2]:22 (DNS: [fdaa::3]:53)
2022-08-11T17:08:47.511 app[0dc3f49b] mia [info] cluster spec filename /fly/cluster-spec.json
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] system memory: 2048mb vcpu count: 1
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] {
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "initMode": "existing",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "existingConfig": {
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "keeperUID": "2c0016e472"
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] },
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "pgParameters": {
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "effective_cache_size": "1536MB",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "effective_io_concurrency": "200",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "maintenance_work_mem": "102MB",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "max_connections": "300",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "max_parallel_workers": "8",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "max_parallel_workers_per_gather": "2",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "max_worker_processes": "8",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "random_page_cost": "1.1",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "shared_buffers": "512MB",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "wal_compression": "on",
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "work_mem": "32MB"
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] },
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "maxStandbysPerSender": 50
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] }
2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] generated new config
2022-08-11T17:08:47.514 app[0dc3f49b] mia [info] keeper | Running...
2022-08-11T17:08:47.518 app[0dc3f49b] mia [info] proxy | Running...
2022-08-11T17:08:47.521 app[0dc3f49b] mia [info] exporter | Running...
2022-08-11T17:08:47.523 app[0dc3f49b] mia [info] sentinel | Running...
2022-08-11T17:08:47.674 app[0dc3f49b] mia [info] exporter | INFO[0000] Starting Server: :9187 source="postgres_exporter.go:1837"
2022-08-11T17:08:47.704 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170847 (539) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.
2022-08-11T17:08:47.779 app[0dc3f49b] mia [info] proxy | [NOTICE] 222/170847 (539) : New worker #1 (565) forked
2022-08-11T17:08:48.513 app[0dc3f49b] mia [info] checking stolon status
2022-08-11T17:08:49.825 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170849 (565) : bk_db/pg1 changed its IP from (none) to fdaa:0:77a4:a7b:2c00:1:6e47:2 by flydns/dns1.
2022-08-11T17:08:49.825 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170849 (565) : Server bk_db/pg1 ('mia.<app-name>.internal') is UP/READY (resolves again).
2022-08-11T17:08:49.825 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170849 (565) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2022-08-11T17:08:50.080 app[0dc3f49b] mia [info] keeper is healthy, db is healthy, role: master
2022-08-11T17:08:50.087 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:08:50.086Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-08-11T17:08:52.588 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:08:52.587Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-08-11T17:08:54.524 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170854 (565) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-08-11T17:08:55.080 app[0dc3f49b] mia [info] error connecting to local postgres context deadline exceeded
2022-08-11T17:08:55.080 app[0dc3f49b] mia [info] checking stolon status
2022-08-11T17:08:55.088 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:08:55.087Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-08-11T17:08:55.181 app[0dc3f49b] mia [info] keeper is healthy, db is healthy, role: master
2022-08-11T17:08:56.706 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170856 (565) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-08-11T17:08:56.706 app[0dc3f49b] mia [info] proxy | [NOTICE] 222/170856 (565) : haproxy version is 2.2.9-2+deb11u3
2022-08-11T17:08:56.706 app[0dc3f49b] mia [info] proxy | [NOTICE] 222/170856 (565) : path to executable is /usr/sbin/haproxy
2022-08-11T17:08:56.706 app[0dc3f49b] mia [info] proxy | [ALERT] 222/170856 (565) : backend 'bk_db' has no server available!
2022-08-11T17:08:57.589 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:08:57.588Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-08-11T17:09:00.089 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:09:00.089Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-08-11T17:09:00.181 app[0dc3f49b] mia [info] error connecting to local postgres context deadline exceeded
2022-08-11T17:09:00.182 app[0dc3f49b] mia [info] checking stolon status
2022-08-11T17:09:00.846 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.845 UTC [599] LOG: starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2022-08-11T17:09:00.846 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.845 UTC [599] LOG: listening on IPv6 address "fdaa:0:77a4:a7b:2c00:1:6e47:2", port 5433
2022-08-11T17:09:00.847 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.846 UTC [599] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2022-08-11T17:09:00.850 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.849 UTC [600] LOG: database system was shut down at 2022-08-11 17:08:32 UTC
2022-08-11T17:09:00.855 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.854 UTC [599] LOG: database system is ready to accept connections
2022-08-11T17:09:01.527 app[0dc3f49b] mia [info] keeper is healthy, db is healthy, role: master
2022-08-11T17:09:01.550 app[0dc3f49b] mia [info] configuring operator
2022-08-11T17:09:01.567 app[0dc3f49b] mia [info] operator password does not match config, changing
2022-08-11T17:09:01.571 app[0dc3f49b] mia [info] operator ready!
2022-08-11T17:09:01.571 app[0dc3f49b] mia [info] configuring repluser
2022-08-11T17:09:01.572 app[0dc3f49b] mia [info] repluser password does not match config, changing
2022-08-11T17:09:01.575 app[0dc3f49b] mia [info] replication ready!
2022-08-11T17:09:03.107 app[0dc3f49b] mia [info] exporter | INFO[0015] Established new database connection to "fdaa:0:77a4:a7b:2c00:1:6e47:2:5433". source="postgres_exporter.go:970"
2022-08-11T17:09:03.123 app[0dc3f49b] mia [info] exporter | INFO[0015] Semantic Version Changed on "fdaa:0:77a4:a7b:2c00:1:6e47:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"
2022-08-11T17:09:03.149 app[0dc3f49b] mia [info] exporter | INFO[0015] Established new database connection to "fdaa:0:77a4:a7b:2c00:1:6e47:2:5433". source="postgres_exporter.go:970"
2022-08-11T17:09:03.179 app[0dc3f49b] mia [info] exporter | INFO[0015] Semantic Version Changed on "fdaa:0:77a4:a7b:2c00:1:6e47:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"