Fly Postgres instance is unresponsive and unhealthy

Last night our Fly Postgres instance stopped responding to our backend and started failing its health checks.

I have tried the following (rough commands sketched below):

  • restarting the instance

  • scaling the instance CPU

  • scaling the instance memory
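
For reference, those steps were roughly these flyctl commands (a sketch; the VM size and memory values are placeholders, not the exact values I used):

# restart the Postgres app
fly postgres restart -a <app-name>-dev

# scale the VM size (CPU) and memory
fly scale vm dedicated-cpu-1x -a <app-name>-dev
fly scale memory 2048 -a <app-name>-dev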

Any suggestions on how to resolve this without recreating the instance?

Attempting to connect from my local machine:


fly pg connect -a <app-name>-dev

Connecting to <app-name>.internal... complete
psql: error: connection to server at "<app-name>.internal" (fdaa:0:77a4:a7b:2c00:1:6e47:2), port 5432 failed: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

Logs:


2022-08-11T17:08:36.216 runner[0dc3f49b] mia [info] Starting instance

2022-08-11T17:08:36.398 runner[0dc3f49b] mia [info] Configuring virtual machine

2022-08-11T17:08:36.400 runner[0dc3f49b] mia [info] Pulling container image

2022-08-11T17:08:46.814 runner[0dc3f49b] mia [info] Unpacking image

2022-08-11T17:08:46.829 runner[0dc3f49b] mia [info] Preparing kernel init

2022-08-11T17:08:46.968 runner[0dc3f49b] mia [info] Setting up volume 'pg_data'

2022-08-11T17:08:47.095 runner[0dc3f49b] mia [info] Configuring firecracker

2022-08-11T17:08:47.132 runner[0dc3f49b] mia [info] Starting virtual machine

2022-08-11T17:08:47.353 app[0dc3f49b] mia [info] Starting init (commit: c86b3dc)...

2022-08-11T17:08:47.373 app[0dc3f49b] mia [info] Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755

2022-08-11T17:08:47.388 app[0dc3f49b] mia [info] Preparing to run: `docker-entrypoint.sh start` as root

2022-08-11T17:08:47.417 app[0dc3f49b] mia [info] 2022/08/11 17:08:47 listening on [fdaa:0:77a4:a7b:2c00:1:6e47:2]:22 (DNS: [fdaa::3]:53)

2022-08-11T17:08:47.511 app[0dc3f49b] mia [info] cluster spec filename /fly/cluster-spec.json

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] system memory: 2048mb vcpu count: 1

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] {

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "initMode": "existing",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "existingConfig": {

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "keeperUID": "2c0016e472"

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] },

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "pgParameters": {

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "effective_cache_size": "1536MB",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "effective_io_concurrency": "200",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "maintenance_work_mem": "102MB",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "max_connections": "300",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "max_parallel_workers": "8",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "max_parallel_workers_per_gather": "2",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "max_worker_processes": "8",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "random_page_cost": "1.1",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "shared_buffers": "512MB",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "wal_compression": "on",

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "work_mem": "32MB"

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] },

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] "maxStandbysPerSender": 50

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] }

2022-08-11T17:08:47.512 app[0dc3f49b] mia [info] generated new config

2022-08-11T17:08:47.514 app[0dc3f49b] mia [info] keeper | Running...

2022-08-11T17:08:47.518 app[0dc3f49b] mia [info] proxy | Running...

2022-08-11T17:08:47.521 app[0dc3f49b] mia [info] exporter | Running...

2022-08-11T17:08:47.523 app[0dc3f49b] mia [info] sentinel | Running...

2022-08-11T17:08:47.674 app[0dc3f49b] mia [info] exporter | INFO[0000] Starting Server: :9187 source="postgres_exporter.go:1837"

2022-08-11T17:08:47.704 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170847 (539) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.

2022-08-11T17:08:47.779 app[0dc3f49b] mia [info] proxy | [NOTICE] 222/170847 (539) : New worker #1 (565) forked

2022-08-11T17:08:48.513 app[0dc3f49b] mia [info] checking stolon status

2022-08-11T17:08:49.825 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170849 (565) : bk_db/pg1 changed its IP from (none) to fdaa:0:77a4:a7b:2c00:1:6e47:2 by flydns/dns1.

2022-08-11T17:08:49.825 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170849 (565) : Server bk_db/pg1 ('mia.<app-name>.internal') is UP/READY (resolves again).

2022-08-11T17:08:49.825 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170849 (565) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.

2022-08-11T17:08:50.080 app[0dc3f49b] mia [info] keeper is healthy, db is healthy, role: master

2022-08-11T17:08:50.087 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:08:50.086Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-08-11T17:08:52.588 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:08:52.587Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-08-11T17:08:54.524 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170854 (565) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

2022-08-11T17:08:55.080 app[0dc3f49b] mia [info] error connecting to local postgres context deadline exceeded

2022-08-11T17:08:55.080 app[0dc3f49b] mia [info] checking stolon status

2022-08-11T17:08:55.088 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:08:55.087Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-08-11T17:08:55.181 app[0dc3f49b] mia [info] keeper is healthy, db is healthy, role: master

2022-08-11T17:08:56.706 app[0dc3f49b] mia [info] proxy | [WARNING] 222/170856 (565) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

2022-08-11T17:08:56.706 app[0dc3f49b] mia [info] proxy | [NOTICE] 222/170856 (565) : haproxy version is 2.2.9-2+deb11u3

2022-08-11T17:08:56.706 app[0dc3f49b] mia [info] proxy | [NOTICE] 222/170856 (565) : path to executable is /usr/sbin/haproxy

2022-08-11T17:08:56.706 app[0dc3f49b] mia [info] proxy | [ALERT] 222/170856 (565) : backend 'bk_db' has no server available!

2022-08-11T17:08:57.589 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:08:57.588Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-08-11T17:09:00.089 app[0dc3f49b] mia [info] keeper | 2022-08-11T17:09:00.089Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-08-11T17:09:00.181 app[0dc3f49b] mia [info] error connecting to local postgres context deadline exceeded

2022-08-11T17:09:00.182 app[0dc3f49b] mia [info] checking stolon status

2022-08-11T17:09:00.846 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.845 UTC [599] LOG: starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

2022-08-11T17:09:00.846 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.845 UTC [599] LOG: listening on IPv6 address "fdaa:0:77a4:a7b:2c00:1:6e47:2", port 5433

2022-08-11T17:09:00.847 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.846 UTC [599] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"

2022-08-11T17:09:00.850 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.849 UTC [600] LOG: database system was shut down at 2022-08-11 17:08:32 UTC

2022-08-11T17:09:00.855 app[0dc3f49b] mia [info] keeper | 2022-08-11 17:09:00.854 UTC [599] LOG: database system is ready to accept connections

2022-08-11T17:09:01.527 app[0dc3f49b] mia [info] keeper is healthy, db is healthy, role: master

2022-08-11T17:09:01.550 app[0dc3f49b] mia [info] configuring operator

2022-08-11T17:09:01.567 app[0dc3f49b] mia [info] operator password does not match config, changing

2022-08-11T17:09:01.571 app[0dc3f49b] mia [info] operator ready!

2022-08-11T17:09:01.571 app[0dc3f49b] mia [info] configuring repluser

2022-08-11T17:09:01.572 app[0dc3f49b] mia [info] repluser password does not match config, changing

2022-08-11T17:09:01.575 app[0dc3f49b] mia [info] replication ready!

2022-08-11T17:09:03.107 app[0dc3f49b] mia [info] exporter | INFO[0015] Established new database connection to "fdaa:0:77a4:a7b:2c00:1:6e47:2:5433". source="postgres_exporter.go:970"

2022-08-11T17:09:03.123 app[0dc3f49b] mia [info] exporter | INFO[0015] Semantic Version Changed on "fdaa:0:77a4:a7b:2c00:1:6e47:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"

2022-08-11T17:09:03.149 app[0dc3f49b] mia [info] exporter | INFO[0015] Established new database connection to "fdaa:0:77a4:a7b:2c00:1:6e47:2:5433". source="postgres_exporter.go:970"

2022-08-11T17:09:03.179 app[0dc3f49b] mia [info] exporter | INFO[0015] Semantic Version Changed on "fdaa:0:77a4:a7b:2c00:1:6e47:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"

Hey there,

It appears you’ve run out of disk space. You should be able to see the failing check by running:

fly checks list --app <app-name>
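
If it is disk space, you can also confirm directly from inside the VM. A minimal check, assuming the data volume is mounted at /data as shown in the logs above:

# show free space on the Postgres data volume
fly ssh console -a <app-name>-dev -C "df -h /data"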

You can try resizing your volume using the fly volumes extend feature (see the announcement post "Volume expansion is now available!").
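
A rough sketch of that, with the volume ID and target size as placeholders (extend can only grow a volume, not shrink it):

# find the volume ID for pg_data
fly volumes list -a <app-name>-dev

# grow it to, e.g., 3 GB
fly volumes extend <volume-id> -s 3 -a <app-name>-dev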
