Thanks for the help! I’ve verified that we can fork the running Postgres instance, but still haven’t been able to get a snapshot restore to work.
I’ve tried restoring a snapshot to a new volume in the same app (rather than a new app), and have cloned the app’s current machine with the restored volume mounted to /var/lib/postgresql/data
.
The restored volume does show the correct size, however the cloned app fails to start. Any idea why this procedure is failing?
Full logs from a restart of the cloned machine:
2024-12-10T21:49:03.237 app[2867511a9ed328] ord [info] INFO Sending signal SIGINT to main child process w/ PID 359
2024-12-10T21:49:03.237 app[2867511a9ed328] ord [info] Got interrupt, stopping
2024-12-10T21:49:03.237 app[2867511a9ed328] ord [info] supervisor stopping
2024-12-10T21:49:03.237 app[2867511a9ed328] ord [info] exporter | Stopping interrupt...
2024-12-10T21:49:03.237 app[2867511a9ed328] ord [info] keeper | Stopping interrupt...
2024-12-10T21:49:03.237 app[2867511a9ed328] ord [info] proxy | Stopping interrupt...
2024-12-10T21:49:03.238 app[2867511a9ed328] ord [info] sentinel | Stopping interrupt...
2024-12-10T21:49:03.238 app[2867511a9ed328] ord [info] proxy | [WARNING] 344/214903 (377) : Exiting Master process...
2024-12-10T21:49:03.238 app[2867511a9ed328] ord [info] proxy | [NOTICE] 344/214903 (377) : haproxy version is 2.2.9-2+deb11u5
2024-12-10T21:49:03.238 app[2867511a9ed328] ord [info] proxy | [NOTICE] 344/214903 (377) : path to executable is /usr/sbin/haproxy
2024-12-10T21:49:03.238 app[2867511a9ed328] ord [info] proxy | [ALERT] 344/214903 (377) : Current worker #1 (407) exited with code 130 (Interrupt)
2024-12-10T21:49:03.238 app[2867511a9ed328] ord [info] proxy | [WARNING] 344/214903 (377) : All workers exited. Exiting... (130)
2024-12-10T21:49:03.238 app[2867511a9ed328] ord [info] keeper | 2024-12-10T21:49:03.238Z ERROR cmd/keeper.go:1364 failed to resync from followed instance {"error": "sync error: signal: interrupt"}
2024-12-10T21:49:03.238 app[2867511a9ed328] ord [info] exporter | signal: interrupt
2024-12-10T21:49:03.239 app[2867511a9ed328] ord [info] proxy | exit status 130
2024-12-10T21:49:03.240 app[2867511a9ed328] ord [info] sentinel | Process exited 0
2024-12-10T21:49:03.242 app[2867511a9ed328] ord [info] keeper | Process exited 0
2024-12-10T21:49:03.803 app[2867511a9ed328] ord [info] INFO Main child exited normally with code: 0
2024-12-10T21:49:03.819 app[2867511a9ed328] ord [info] INFO Starting clean up.
2024-12-10T21:49:03.820 app[2867511a9ed328] ord [info] INFO Umounting /dev/vdc from /var/lib/postgresql/data
2024-12-10T21:49:03.823 app[2867511a9ed328] ord [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2024-12-10T21:49:03.824 app[2867511a9ed328] ord [info] [ 579.603836] reboot: Restarting system
2024-12-10T21:49:09.398 health[2867511a9ed328] ord [error] Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
2024-12-10T21:49:17.325 app[2867511a9ed328] ord [info] 2024-12-10T21:49:17.325375430 [01JES7DJJG0TYFKH7YZ43BHSWW:main] Running Firecracker v1.7.0
2024-12-10T21:49:18.155 app[2867511a9ed328] ord [info] INFO Starting init (commit: 0a21fde6)...
2024-12-10T21:49:18.290 app[2867511a9ed328] ord [info] INFO Checking filesystem on /var/lib/postgresql/data
2024-12-10T21:49:18.297 app[2867511a9ed328] ord [info] /dev/vdc: clean, 12483/32768000 files, 82924761/131067904 blocks
2024-12-10T21:49:18.298 app[2867511a9ed328] ord [info] INFO Mounting /dev/vdc at /var/lib/postgresql/data w/ uid: 0, gid: 0 and chmod 0755
2024-12-10T21:49:18.307 app[2867511a9ed328] ord [info] INFO Resized /var/lib/postgresql/data to 536854134784 bytes
2024-12-10T21:49:18.313 app[2867511a9ed328] ord [info] INFO Preparing to run: `docker-entrypoint.sh start` as root
2024-12-10T21:49:18.314 app[2867511a9ed328] ord [info] INFO [fly api proxy] listening at /.fly/api
2024-12-10T21:49:18.338 runner[2867511a9ed328] ord [info] Machine started in 1.117s
2024-12-10T21:49:18.369 app[2867511a9ed328] ord [info] cluster spec filename /fly/cluster-spec.json
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] system memory: 8192mb vcpu count: 4
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] {
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "initMode": "new",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "existingConfig": {},
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "pgParameters": {
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "archive_command": "if [ $ENABLE_WALG ]; then /usr/local/bin/wal-g wal-push \"%p\"; fi",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "archive_mode": "on",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "archive_timeout": "60",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "effective_cache_size": "6144MB",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "effective_io_concurrency": "200",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "maintenance_work_mem": "409MB",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "max_connections": "300",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "max_parallel_workers": "8",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "max_parallel_workers_per_gather": "2",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "max_wal_size": "796MB",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "max_worker_processes": "8",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "min_wal_size": "199MB",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "random_page_cost": "1.1",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "shared_buffers": "2048MB",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "wal_compression": "on",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "wal_keep_size": "0",
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "work_mem": "128MB"
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] },
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "maxStandbysPerSender": 50,
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] "deadKeeperRemovalInterval": "1h"
2024-12-10T21:49:18.370 app[2867511a9ed328] ord [info] }
2024-12-10T21:49:18.371 app[2867511a9ed328] ord [info] generated new config
2024-12-10T21:49:18.372 app[2867511a9ed328] ord [info] sentinel | Running...
2024-12-10T21:49:18.372 app[2867511a9ed328] ord [info] keeper | Running...
2024-12-10T21:49:18.373 app[2867511a9ed328] ord [info] exporter | Running...
2024-12-10T21:49:18.373 app[2867511a9ed328] ord [info] proxy | Running...
2024-12-10T21:49:18.474 app[2867511a9ed328] ord [info] exporter | INFO[0000] Starting Server: :9187 source="postgres_exporter.go:1837"
2024-12-10T21:49:18.483 app[2867511a9ed328] ord [info] 2024/12/10 21:49:18 INFO SSH listening listen_address=[fdaa:0:51de:a7b:2a9:4579:69ca:2]:22 dns_server=[fdaa::3]:53
2024-12-10T21:49:18.494 app[2867511a9ed328] ord [info] proxy | [NOTICE] 344/214918 (383) : New worker #1 (409) forked
2024-12-10T21:49:18.497 app[2867511a9ed328] ord [info] proxy | [WARNING] 344/214918 (409) : bk_db/pg1 changed its IP from (none) to fdaa:0:51de:a7b:2a9:4579:69ca:2 by flydns/dns1.
2024-12-10T21:49:18.497 app[2867511a9ed328] ord [info] proxy | [WARNING] 344/214918 (409) : Server bk_db/pg1 ('ord.<appnameremoved>.internal') is UP/READY (resolves again).
2024-12-10T21:49:18.497 app[2867511a9ed328] ord [info] proxy | [WARNING] 344/214918 (409) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-12-10T21:49:18.920 app[2867511a9ed328] ord [info] sentinel | 2024-12-10T21:49:18.920Z ERROR cmd/sentinel.go:1018 no eligible masters
2024-12-10T21:49:19.372 app[2867511a9ed328] ord [info] checking stolon status
2024-12-10T21:49:20.372 app[2867511a9ed328] ord [info] checking stolon status
2024-12-10T21:49:21.372 app[2867511a9ed328] ord [info] checking stolon status
2024-12-10T21:49:22.373 app[2867511a9ed328] ord [info] checking stolon status
2024-12-10T21:49:23.372 app[2867511a9ed328] ord [info] checking stolon status
2024-12-10T21:49:24.148 app[2867511a9ed328] ord [info] sentinel | 2024-12-10T21:49:24.147Z WARN cmd/sentinel.go:287 received db state for unexpected db uid {"receivedDB": "", "db": "3a41e2b1", "keeper": "2a9457969ca2"}
2024-12-10T21:49:24.148 app[2867511a9ed328] ord [info] sentinel | 2024-12-10T21:49:24.147Z WARN cmd/sentinel.go:276 no keeper info available {"db": "8f85aefd", "keeper": "111e5b77ebc2"}
2024-12-10T21:49:24.372 app[2867511a9ed328] ord [info] checking stolon status
2024-12-10T21:49:24.468 app[2867511a9ed328] ord [info] keeper is healthy, db is healthy, role: standby
2024-12-10T21:49:25.307 app[2867511a9ed328] ord [info] proxy | [WARNING] 344/214925 (409) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-12-10T21:49:25.489 app[2867511a9ed328] ord [info] proxy | [WARNING] 344/214925 (409) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-12-10T21:49:25.489 app[2867511a9ed328] ord [info] proxy | [NOTICE] 344/214925 (409) : haproxy version is 2.2.9-2+deb11u5
2024-12-10T21:49:25.489 app[2867511a9ed328] ord [info] proxy | [NOTICE] 344/214925 (409) : path to executable is /usr/sbin/haproxy
2024-12-10T21:49:25.489 app[2867511a9ed328] ord [info] proxy | [ALERT] 344/214925 (409) : backend 'bk_db' has no server available!
2024-12-10T21:49:29.396 app[2867511a9ed328] ord [info] sentinel | 2024-12-10T21:49:29.395Z WARN cmd/sentinel.go:287 received db state for unexpected db uid {"receivedDB": "", "db": "3a41e2b1", "keeper": "2a9457969ca2"}
2024-12-10T21:49:29.396 app[2867511a9ed328] ord [info] sentinel | 2024-12-10T21:49:29.395Z WARN cmd/sentinel.go:276 no keeper info available {"db": "8f85aefd", "keeper": "111e5b77ebc2"}
2024-12-10T21:49:29.416 health[2867511a9ed328] ord [info] Health check for your postgres vm is now passing.
2024-12-10T21:49:34.611 app[2867511a9ed328] ord [info] sentinel | 2024-12-10T21:49:34.611Z WARN cmd/sentinel.go:276 no keeper info available {"db": "8f85aefd", "keeper": "111e5b77ebc2"}
2024-12-10T21:49:39.848 app[2867511a9ed328] ord [info] sentinel | 2024-12-10T21:49:39.848Z WARN cmd/sentinel.go:276 no keeper info available {"db": "8f85aefd", "keeper": "111e5b77ebc2"}
2024-12-10T21:49:41.922 app[2867511a9ed328] ord [info] exporter | INFO[0023] Established new database connection to "fdaa:0:51de:a7b:2a9:4579:69ca:2:5433". source="postgres_exporter.go:970"
2024-12-10T21:49:42.922 app[2867511a9ed328] ord [info] exporter | ERRO[0024] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:51de:a7b:2a9:4579:69ca:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:51de:a7b:2a9:4579:69ca:2]:5433: connect: connection refused source="postgres_exporter.go:1658"