Postgres - failed to start VM after service interruption

colinramsay · May 14, 2024, 10:25am

I have a postgres machine refusing to start after a Fly.io service interruption.

The interruption notice (now resolved):

We are performing emergency maintenance on a host some of your apps instances are running on. Apps may be unavailable until the maintenance is completed.

The error when trying to start via flyctl:

Error: could not start machine 6e82535b70e258: failed to start VM 6e82535b70e258: failed_precondition: machine still active, refusing to start (Request ID: 01HXV9QQFV1E66V3PF17PMHAZW-lhr)

Further logs:

2024-05-14T10:21:26.604 app[6e82535b70e258] lhr [info] Starting init (commit: 08b4c2b)…
2024-05-14T10:21:26.623 app[6e82535b70e258] lhr [info] Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2024-05-14T10:21:26.628 app[6e82535b70e258] lhr [info] Preparing to run: docker-entrypoint.sh start as root
2024-05-14T10:21:26.640 runner[6e82535b70e258] lhr [info] Machine started in 250ms
2024-05-14T10:21:26.646 app[6e82535b70e258] lhr [info] 2024/05/14 10:21:26 listening on [fdaa:0:4e26:a7b:2809:25dc:1398:2]:22 (DNS: [fdaa::3]:53)
2024-05-14T10:21:26.741 app[6e82535b70e258] lhr [info] cluster spec filename /fly/cluster-spec.json
2024-05-14T10:21:26.743 app[6e82535b70e258] lhr [info] panic: error loading cluster spec: unexpected end of JSON input
2024-05-14T10:21:26.743 app[6e82535b70e258] lhr [info] goroutine 1 [running]:
2024-05-14T10:21:27.637 app[6e82535b70e258] lhr [info] Starting clean up.
2024-05-14T10:21:27.637 app[6e82535b70e258] lhr [info] Umounting /dev/vdb from /data
2024-05-14T10:21:28.642 app[6e82535b70e258] lhr [info] [ 2.155850] reboot: Restarting system

manavo · May 14, 2024, 7:27pm

Exactly the same thing is happening here as well:

2024-05-14T19:12:14.159 app[5683040b797e38] lhr [info] Starting init (commit: b8364bb)...
2024-05-14T19:12:14.185 app[5683040b797e38] lhr [info] Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2024-05-14T19:12:14.191 app[5683040b797e38] lhr [info] Preparing to run: `docker-entrypoint.sh start` as root
2024-05-14T19:12:14.222 app[5683040b797e38] lhr [info] 2024/05/14 19:12:14 listening on [fdaa:0:806b:a7b:2809:babf:db6f:2]:22 (DNS: [fdaa::3]:53)
2024-05-14T19:12:14.243 runner[5683040b797e38] lhr [info] Machine started in 911ms
2024-05-14T19:12:14.317 app[5683040b797e38] lhr [info] cluster spec filename /fly/cluster-spec.json
2024-05-14T19:12:14.319 app[5683040b797e38] lhr [info] panic: error loading cluster spec: unexpected end of JSON input
2024-05-14T19:12:14.319 app[5683040b797e38] lhr [info] goroutine 1 [running]:
2024-05-14T19:12:14.319 app[5683040b797e38] lhr [info] main.main()
2024-05-14T19:12:14.319 app[5683040b797e38] lhr [info] /go/src/github.com/fly-examples/postgres-ha/cmd/start/main.go:69 +0x1bbb
2024-05-14T19:12:15.206 app[5683040b797e38] lhr [info] Starting clean up.
2024-05-14T19:12:15.207 app[5683040b797e38] lhr [info] Umounting /dev/vdb from /data
2024-05-14T19:12:16.211 app[5683040b797e38] lhr [info] [ 2.148485] reboot: Restarting system
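The panic line in both logs is the standard message Go's `encoding/json` returns when its input is empty or cut off partway through. That suggests `/fly/cluster-spec.json` was zero-length or truncated when the machine booted after the maintenance. This is an inference from the error text, not something confirmed in the thread. The minimal sketch below reproduces the message. `loadSpec` and the generic map type are hypothetical stand-ins, because the real postgres-ha spec struct isn't shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// loadSpec is a hypothetical stand-in for parsing a cluster spec file's bytes.
// A generic map is used because the real postgres-ha spec type isn't shown.
func loadSpec(data []byte) error {
	var spec map[string]any
	return json.Unmarshal(data, &spec)
}

func main() {
	inputs := [][]byte{
		[]byte(""),            // zero-length file
		[]byte(`{"key": {`),   // file cut off mid-write (hypothetical content)
		[]byte(`{"key": {}}`), // complete, valid document
	}
	for _, in := range inputs {
		// Empty and truncated input both fail with
		// "unexpected end of JSON input"; valid input returns a nil error.
		fmt.Printf("%q -> %v\n", in, loadSpec(in))
	}
}
```

Both the empty and the truncated input produce the same error, so the message alone can't tell you which of the two happened to the file.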

colinramsay · May 15, 2024, 8:19am

We ended up forking the volume with the database data on it and creating a new machine. This is probably what Fly would recommend, but we'll likely be moving to another provider after this latest incident.

manavo · May 20, 2024, 12:54pm

I couldn’t fork the volume, so I ended up creating a new Postgres database from a volume snapshot:

fly volumes list -a {POSTGRES_APP_NAME} # Find the Volume ID
fly volumes snapshots list {VOLUME_ID} # List snapshots of the Volume
fly postgres create --snapshot-id {VOLUME_SNAPSHOT_ID} # Create a new Postgres from the snapshot

system · May 27, 2024, 12:54pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
