I have a Postgres machine that refuses to start after a Fly.io service interruption.
The interruption notice (now resolved):
We are performing emergency maintenance on a host some of your apps instances are running on. Apps may be unavailable until the maintenance is completed.
The error when trying to start via flyctl:
Error: could not start machine 6e82535b70e258: failed to start VM 6e82535b70e258: failed_precondition: machine still active, refusing to start (Request ID: 01HXV9QQFV1E66V3PF17PMHAZW-lhr)
Further logs:
2024-05-14T10:21:26.604 app[6e82535b70e258] lhr [info] Starting init (commit: 08b4c2b)…
2024-05-14T10:21:26.623 app[6e82535b70e258] lhr [info] Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2024-05-14T10:21:26.628 app[6e82535b70e258] lhr [info] Preparing to run: docker-entrypoint.sh start as root
2024-05-14T10:21:26.640 runner[6e82535b70e258] lhr [info] Machine started in 250ms
2024-05-14T10:21:26.646 app[6e82535b70e258] lhr [info] 2024/05/14 10:21:26 listening on [fdaa:0:4e26:a7b:2809:25dc:1398:2]:22 (DNS: [fdaa::3]:53)
2024-05-14T10:21:26.741 app[6e82535b70e258] lhr [info] cluster spec filename /fly/cluster-spec.json
2024-05-14T10:21:26.743 app[6e82535b70e258] lhr [info] panic: error loading cluster spec: unexpected end of JSON input
2024-05-14T10:21:26.743 app[6e82535b70e258] lhr [info] goroutine 1 [running]:
2024-05-14T10:21:27.637 app[6e82535b70e258] lhr [info] Starting clean up.
2024-05-14T10:21:27.637 app[6e82535b70e258] lhr [info] Umounting /dev/vdb from /data
2024-05-14T10:21:28.642 app[6e82535b70e258] lhr [info] [ 2.155850] reboot: Restarting system
We ended up forking the volume that holds the database data and creating a new machine. This is probably what Fly would recommend, but we’ll likely be moving to another provider after this latest incident.
I couldn’t fork the volume, so I ended up creating a new Postgres database from a volume snapshot:
fly volumes list -a {POSTGRES_APP_NAME} # Find the Volume ID
fly volumes snapshots list {VOLUME_ID} # List snapshots of the Volume
fly postgres create --snapshot-id {VOLUME_SNAPSHOT_ID} # Create a new Postgres from the snapshot
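In case it helps anyone else, here is a rough sketch of those three steps wrapped as shell functions. The function names, the DRY_RUN switch and the example IDs are my own placeholders, not anything from flyctl. By default it only prints the commands, so you can check them before running anything for real.

```shell
#!/bin/sh
# Sketch only: wraps the three flyctl commands above.
# DRY_RUN, run() and the function names are illustrative placeholders.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: find the Volume ID for the Postgres app.
list_volumes() {
  run fly volumes list -a "$1"
}

# Step 2: list the snapshots of that Volume.
list_snapshots() {
  run fly volumes snapshots list "$1"
}

# Step 3: create a new Postgres from the chosen snapshot.
restore_from_snapshot() {
  run fly postgres create --snapshot-id "$1"
}
```

Each ID comes from the output of the previous step, so call the functions one at a time, e.g. list_volumes my-pg-app, then list_snapshots with the Volume ID you found, then restore_from_snapshot with the snapshot ID. Set DRY_RUN=0 once the printed commands look right.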