Fly Singapore Database randomly died

My Fly Singapore database randomly died today. I have no clue why it died beyond these logs. Restarting it doesn’t help; it just dies again every time.

What does this mean?

2024-11-27T05:41:01.563 app[] sin [info] INFO Starting init (commit: 3dd5d9e)...
2024-11-27T05:41:01.564 app[] sin [info] [ 0.567490] EXT4-fs (vda): VFS: Found ext4 filesystem with invalid superblock checksum. Run e2fsck?
2024-11-27T05:41:01.565 app[] sin [info] ERROR Error: couldn't mount /dev/vda onto /lower/dev/vda, because: EBADMSG: Not a data message
2024-11-27T05:41:01.566 app[] sin [info] [ 0.569576] reboot: Restarting system
2024-11-27T05:41:01.624 app[] sin [warn] Virtual machine exited abruptly
2024-11-27T05:47:44.400 app[] sin [info] INFO Starting init (commit: 3dd5d9e)...
2024-11-27T05:47:44.415 app[] sin [info] INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2024-11-27T05:47:44.418 app[] sin [info] INFO Resized /data to 1073741824 bytes
2024-11-27T05:47:44.419 app[] sin [info] INFO Preparing to run: `docker-entrypoint.sh start` as root
2024-11-27T05:47:44.421 app[] sin [info] ERROR Error: failed to spawn command: docker-entrypoint.sh start: No such file or directory (os error 2)
2024-11-27T05:47:44.422 app[] sin [info] does `docker-entrypoint.sh` exist and is it executable?
2024-11-27T05:47:44.422 app[] sin [info] [ 0.592613] reboot: Restarting system
2024-11-27T05:47:44.498 app[] sin [warn] Virtual machine exited abruptly 

When I go further back, it says the cluster's membership is inconsistent??? Nothing in the database has been touched at all.

Health check for your postgres database has failed. Your database is malfunctioning.

Nov 27, 2024 04:15 UTC

Health check for your postgres role has failed. Your cluster's membership is inconsistent.

Nov 27, 2024 04:15 UTC
Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.

Hi… This looks like low-level disk corruption, unfortunately. (Yikes!)
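
The lines that point that way are the ext4 superblock checksum warning and the EBADMSG mount failure on /dev/vda in the first boot attempt. For anyone triaging a similar log dump, here is a minimal Python sketch that flags those signatures. The patterns are copied from the log lines quoted in this thread; the `SIGNATURES` table, the labels, and the `triage` helper are hypothetical names for illustration, not anything provided by Fly.io.

```python
import re

# Hypothetical signature table: the patterns come from the log lines quoted
# in this thread; the labels are an informal reading, not official Fly.io terms.
SIGNATURES = [
    (re.compile(r"invalid superblock checksum"), "ext4 superblock checksum failure"),
    (re.compile(r"mount (\S+) onto .*EBADMSG"), "mount failed with EBADMSG"),
    (re.compile(r"failed to spawn command: (\S+)"), "entrypoint missing or not executable"),
]


def triage(log_text):
    """Return (timestamp, label) pairs for every line matching a known signature."""
    findings = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first whitespace-separated field is the timestamp in these logs.
        timestamp = line.split(" ", 1)[0]
        for pattern, label in SIGNATURES:
            if pattern.search(line):
                findings.append((timestamp, label))
    return findings


# Three lines taken from the logs in the original post.
SAMPLE = """\
2024-11-27T05:41:01.564 app[] sin [info] [ 0.567490] EXT4-fs (vda): VFS: Found ext4 filesystem with invalid superblock checksum. Run e2fsck?
2024-11-27T05:41:01.565 app[] sin [info] ERROR Error: couldn't mount /dev/vda onto /lower/dev/vda, because: EBADMSG: Not a data message
2024-11-27T05:47:44.421 app[] sin [info] ERROR Error: failed to spawn command: docker-entrypoint.sh start: No such file or directory (os error 2)
"""

if __name__ == "__main__":
    for timestamp, label in triage(SAMPLE):
        print(timestamp, label)
```

This only matches text; it cannot tell you whether the volume is recoverable, so treat a hit as a prompt to look at snapshots rather than as a diagnosis.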

Is this a single-node instance?


The Fly.io platform automatically takes daily disk snapshots, but they’re only retained for a few days, by default.

Yeah, single-node instance. Hmmm, I’ll try restoring from the disk snapshot.

Hmm, low level disk corruption, oh boy.

I just had the same problem and my database is also located in Singapore. I followed the restore guide @mayailurus linked above to restore my volume (the latest snapshot was 14 hours old in my case), but it failed with errors (something like “repmgr database doesn’t exist” and “collation version mismatch”). I managed to fix the restore by using the image version before my current one (v15.2 instead of v15.3). That makes me think my problem happened because Fly tried to upgrade Postgres to a broken image: the snapshot, taken less than 14 hours before my problem happened, was still on v15.2.
