Fly Singapore Database randomly died

testtester · November 27, 2024, 5:47am

My Fly Singapore database died today for no apparent reason. I have no clue why beyond these logs. Restarting it doesn't help; it just dies again every time.

What does this mean?

2024-11-27T05:41:01.563 app[] sin [info] INFO Starting init (commit: 3dd5d9e)...
2024-11-27T05:41:01.564 app[] sin [info] [ 0.567490] EXT4-fs (vda): VFS: Found ext4 filesystem with invalid superblock checksum. Run e2fsck?
2024-11-27T05:41:01.565 app[] sin [info] ERROR Error: couldn't mount /dev/vda onto /lower/dev/vda, because: EBADMSG: Not a data message
2024-11-27T05:41:01.566 app[] sin [info] [ 0.569576] reboot: Restarting system
2024-11-27T05:41:01.624 app[] sin [warn] Virtual machine exited abruptly

2024-11-27T05:47:44.400 app[] sin [info] INFO Starting init (commit: 3dd5d9e)...
2024-11-27T05:47:44.415 app[] sin [info] INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2024-11-27T05:47:44.418 app[] sin [info] INFO Resized /data to 1073741824 bytes
2024-11-27T05:47:44.419 app[] sin [info] INFO Preparing to run: `docker-entrypoint.sh start` as root
2024-11-27T05:47:44.421 app[] sin [info] ERROR Error: failed to spawn command: docker-entrypoint.sh start: No such file or directory (os error 2)
2024-11-27T05:47:44.422 app[] sin [info] does `docker-entrypoint.sh` exist and is it executable?
2024-11-27T05:47:44.422 app[] sin [info] [ 0.592613] reboot: Restarting system
2024-11-27T05:47:44.498 app[] sin [warn] Virtual machine exited abruptly

When I go further back, the health checks say the cluster's membership is inconsistent? No database values have been touched at all.

Health check for your postgres database has failed. Your database is malfunctioning.

Nov 27, 2024 04:15 UTC

Health check for your postgres role has failed. Your cluster's membership is inconsistent.

Nov 27, 2024 04:15 UTC
Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.

mayailurus · November 27, 2024, 3:31pm

Hi… This looks like low-level disk corruption, unfortunately. (Yikes!)

Is this a single-node instance?

The Fly.io platform automatically takes daily disk snapshots, but they’re only retained for a few days, by default.
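
For anyone triaging similar logs, here is a minimal sketch that picks out the failure signatures from the log lines quoted in the original post. The substrings are copied from those logs; the labels and the `classify` helper are only my own illustration, not official Fly.io diagnostics.

```python
# Substrings taken from the log lines quoted in this thread. The labels are
# an interpretation for illustration, not official Fly.io diagnostics.
SIGNATURES = {
    "invalid superblock checksum": "ext4 superblock failed its checksum",
    "EBADMSG": "mount failed with EBADMSG",
    "failed to spawn command": "command could not be spawned",
}


def classify(log_text):
    """Return (timestamp, label) for every log line matching a signature."""
    hits = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first whitespace-separated field of each line is the timestamp.
        timestamp = line.split(" ", 1)[0]
        for needle, label in SIGNATURES.items():
            if needle in line:
                hits.append((timestamp, label))
    return hits


sample = """\
2024-11-27T05:41:01.564 app[] sin [info] [ 0.567490] EXT4-fs (vda): VFS: Found ext4 filesystem with invalid superblock checksum. Run e2fsck?
2024-11-27T05:41:01.565 app[] sin [info] ERROR Error: couldn't mount /dev/vda onto /lower/dev/vda, because: EBADMSG: Not a data message
2024-11-27T05:47:44.421 app[] sin [info] ERROR Error: failed to spawn command: docker-entrypoint.sh start: No such file or directory (os error 2)
"""

for timestamp, label in classify(sample):
    print(timestamp, label)
```

The first two hits come from the same boot attempt and both point at the filesystem on `/dev/vda`, which is why this reads as disk corruption rather than a Postgres-level problem.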

testtester · November 28, 2024, 1:58am

Yeah, single-node instance. Hmm, I’ll try restoring from the disk snapshot.

Hmm, low level disk corruption, oh boy.

budimanjojo · November 30, 2024, 9:47am

I just had the same problem, and my database is also located in Singapore. I followed the restore guide @mayailurus linked above to restore my volume (the latest snapshot was 14 hours old in my case), but the restore failed with errors (something like “repmgr database doesn’t exist” and “collation version mismatch”). I managed to fix the restore by using the image version just before my current one (v15.2 instead of v15.3). That makes me think my problem happened because Fly tried to upgrade Postgres to a broken image: the snapshot, taken less than 14 hours before the problem, was still on v15.2.
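
Roughly, the restore with a pinned image looks like the sketch below. It only builds and prints the command, it does not run it. I'm assuming the `--snapshot-id` and `--image-ref` flags of `fly postgres create` (check `fly postgres create --help` on your flyctl version), and the snapshot ID, app name, and image tag are hypothetical placeholders, not values from my setup.

```python
import shlex


def restore_command(snapshot_id, new_app_name, image_ref):
    """Build (but do not run) a flyctl command that restores a volume
    snapshot into a new Postgres app with a pinned image version."""
    return [
        "fly", "postgres", "create",
        "--name", new_app_name,
        "--snapshot-id", snapshot_id,  # assumed flag: restore from this snapshot
        "--image-ref", image_ref,      # assumed flag: pin the Postgres image
    ]


cmd = restore_command(
    snapshot_id="vs_EXAMPLE",              # hypothetical placeholder ID
    new_app_name="my-db-restored",         # hypothetical app name
    image_ref="flyio/postgres-flex:15.2",  # assumed image name and tag
)

# Print a copy-pasteable shell command for review before running it by hand.
print(shlex.join(cmd))
```

The point of pinning the image is to bring the snapshot up on the same Postgres version it was taken with, which is what got me past the collation version mismatch.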

system · December 7, 2024, 9:48am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
