App has been down since yesterday, can't get it running anymore

My app has been down since yesterday. I tried to follow the steps outlined in the case below, but that didn’t get me any further:

My error logs:
2024-09-10T12:58:44.554 proxy[784e900ce16d98] ams [error] [PM01] machines API returned an error: “rate limit exceeded”

2024-09-10T12:58:45.029 proxy[784e900ce16d98] ams [info] Starting machine

2024-09-10T12:58:45.031 proxy[784e900ce16d98] ams [error] [PM01] machines API returned an error: “machine still attempting to start”

2024-09-10T12:58:46.039 proxy[784e900ce16d98] ams [info] Starting machine

2024-09-10T12:58:46.041 proxy[784e900ce16d98] ams [error] [PM01] machines API returned an error: “machine still attempting to start”

2024-09-10T12:58:46.958 proxy[784e900ce16d98] ams [info] Starting machine

2024-09-10T12:58:47.071 app[784e900ce16d98] ams [info] 2024-09-10T12:58:47.071608740 [01J7DZ4KNW54FJVTAW5XRFTBCD:main] Running Firecracker v1.7.0

2024-09-10T12:58:47.400 app[784e900ce16d98] ams [info] [ 0.267387] PCI: Fatal: No config space access function found

2024-09-10T12:58:47.731 app[784e900ce16d98] ams [info] INFO Starting init (commit: 20f21dc5f)…

2024-09-10T12:58:47.750 app[784e900ce16d98] ams [info] INFO Mounting /dev/vdd at /data w/ uid: 0, gid: 0 and chmod 0755

2024-09-10T12:58:47.752 app[784e900ce16d98] ams [info] [ 0.615605] EXT4-fs (vdd): VFS: Found ext4 filesystem with invalid superblock checksum. Run e2fsck?

2024-09-10T12:58:47.752 app[784e900ce16d98] ams [info] ERROR Error: couldn’t mount /dev/vdd onto /data, because: EBADMSG: Not a data message

Please help!

Hi… Quick suggestion… Try restoring from older snapshots, before they age off (i.e., auto-delete).

Also, you could use fly m list and fly vol list to double-check that the machine reporting the invalid superblock really is using a new volume, and not the older one that has filesystem corruption.
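To make that cross-check a bit less error-prone, here is a rough sketch (not an official tool) that pulls the machine ID and block device out of log lines reporting the bad superblock, so you know which row to look for in the fly m list output. The function name find_corrupt_mounts and the regexes are my own assumptions, based only on the log format quoted earlier in this thread.

```python
import re

# Assumption: log lines carry an "app[<machine id>]" or "proxy[<machine id>]"
# prefix, as in the logs quoted in this thread.
MACHINE_RE = re.compile(r"\b(?:app|proxy)\[([0-9a-f]+)\]")
# Assumption: the kernel message names the device as "EXT4-fs (<device>)".
DEVICE_RE = re.compile(r"EXT4-fs \((\w+)\)")


def find_corrupt_mounts(lines):
    """Return sorted (machine_id, device) pairs that logged an invalid superblock checksum."""
    hits = set()
    for line in lines:
        if "invalid superblock checksum" not in line:
            continue
        machine = MACHINE_RE.search(line)
        device = DEVICE_RE.search(line)
        if machine and device:
            hits.add((machine.group(1), device.group(1)))
    return sorted(hits)


# Two lines copied from the logs posted above.
sample = [
    "2024-09-10T12:58:47.750 app[784e900ce16d98] ams [info] INFO Mounting /dev/vdd at /data w/ uid: 0, gid: 0 and chmod 0755",
    "2024-09-10T12:58:47.752 app[784e900ce16d98] ams [info] [ 0.615605] EXT4-fs (vdd): VFS: Found ext4 filesystem with invalid superblock checksum. Run e2fsck?",
]

for machine_id, device in find_corrupt_mounts(sample):
    print(f"machine {machine_id} reported a bad superblock on /dev/{device}")
```

If the machine ID printed here is the old machine rather than the newly created one, the error is probably still coming from the corrupted volume and not from the restored one.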

Hope this helps a little!


I appreciate the help, thanks!

I’ve created a volume from a snapshot taken the day before the errors started to occur. I then scaled to 2 machines; the newly created volume attaches to the newly created machine, but the only logs I see afterwards are that the image is pulled and prepared.

When restarting the newly created machine (because nothing happens), I get:
Error: failed to restart machine 148e2453f72798: could not stop machine 148e2453f72798: failed to restart VM 148e2453f72798: failed_precondition: unable to restart machine, not currently started or stopped (Request ID: 01J7E6J90MP3FNFEQZSCF1FJ14-fra)

Do you have any clue?

Thanks!

Hm… There was an incident yesterday with the Machines API, and there might still be some localized holdovers.

Maybe try creating the volume in cdg or arn and see if you have better luck there?

Whilst having a look just now, it seems that your previous solution is suddenly working.

Thank you very much!
