App has been down since yesterday, can't get it running anymore

camiel · September 10, 2024, 1:03pm

Since yesterday my app has been down. I tried to follow the steps outlined in the case below, but that didn’t get me any further:

My error logs:
2024-09-10T12:58:44.554 proxy[784e900ce16d98] ams [error] [PM01] machines API returned an error: “rate limit exceeded”

2024-09-10T12:58:45.029 proxy[784e900ce16d98] ams [info] Starting machine

2024-09-10T12:58:45.031 proxy[784e900ce16d98] ams [error] [PM01] machines API returned an error: “machine still attempting to start”

2024-09-10T12:58:46.039 proxy[784e900ce16d98] ams [info] Starting machine

2024-09-10T12:58:46.041 proxy[784e900ce16d98] ams [error] [PM01] machines API returned an error: “machine still attempting to start”

2024-09-10T12:58:46.958 proxy[784e900ce16d98] ams [info] Starting machine

2024-09-10T12:58:47.071 app[784e900ce16d98] ams [info] 2024-09-10T12:58:47.071608740 [01J7DZ4KNW54FJVTAW5XRFTBCD:main] Running Firecracker v1.7.0

2024-09-10T12:58:47.400 app[784e900ce16d98] ams [info] [ 0.267387] PCI: Fatal: No config space access function found

2024-09-10T12:58:47.731 app[784e900ce16d98] ams [info] INFO Starting init (commit: 20f21dc5f)…

2024-09-10T12:58:47.750 app[784e900ce16d98] ams [info] INFO Mounting /dev/vdd at /data w/ uid: 0, gid: 0 and chmod 0755

2024-09-10T12:58:47.752 app[784e900ce16d98] ams [info] [ 0.615605] EXT4-fs (vdd): VFS: Found ext4 filesystem with invalid superblock checksum. Run e2fsck?

2024-09-10T12:58:47.752 app[784e900ce16d98] ams [info] ERROR Error: couldn’t mount /dev/vdd onto /data, because: EBADMSG: Not a data message

Please help!

mayailurus · September 10, 2024, 2:25pm

Hi… Quick suggestion… Try restoring from older snapshots, before they age off (i.e., auto-delete).

Also, a person could use fly m list and fly vol list to double-check that the machine reporting invalid superblock really is using a new volume—and not the older one that has filesystem corruption.
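If the logs get long, one way to make that cross-check easier is to pull out the machine IDs that actually report the superblock error and then compare them by hand against the `fly m list` and `fly vol list` output. A minimal sketch, assuming log lines shaped like the ones pasted in the first post; the function name and regex are hypothetical helpers, not part of flyctl:

```python
import re

# Assumed line shape, taken from the logs pasted in this thread:
# "<timestamp> <source>[<machine id>] <region> [<level>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<source>\w+)\[(?P<machine>[0-9a-f]+)\]\s+"
    r"(?P<region>\w+)\s+\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)

def machines_with_bad_superblock(log_text):
    """Return sorted machine IDs whose log lines mention an invalid superblock checksum."""
    hits = set()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and "invalid superblock checksum" in match.group("msg"):
            hits.add(match.group("machine"))
    return sorted(hits)

if __name__ == "__main__":
    # Two lines copied from the first post, used here only as sample input.
    sample = (
        "2024-09-10T12:58:46.958 proxy[784e900ce16d98] ams [info] Starting machine\n"
        "2024-09-10T12:58:47.752 app[784e900ce16d98] ams [info] [ 0.615605] "
        "EXT4-fs (vdd): VFS: Found ext4 filesystem with invalid superblock checksum. "
        "Run e2fsck?\n"
    )
    print(machines_with_bad_superblock(sample))
```

Any ID this returns is a machine still mounting a corrupted filesystem, so that is the one to look up in the machine and volume listings.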

Hope this helps a little!

camiel · September 10, 2024, 2:51pm

I appreciate the help, thanks!

I’ve created a volume from a snapshot taken the day before the errors started to occur. I then scaled to 2 machines. The newly created volume attaches to the newly created machine, but the only logs I see afterwards say that the image is pulled & prepared.

When restarting the newly created machine (because nothing happens), I get:
Error: failed to restart machine 148e2453f72798: could not stop machine 148e2453f72798: failed to restart VM 148e2453f72798: failed_precondition: unable to restart machine, not currently started or stopped (Request ID: 01J7E6J90MP3FNFEQZSCF1FJ14-fra)

Do you have any clue?

Thanks!

mayailurus · September 10, 2024, 3:10pm

Hm… There was an incident yesterday with the Machines API, and there might still be some localized holdovers.

Maybe try creating the volume in cdg or arn and see if you have better luck there?

camiel · September 10, 2024, 3:23pm

Whilst having a look just now, it seems that your previous solution has started working all of a sudden.

Thank you very much!

system · September 12, 2024, 3:23pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
