App is down, monitoring says it's ok, how to troubleshoot?

:wave:

Not sure if it’s OK to post about app downtime here, but one of the apps I’ve deployed, ruslan-now, seems to be down right now: HTTP requests time out, and fly ssh console times out as well. What is the usual way to troubleshoot this? The Monitoring page says the app is running, the last logs are from 2022-06-15T11:01:43.793, and the logs look normal.
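One rough way to narrow down where a request is dying is to probe the app with a short timeout and look at how it fails: whether the connection itself times out, or the connection succeeds but no response ever comes back. The sketch below uses only the Python standard library. The probe function, its labels, and the 5-second default are illustrative assumptions, not a Fly.io tool, and the result is a heuristic rather than a diagnosis.

```python
import socket
import time
import urllib.error
import urllib.request

# socket.timeout is an alias of TimeoutError on Python 3.10+, but a
# separate class on older versions, so catch both.
TIMEOUTS = (socket.timeout, TimeoutError)


def probe(url: str, timeout: float = 5.0) -> str:
    """Hypothetical helper: make one GET request and classify the outcome."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # wait for at least the first byte of the body
            return f"ok {resp.status} in {time.monotonic() - start:.2f}s"
    except urllib.error.HTTPError as e:
        # The server answered, just with a 4xx/5xx status code.
        return f"http-error {e.code}"
    except urllib.error.URLError as e:
        # urllib wraps errors raised while connecting (incl. TLS) or sending.
        if isinstance(e.reason, TIMEOUTS):
            return "connect-timeout"
        return f"unreachable: {e.reason}"
    except TIMEOUTS:
        # Connected and sent the request, but no response arrived in time.
        return "response-timeout"
    except OSError as e:
        # For example, the connection was reset mid-response.
        return f"connection-error: {e}"
```

For example, print(probe("https://ruslan-now.fly.dev/")) run a few times, assuming the app uses its default <app>.fly.dev hostname. A response-timeout (connected, but no reply) is consistent with the edge being up while the machine behind it is unreachable; a connect-timeout or unreachable result points earlier in the path.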

We just saw two hosts in AMS go offline. That’s the region where your app is deployed, and we’re looking into it!

Ah, thank you for a quick response!

Looks like the servers are back online; it must have been a small, temporary network issue at the datacenter.

Just to confirm, I noticed this as well in AMS. The load balancer was up and started the HTTP/2 response, but I guess a backend was offline, since I couldn’t SSH in either. It resolved after about 10 minutes.

Yep! It was likely a switch failure. Two hosts in the same rack lost network connectivity for a few minutes. The edge was fine, and all the other hosts in AMS stayed up.

Fingers crossed you got your full quota of “Fly.io-caused-outages” filled over the last two days. It’s not normally like this, I promise. :slight_smile:

Haha, hopefully. At the same time, I’ve spent a whole 12 cents so far, so I haven’t got much room to complain yet :wink: The communication is great, though; it’s a lot better to know everything is in hand.

I’m trying to deploy to AMS and getting this error:

Adding layer 'heroku/nodejs-engine:dist'
ERROR: failed to export: caching layer (sha256:794677c885fbe7334c92f9cedd9ca78256657998776725af428ab730b7e0aa49): write /launch-cache/staging/sha256:794677c885fbe7334c92f9cedd9ca78256657998776725af428ab730b7e0aa49.tar: copy_file_range: no space left on device
Error failed to fetch an image or build from source: executing lifecycle: failed with status code: 62

Is this related to the same issue?