I just received an alert from my monitoring stack that my app is down in the LHR region. There don’t appear to be any logs showing errors of any kind; the instance has just “gone”. Restarting the app or deploying a new version doesn’t fix anything, and the deployment just stays at 1 instance desired, 0 placed. Any help would be much appreciated!
App
Name = reddilert
Owner = personal
Version = 131
Status = pending
Hostname = reddilert.fly.dev
Platform = nomad
Deployment Status
ID = fbcb8922-34aa-e5e0-b62a-01f3929e3910
Version = v131
Status = running
Description = Deployment is running
Instances = 1 desired, 0 placed, 0 healthy, 0 unhealthy
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
One of our servers in LHR had an issue that took a while to resolve. It was unfortunately the host where your app and volume reside. It’s back up and running again now and your app should be working again.
Sorry for the problems! If you can deploy your app in an HA manner with multiple instances, they should all get placed on different hardware, which would mitigate single-host failures like this in the future (our servers don’t go down often, but it happens!).
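As a rough sketch of what running multiple instances could look like with flyctl: the volume name `data` and the 1 GB size below are assumptions, since the thread doesn't give them. It also assumes each instance mounts its own volume, so a second volume has to exist before a second instance can be placed. The `run` helper is purely illustrative and only prints the commands unless `DRY_RUN=0` is set.

```shell
# Assumed values: the app name comes from the status output above;
# the volume name and size are hypothetical placeholders.
APP="reddilert"
REGION="lhr"
VOLUME="data"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a second volume so a second instance has something to mount.
run fly volumes create "$VOLUME" --app "$APP" --region "$REGION" --size 1

# Ask for two instances so one can keep serving if a single host fails.
run fly scale count 2 --app "$APP"
```

Afterwards, `fly status` should show the deployment with 2 instances desired, and you can check there whether both were actually placed.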
This affected me too: my monitor shows that my site in LHR was down for about 20 minutes. I’m not too bothered, as my site isn’t live yet.
However, @steveberryman, how come nothing shows on Fly’s status page for this? Is there a threshold before it would be recorded as an “official” outage?
If I had kept a scale count of 1 for my application but added an extra region, would my app have restarted automatically on an instance in the other region while LHR was having this issue?