For the past few months we’ve been getting intermittent downtime.
We have 3 or so services sitting behind an nginx proxy on Fly. Every now and then, they all become unavailable at the same time. This leads me to believe it has something to do with the proxy on Fly.
I don’t know enough about what the metrics mean, but is there anything obvious in these stats that looks like it could point to an issue?
How often does this happen, roughly? (or exactly, if you know)
We might have just had a brief period of downtime a few minutes ago. However, it should’ve been stable for the past month (at least).
I think that’s the 2nd time this month. The last time was the 5th of March, and before that the 23rd of Feb.
Can you tell me more about what kind of downtime you’re seeing? Are you running an automated test that’s showing downtime, or seeing something else?
One way to detect whether an issue is on our end or your end is to also monitor debug.fly.dev. If it’s just one app throwing errors, it’s probably not our infrastructure causing it.
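If it helps, here is a rough sketch of that side-by-side check in Python. It probes your app and debug.fly.dev at the same moment and reports which of them failed. APP_URL is a placeholder you would need to replace, and the labels returned by classify are just names made up for this example. Treat the interpretation as a hint rather than proof: if both are down, the problem could also be the network of the machine running the check.

```python
import http.client
import time
import urllib.error
import urllib.request

# Placeholder: replace with the public URL of your own app (assumption).
APP_URL = "https://your-app.example.com/"
# Reference endpoint mentioned above.
REFERENCE_URL = "https://debug.fly.dev/"


def is_up(url, timeout=5.0):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections, timeouts.
        return False


def classify(app_ok, reference_ok):
    """Turn the two probe results into a rough label (names are illustrative)."""
    if app_ok and reference_ok:
        return "ok"
    if not app_ok and reference_ok:
        # Only your app failed: more likely your app or your nginx proxy.
        return "app-only"
    if not app_ok and not reference_ok:
        # Both failed: possibly the platform, or the network you are probing from.
        return "both-down"
    # Your app answered but the reference did not.
    return "reference-only"


if __name__ == "__main__":
    # Single check; run it from cron or a loop to build up a history.
    result = classify(is_up(APP_URL), is_up(REFERENCE_URL))
    print(time.strftime("%Y-%m-%d %H:%M:%S"), result)
```

Running this every minute from somewhere outside Fly and keeping the output would give you timestamps to compare against the outages you noticed.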
Also, how many VMs are you running? Does fly status --all show any failed VMs? Occasionally errors like this are caused by a single VM crashing and restarting.