Downtime for more than 15 minutes already

I’ve been getting a lot of downtime recently. fly status shows the app is up and operational, but it’s not loading.

I had to scale to another region. Adding a second instance in Amsterdam helped, but the one in Frankfurt is not serving HTTP requests (even after a few restarts).

Hey, if you check fly logs, are there any further hints? There might be some more clues there as to what’s going on.

The logs stream, so if you have a reproduction, it might be useful to open fly logs and then make some requests.

Seems like requests just don’t hit that instance. Logs show the instance running, but it doesn’t receive any requests. The app doesn’t open by IP address, either (I thought it might be DNS at first). I tried scaling back down to just that one instance in FRA, but it doesn’t work. The instance in AMS is now receiving all requests.

Could you post some of the output from fly status?

What kinds of concurrency settings do you have for your fly.toml?

Here are some docs on that which might help: App Configuration (fly.toml) · Fly Docs

If you’re using something like Phoenix LiveView, which operates over websocket connections, the default fly.toml hard_limit might be getting hit, causing us to drop incoming connections.

If that’s the case, you could consider raising your limit to a couple hundred or so.

As you might imagine, this means more traffic will get to each instance, so it’s worth keeping an eye on your app’s metrics to decide whether to raise or lower those limits as needed.
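
For reference, a rough sketch of what raised limits could look like in fly.toml. The internal_port and the 150/200 values are only placeholders for illustration, not a recommendation; pick numbers based on what your app’s metrics show.

```toml
# Illustrative fragment only: internal_port and the limit values are assumptions.
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    # Count open connections; a websocket holds one for as long as it stays open.
    type = "connections"
    # Above the soft limit, traffic is steered to other instances when possible.
    soft_limit = 150
    # At the hard limit, no new connections are sent to this instance.
    hard_limit = 200
```

The new limits only take effect once the app is deployed again with the updated fly.toml.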


Thanks, I’ll try that. I had the defaults (20-25) set. I’ll try increasing them right now.
