Getting "timed out while connecting to instance" errors for apps in Frankfurt region


I just had to restart my apps in the Frankfurt region because they were not reachable from the internet. It seems to have started around 10:35 UTC.

As an example, I have an app with two instances. About 30 minutes after I restarted the apps, one of its instances was reported as critical again. Just now, while I was writing this post, the other one also became unavailable.

Are there ongoing issues with the Frankfurt region? We had the same type of issue a couple of weeks ago, when we also had to restart the apps to get them working again. I can’t see any issues on the status page right now, but something must be wrong.

Edit: Since posting this, I am seeing timeouts in the logs again, and one of the nodes has gone critical.

This usually means your app is not accepting connections fast enough, or is blocking its accept loop in some way.
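For anyone hitting the same thing, here is a minimal sketch of an accept loop that stays responsive. It's in Python only because the thread has no code of its own. The loop does nothing but call `accept()` and hand each connection to a worker thread, so a slow handler can't hold up the next connection. The names `serve`, `handle` and `work` are hypothetical, and thread-per-connection is just one option; an async event loop or a worker pool would do the same job.

```python
import socket
import threading


def handle(conn: socket.socket, work) -> None:
    """Run the (possibly slow) request handler outside the accept loop."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(work(data))


def serve(listener: socket.socket, work, stop: threading.Event) -> None:
    """Accept loop that only accepts and dispatches; it never does the work itself."""
    listener.settimeout(0.1)  # wake up periodically so `stop` is honoured
    while not stop.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        # Hand the connection to a worker thread and go straight back to accept().
        threading.Thread(target=handle, args=(conn, work), daemon=True).start()
```

If `handle` were called inline instead of in a thread, three clients hitting a handler that takes 0.5s would be served one after another (about 1.5s for the last one); with the dispatch above they are served concurrently.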

There’s a 2-second timeout when connecting to your app locally. That should be more than enough for any app; connection times are usually closer to 1-5ms.
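If you want to sanity-check connect times against your app’s internal port yourself, a rough client-side probe like the one below times how long a TCP connection takes to open, using the same 2-second budget. The helpers `connect_time` and `sample` are hypothetical. The probe only measures the handshake as a client sees it, so treat it as a diagnostic sketch and not as how Fly’s proxy measures connect time.

```python
import math
import socket
import time


def connect_time(host: str, port: int, timeout: float = 2.0) -> float:
    """Seconds taken to open a TCP connection to host:port.

    Raises OSError (socket.timeout included) if the connection fails or takes
    longer than `timeout`. The 2.0 s default only mirrors the limit mentioned
    in this thread; it is an assumption, not Fly's proxy code.
    """
    start = time.perf_counter()
    # create_connection returns once the TCP handshake has completed.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed


def sample(host: str, port: int, n: int = 20, timeout: float = 2.0) -> list[float]:
    """Collect n connect times in seconds; failed attempts are recorded as math.inf."""
    times = []
    for _ in range(n):
        try:
            times.append(connect_time(host, port, timeout))
        except OSError:
            times.append(math.inf)
    return times
```

For example, `sample("127.0.0.1", 8080)`, with 8080 standing in for whatever internal port your app listens on, returns 20 timings. Values far above the 1-5ms range, or `inf` entries, would be worth investigating.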

Looking at your fly_app_connect_time_seconds_bucket metric, about half of the connection attempts take more than 100ms (the upper end of our buckets), while the other half complete within 5ms.
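To get that "about half" figure from the metric yourself, here is a small Python helper. It assumes `fly_app_connect_time_seconds_bucket` follows the usual Prometheus histogram convention, where each `le` bucket holds a cumulative count and the `+Inf` bucket holds the total. The bucket bounds and counts in the example are made up for illustration; they are not taken from this app.

```python
import math


def fraction_above(buckets: dict[float, int], threshold: float) -> float:
    """Fraction of observations slower than `threshold` seconds.

    `buckets` maps each bucket's upper bound (`le`) to its cumulative count,
    Prometheus-style, and must contain math.inf (the "+Inf" bucket = total).
    `threshold` must be one of the bucket bounds.
    """
    total = buckets[math.inf]
    if total == 0:
        return 0.0
    return (total - buckets[threshold]) / total


# Hypothetical snapshot: half the attempts land at or under 5 ms and the
# rest fall beyond the 100 ms top bucket (so they only show up in +Inf).
example = {0.001: 10, 0.005: 50, 0.01: 50, 0.1: 50, math.inf: 100}
print(fraction_above(example, 0.1))    # 0.5
print(fraction_above(example, 0.005))  # 0.5
```

Raw counters cover everything since they were last reset. To look at a recent window instead, apply the same arithmetic to per-bucket increases over that window.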

Got it, thank you for getting back to me on this.