Fly.io incidents keep my app down after they’re resolved

Usually, when a fly.io incident brings my app down, like the latest one (Fly.io Status - Network issues impacting all services), the app doesn’t recover after the incident has been resolved.

This last one was a networking issue and, looking at fly logs, the app’s running normally. However, there’s this error in the logs: error.message="could not find a good candidate within 90 attempts at load balancing. last error: no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)"

The answer to all the hint questions is “no” - the app’s running normally.

So I’d like to understand two things: 1. Why does my app remain offline after an incident like this? 2. Is it possible to set up some kind of automatic recovery? If not, is the solution to just watch for the incident to be resolved and then manually redeploy the app?

Also, is the load balancing error in the logs related to having my app disconnected from Consul? Is that why it doesn’t come back online?
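
To make the "automatic recovery" question concrete, this is roughly the kind of external watchdog I have in mind. It is only a sketch, not an official Fly.io mechanism: the health URL, the app name, the thresholds, and the helper names (is_healthy, watch, restart_app) are all made up for illustration, and I'm assuming that flyctl is installed and authenticated and that `fly apps restart` is enough to bring the app back, which I haven't confirmed.

```python
import subprocess
import time
import urllib.request
from typing import Callable, Optional


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        # Connection errors, timeouts, HTTP 4xx/5xx and bad URLs all count as unhealthy.
        return False


def restart_app(app: str) -> None:
    """Assumption: restarting via flyctl is an acceptable recovery action."""
    subprocess.run(["fly", "apps", "restart", app], check=False)


def watch(
    probe: Callable[[], bool],
    recover: Callable[[], None],
    max_failures: int = 3,
    interval: float = 30.0,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call recover() each time probe() fails max_failures times in a row.

    Runs forever when rounds is None; returns the number of recoveries otherwise.
    """
    failures = 0
    recoveries = 0
    done = 0
    while rounds is None or done < rounds:
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                recover()
                recoveries += 1
                failures = 0  # start counting again after a recovery attempt
        done += 1
        sleep(interval)
    return recoveries


# Hypothetical usage (app name and URL are placeholders):
# watch(lambda: is_healthy("https://my-app.example/health"),
#       lambda: restart_app("my-app"))
```

It would have to run somewhere outside the affected app, and it only retries a restart; it doesn't explain why the proxy stops seeing healthy instances in the first place.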


Also facing this issue! App is basically not usable.

Many requests are not served properly!

Luckily only affecting my staging… though I am hesitant to update my production, given those errors.

Example logs:

2023-11-14T11:15:09.609 proxy[7811e42f992928] fra [error] could not find a good candidate within 90 attempts at load balancing

2023-11-14T11:15:09.971 proxy[7811e42f992928] fra [error] could not find a good candidate within 90 attempts at load balancing

Facing the same issue. I wonder why.
