Usually, when a Fly.io incident brings my app down, such as the latest one (Fly.io Status - Network issues impacting all services), the app doesn’t recover after the incident has been resolved.
This last one was a networking issue, and according to fly logs the app is running normally. However, this error shows up in the logs: error.message=“could not find a good candidate within 90 attempts at load balancing. last error: no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the ‘immediate’ strategy? have your app’s instances all reached their hard limit?)”
The answer to all the hint questions is “no” - the app’s running normally.
So I’d like to understand two things: 1. why does my app remain offline after an issue like this, and 2. is it possible to set up some kind of automatic recovery? If not, is the solution to just watch for the incident to be resolved and then manually redeploy the app?
Also, is the load balancing error in the logs related to having my app disconnected from Consul? Is that why it doesn’t come back online?
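
To make the second question concrete, this is roughly the kind of automatic recovery I have in mind: a small watchdog running somewhere outside Fly.io that polls the app and triggers a recovery command after several failed checks in a row. It is only a sketch. The URL, the app name, the threshold, and the recovery command are all placeholders I made up, and I am assuming a restart (rather than a full redeploy) would even be enough to get the app routable again, which I don’t know.

```python
import subprocess
import time
import urllib.request
from typing import Callable

# Hypothetical values, not my real app.
APP_URL = "https://my-app.fly.dev/"
APP_NAME = "my-app"
# Assumption: restarting is enough; swap in whatever command actually recovers the app.
RECOVER_CMD = ["fly", "apps", "restart", APP_NAME]


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        # Connection errors, timeouts and HTTP error statuses all count as unhealthy.
        return False


class Watchdog:
    """Trigger `recover` after `threshold` consecutive failed checks."""

    def __init__(self, check: Callable[[], bool], recover: Callable[[], None],
                 threshold: int = 3) -> None:
        self.check = check
        self.recover = recover
        self.threshold = threshold
        self.failures = 0

    def tick(self) -> bool:
        """Run one check; return True if recovery was triggered on this tick."""
        if self.check():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.recover()
            self.failures = 0  # start counting again after a recovery attempt
            return True
        return False


if __name__ == "__main__":
    watchdog = Watchdog(
        check=lambda: is_healthy(APP_URL),
        recover=lambda: subprocess.run(RECOVER_CMD, check=False),
        threshold=3,
    )
    while True:
        watchdog.tick()
        time.sleep(60)  # poll once a minute
```

Requiring several consecutive failures is meant to avoid restarting on a single blip, but during an ongoing incident this would keep retrying every few minutes, so I’m not sure it’s a sensible approach compared to something built into the platform.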