For the past 15 hours I have suddenly been getting this error in my logs:
no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)
I have checked the hard limits in Grafana, and they haven’t been hit even once in the last 24 hours. CPU utilisation and memory are well within acceptable values, and I am certainly not running any deployments.
What can I do to find the root cause of this issue and fix it?
Adding a health check seems to have resolved the issue for me. I didn’t have any health checks defined in my fly.toml, so I added a basic TCP check, and that seems to have done the trick.
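For reference, a basic TCP check in fly.toml looks roughly like the sketch below. This is not the exact config from my app: the internal_port, the handlers, and the timing values are placeholder assumptions that you would adjust for your own service.

```toml
# Sketch only: port, handlers, and timings are assumed values.
[[services]]
  internal_port = 8080        # assumed: the port your app listens on inside the VM
  protocol = "tcp"

  [[services.ports]]
    port = 443                # the route named in the error message
    handlers = ["tls", "http"]

  # Basic TCP check: the instance counts as healthy if a TCP
  # connection to internal_port succeeds within the timeout.
  [[services.tcp_checks]]
    interval = "15s"          # how often the check runs
    timeout = "2s"            # how long to wait for the connection
    grace_period = "5s"       # time allowed after boot before checks count
```

The change only takes effect after the app is redeployed with the updated fly.toml.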
My guess is that the Fly proxy couldn’t determine any healthy instances without a health check configured.
However, I’ve been running without health checks for over a year without problems, so I’m not sure why the error started appearing today. Perhaps something changed recently in how the Fly proxy handles health checks?