App seems to freeze and/or connections to dead instances stick around

I am not sure this is the case any longer, but in the past I have had apps just die out of nowhere and get stuck in this “dead” state.

My larger concern was getting these apps back up and running and making sure all the dead VMs were cleared from the pool.

I would normally run scale and restart commands over and over with no change in the output of the app status / info commands. I would typically be trying to get all the dead nodes removed and new ones spun back up, but would see no results from running different commands. They either seem like they are not working or they are very async, and that is not an option when your servers are down. To me this should take place immediately, and maybe even show more of the progress in the logs / status output of flyctl.

It is common for me to have Heroku open all day (I know this might sound crazy), but we are a small startup that is still on the “move fast and break things” philosophy, so popping in and refreshing the metrics is pretty common for me from day to day.

It would be nice to have a better sense of each app's current state in the online dashboard.

This happens for some apps when they have TCP health checks enabled, but they’re not responding properly to HTTP requests. HTTP check docs are here: https://fly.io/docs/configuration/#services-http_checks
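
For anyone who wants something concrete to point an HTTP check at, here is a rough sketch of a tiny health endpoint using only the Python standard library. It is not taken from the Fly docs: the `/healthz` path, the port 8080 default, and the `make_server` helper are all made-up names for illustration, so swap in whatever your app and check config actually use.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical path; use whatever path your HTTP check is configured to hit.
HEALTH_PATH = "/healthz"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == HEALTH_PATH:
            # A real HTTP 200 response, not just an open TCP port.
            body = b"ok\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence per-request logging so frequent checks do not flood the logs.
        pass


def make_server(port=8080):
    # Hypothetical helper; port 8080 is an assumption, not a Fly requirement.
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` starts it. A process that accepts connections but never answers requests could still look fine to a TCP check, while an HTTP check against an endpoint like this would only pass when the app is actually responding.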

If this happens again, and you have HTTP checks, check and see if the app status is dead when you run flyctl status. We should be retrying crashing apps perpetually now, but we used to give up after a few hours.

The scaling commands used to just submit a job that ran later. Now they trigger updates immediately, so if you make scaling changes it should be pretty quick.
