Can't bring app back from the dead

luke · January 27, 2021, 4:20pm

Good morning! I went to revive my app (sm-sandbox) to continue my explorations of your service, but I can’t seem to bring it back. It shows up as “dead” in flyctl apps list and the web dashboard. flyctl deploy runs but stops at “Monitoring Deployment”; the app doesn’t change and nothing shows up in flyctl logs. I tried flyctl restart (says it succeeded) and flyctl resume (says it’s not suspended). It does show a failed release (v23) in the web dashboard and flyctl history. I imagine I could flyctl destroy and start over and that it’d work, but I’d have more confidence in the service if I could know what’s actually going on.

In the meantime I’m creating a new app under a different name so I can keep playing around.

kurt · January 27, 2021, 4:30pm

Hey again! It looks like the instances are starting, then failing health checks when you resume.

You can see this from flyctl:

flyctl status --all # to show the failed instances
flyctl status instance <id> # to see specifics of restarts

If you add restart_limit = 0 to your check definitions and then deploy, it should work just fine.
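
For reference, here is a minimal sketch of what that could look like in fly.toml, assuming a TCP check under a services block. The port, interval, and timeout are placeholders rather than values from this app, and times are written in milliseconds:

```toml
# Hypothetical fly.toml fragment; internal_port and the timing values are assumed.
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.tcp_checks]]
    interval = 10000     # ms between checks (placeholder)
    timeout = 2000       # ms before a single check attempt counts as failed (placeholder)
    restart_limit = 0    # the setting suggested above
```

If the app uses an HTTP check instead, the same restart_limit line would go in that check's definition.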

luke · January 27, 2021, 5:25pm

Yup, looks like disabling health checks does the trick. But I don’t know if that’s a solution that’ll work in the long run…

My main concern is zero-downtime deployments. I don’t see how I can do this. Before, the health check was an integral part of it because the load balancer would rely on it to say, OK, this instance is ready to receive traffic. But maybe there’s another way about it. I’m also more open to the idea of having an NGINX or HAProxy layer if it comes to that.

Also, being able to have health checks in general would be nice; a “wait n seconds before the first check” option should do the trick, I think…

kurt · January 27, 2021, 5:29pm

That option would be handy, yeah. It should be doable!

That restart_limit doesn’t disable health checks; it only controls how many check failures it takes to trigger a restart (or mark an instance “bad” during deploy). Zero-downtime deployments will still work fine. It won’t mark an instance healthy until the check starts passing.

luke · January 27, 2021, 8:14pm

Ah, I see. Thanks for clarifying. Looks like I can do deployments with zero downtime indeed!

Would still be great to have that delay option. Even though my app server is very stable, I get nervous about it not being restarted upon failure. I guess a workaround in the meantime would be to set a high enough timeout and/or restart_limit. Would make deploys a little slower and restarts a little less responsive, but worth it I think. Testing now with a 28s timeout (my app boots in ~20s) and a restart_limit of 2 and it seems to deploy and roll over reliably.
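
In fly.toml, the workaround described above might look roughly like this. It is a sketch under assumptions: it uses a TCP check (the check type isn't stated here), the port and interval are placeholders, and times are in milliseconds.

```toml
# Hypothetical fly.toml fragment; internal_port and interval are assumed.
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.tcp_checks]]
    interval = 10000     # ms between checks (placeholder)
    timeout = 28000      # 28s, somewhat longer than the ~20s boot time
    restart_limit = 2    # check failures tolerated before a restart
```

The trade-off is the one already noted: a longer timeout and a nonzero restart_limit leave room for a slow boot, at the cost of slower deploys and slower reaction to a real failure.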

(Had to wait a few hours to post this reply due to new-user forum restrictions.)
