Can't bring app back from the dead

Good morning! I went to revive my app (sm-sandbox) to continue exploring your service, but I can’t get it running again. It shows up as “dead” in flyctl apps list and the web dashboard. flyctl deploy runs but stops at “Monitoring Deployment”; nothing changes with the app and nothing shows up in flyctl logs. I tried flyctl restart (says it succeeded) and flyctl resume (says it’s not suspended). It does show a failed release (v23) in the web dashboard and flyctl history. I imagine I could flyctl destroy and start over and that it’d work, but I’d have more confidence in the service if I knew what’s actually going on.

In the meantime I’m creating a new app under a different name so I can keep playing around.

Hey again! It looks like the instances are starting, then failing health checks when you resume.

You can see this from flyctl:

flyctl status --all # to show the failed instances
flyctl status instance <id> # to see specifics of restarts

If you add restart_limit = 0 to your check definitions in fly.toml and then deploy, it should work just fine.
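For reference, here is a rough sketch of where that setting might go. It assumes the app uses a TCP check under a [[services]] section; the port, interval, and timeout values are placeholders rather than anything taken from sm-sandbox's actual config, so adjust them to match your own fly.toml.

```toml
# Sketch only: internal_port, interval, and timeout are placeholder values.
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.tcp_checks]]
    interval = "10s"     # placeholder: how often the check runs
    timeout = "2s"       # placeholder: how long each check may take
    restart_limit = 0    # the setting suggested above
```

If your checks are HTTP checks instead, the same restart_limit line would presumably go in the corresponding check block.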

Yup, looks like disabling health checks does the trick. But I don’t know if that’s a solution that’ll work in the long run…

My main concern is zero-downtime deployments. I don’t see how I can do them without health checks. Before, the health check was an integral part of it because the load balancer would rely on it to say, OK, this instance is ready to receive traffic. But maybe there’s another way to go about it. I’m also more open to the idea of having an NGINX or HAProxy layer if it comes to that.

Also, being able to have health checks in general would be nice; a “wait n seconds before the first check” option should do the trick, I think…

That option would be handy, yeah. It should be doable!

That restart_limit setting doesn’t disable health checks; it only controls how many check failures it takes to trigger a restart (or mark an instance “bad” during deploy). Zero-downtime deployments will still work fine, because an instance won’t be marked healthy until its check starts passing.

Ah, I see. Thanks for clarifying. Looks like I can do deployments with zero downtime indeed!

Would still be great to have that delay option. Even though my app server is very stable, I get nervous about it not being restarted upon failure. I guess a workaround in the meantime would be to set a high enough timeout and/or restart_limit. Would make deploys a little slower and restarts a little less responsive, but worth it I think. Testing now with a 28s timeout (my app boots in ~20s) and a restart_limit of 2 and it seems to deploy and roll over reliably.
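Roughly what I'm testing with, as a sketch. The 28s timeout and restart_limit of 2 are the values mentioned above; the check type and the interval shown here are assumptions for illustration, and the rest of the [[services]] section is omitted.

```toml
# Sketch only: check type (TCP) and interval are assumed, not confirmed.
[[services.tcp_checks]]
  interval = "10s"     # placeholder value
  timeout = "28s"      # longer than the ~20s boot time
  restart_limit = 2    # allow a couple of failed checks before a restart
```

The idea is simply that the timeout outlasts the boot time, so the first check has a chance to pass before failures start counting toward the restart limit.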

(Had to wait a few hours to post this reply due to new-user forum restrictions.)