HTTP Health checks failing, but not restarting app

savikko · March 5, 2022, 9:15am

Hi!

I am not sure I have understood HTTP health checks correctly, but I have this configuration:

  [[services.http_checks]]
    interval = 10000
    method = "get"
    path = "/healthcheck"
    protocol = "http"
    timeout = 5000
    tls_skip_verify = false

I have no TCP checks defined.

The instance status and health checks currently show: running / 1 total, 1 critical. I assumed Fly would restart my app if all(?) health checks fail, but it looks like no restart happens.

flyctl checks list:

NAME                             STATUS   ALLOCATION REGION TYPE LAST UPDATED OUTPUT
5c800e7c9d8a343831f802ff4147a8ff critical c5fe99a5   lhr    HTTP 6m6s ago     HTTP GET
                                                                              http://172.19.2.2:3000/healthcheck:
                                                                              503 Service Unavailable Output:
                                                                              {"error":"internal error"}
5c800e7c9d8a343831f802ff4147a8ff critical 97162895   fra    HTTP 13s ago      HTTP GET
                                                                              http://172.19.1.130:3000/healthcheck:
                                                                              503 Service Unavailable Output:
                                                                              {"error":"internal error"}

So the health check is failing on both instances.

I would like Fly to restart the instance in this situation. Is that possible somehow?

savikko · March 5, 2022, 3:34pm

I think I understand this now:

When an instance has launched successfully and an HTTP check then fails, it will restart.
If an instance starts and the check never passes, it keeps running, with no restarts.

As our app needs a restart in specific situations, I have now implemented the restart inside the app as a workaround.

kurt · March 5, 2022, 3:37pm

Try adding restart_limit = 6 to your check. This will make the service restart after 6 consecutive failures.
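As a rough sketch, the check from the first post with this setting added might look like the fragment below. The other values are copied from that post; putting restart_limit inside the same [[services.http_checks]] block is an assumption based on "add it to your check", and later replies in this thread report that V2 apps do not support the option.

```toml
  [[services.http_checks]]
    interval = 10000
    method = "get"
    path = "/healthcheck"
    protocol = "http"
    timeout = 5000
    tls_skip_verify = false
    # Assumed placement: restart after 6 consecutive failed checks
    restart_limit = 6
```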

You didn’t really misunderstand:

On deploy, checks have to pass to allow the deploy to continue. If a check fails, we restart a couple of times to make sure the error wasn’t transient.
The restart_limit option controls restart after a VM is successfully deployed.

Restarting in the app is usually better, so your workaround might be worth keeping. The problem we had with restarting on checks is that when a backend resource fails, all the VMs might fail health checks at the same time.

savikko · March 5, 2022, 4:16pm

Oh, thanks. As an extra request, could you add that to the docs? It currently seems to be missing from there.

Still, I need to think about whether I will add that or keep this workaround. It seems to work now, so it may be better not to touch it.

bra1ndump · June 18, 2023, 10:42pm

I am running into the same issue now, but my apps are V2 and for some reason they don’t support restart_limit, which makes health checks less useful.

I am running a container with an unstable headless Chrome instance, and I don’t want to write logic to restart Chrome. I would much rather restart the whole VM.

I am now looking into ways to deploy V1 instead of V2.

BenBar · July 25, 2023, 12:37pm

I have the same problem with applications running on V2. Currently, if the health check fails, the machine goes into a suspended state instead of resetting. It only tries to restart when there is an incoming request. Is there any way to avoid this behavior and simply reset the machine immediately?
