Failing Health Checks after Deploy

I’m able to successfully build and deploy my gql app from a dockerfile. In the logs I see my server success starts and logs the correct port. But I’m seeing [error] Health check status changed ‘warning’ => ‘critical’ in the logs.

It’s hard to see why it’d be failing just from the health check logs. I can see the server starts up properly and logs the port. I’ve made sure to add the port as an env to the fly.toml and I’ve exposed the port in my dockerfile, but I’m just getting failed health checks with this as the only error, “Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v1”

What kind of runtime is this? The most common problem is an app that’s only listening on 127.0.0.1, we need to access it through a different IP so the simplest is to listen on 0.0.0.0.

This sometimes happens because the health checks don’t pass quickly enough. If you’re listening on the right IP, try changing the grace period: Elixir app not starting up - #9 by kurt

I just removed the health check to see if I could get any more data, and all I can see from the logs is that everything starts fine, then just shutsdown with no further information. The last three logs look like this:

2022-01-22T00:22:42.522 app[3b331a8f] vin [info]{“level”:30,“time”:1642810962522,“pid”:544,“hostname”:"***",“msg”:“Server listening at http://0.0.0.0:4000”}

2022-01-22T00:22:59.867 runner[c7ca037a] iad [info]Shutting down virtual machine

2022-01-22T00:22:59.999 app[c7ca037a] iad [info]Sending signal SIGINT to main child process w/ PID 511

We actually install a default health check when you remove them, so removing it probably didn’t do what you were expecting.

If you run fly vm status c7ca037a you can see a bit more detail about why it stopped. It seems like the health check might be the problem.

Try adding this to the health check block in the config. This will give the VM more time to come up before it’s deemed unhealthy:

grace_period = "10s"