Health check status changes now appear in logs

Until now, there hasn’t been much visibility into the health checks our users have set up for their services. Sometimes the only hint you get is your app restarting over and over because it is failing health checks.

We’ve now added a log message for every health check status change. Each message states what the status is and what its impact might be. Even if your app isn’t already logging errors, these should at least tell you something is wrong.

Example logs:

Health check on port 8080 is now passing.

Health check on port 8080 is in a ‘warning’ state. Your app may not be responding properly. Services exposed on ports [80, 443] may have intermittent failures until the health check passes.

Health check on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.

Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.

Each of these logs is attached to a specific instance within your app. Each one also has a log level that matches the health check’s status.

If you’re not sure why your health checks are failing, here are a few common causes:

  • Your app is not listening on 0.0.0.0 (any IPv4) or :: (any IP). Our health checker runs from outside your instances and needs to reach in. Listening on 127.0.0.1, for example, won’t cut it. This is a common cause. We’re working on automatically forwarding that, but it’s not ready yet.
  • Something is blocking your accept loop. This would prevent the health check from connecting.
  • You’re using an HTTP check and the response is not a 200 OK.
  • Your instance’s resources are reaching their limits. This could slow everything down, including accepting connections and responding to HTTP requests. Slow responses can trigger health check failures.
  • Your app is not catching all thrown errors. If your app panics before it can respond to an HTTP request, it will look like a broken request to the health checker.
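
To make a few of the causes above concrete, here is a minimal sketch of an app that avoids them, assuming a Python service built only on the standard library. The port (8080), the /health path, and the names handle_request and make_server are hypothetical choices for illustration, not anything our platform requires. It binds to 0.0.0.0, handles each connection in its own thread so one slow request does not block the accept loop, answers the check with a 200, and turns unexpected exceptions into a complete 500 response instead of a dropped connection.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical values for illustration; use the address and port that your
# own health check is configured for.
LISTEN_HOST = "0.0.0.0"  # all IPv4 interfaces, not just 127.0.0.1
LISTEN_PORT = 8080


def handle_request(path):
    """Placeholder application logic; /boom simulates an unexpected bug."""
    if path == "/health":
        return 200, b"ok"
    if path == "/boom":
        raise RuntimeError("simulated bug")
    return 404, b"not found"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            status, body = handle_request(self.path)
        except Exception:
            # Catch unexpected errors so the caller still receives a complete
            # HTTP response rather than a broken request.
            status, body = 500, b"internal error"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging in this sketch


def make_server(host=LISTEN_HOST, port=LISTEN_PORT):
    # ThreadingHTTPServer serves each connection in its own thread, so a slow
    # request does not stop new connections from being accepted.
    return ThreadingHTTPServer((host, port), Handler)


if __name__ == "__main__":
    make_server().serve_forever()
```

A 500 would still fail an HTTP check that expects a 200, but it shows up as a clear error response rather than a broken request, which is usually easier to debug.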

We know these aren’t perfect yet, and we’ll keep improving them!
