Django Celery service healthchecks

I want to add healthchecks for my Celery services, but it’s not clear from the docs how to do this.

I’ve got the machine check working for deployment. It’d be great if this was also called for healthchecks but it isn’t:

```
[[services.machine_checks]]
command = ["./manage.py check"]
entrypoint = ["/bin/sh", "-c"]
kill_signal = "SIGKILL"
kill_timeout = "5s"
```

I’m not sure how to get a runtime check, as Celery exposes neither a TCP port nor a webserver to probe for liveness.

The docs suggest the only healthchecks that can be done on a service are `tcp_checks` or `http_checks`. This feels odd: surely the majority of “services” are processes that don’t expose either?

Is there a Flyonic way to solve this?

Hi -

Actually the vast majority of services tend to be HTTP servers :slight_smile:

Health checks are mainly used by the proxy to see if your application came up properly and is responding to requests, so the proxy knows whether to send or hold requests to it.

For a service which doesn’t expose a port or handle requests like Celery, it wouldn’t make a lot of sense to have this kind of health check - Celery just starts up, begins fetching jobs from the message broker and running them, and that’s mostly it. There are two things you could monitor about Celery:

  1. Did it start and is it still running? You can do this with some sort of wrapper or supervisor on your machine. Note that this does not cover a Celery worker process dying, but that should generally be handled by Celery itself.
  2. Is it still pulling and processing jobs from the queue? You can monitor this in several ways.
    a. Look at queue size: is it always growing? (Since Celery doesn’t expose this directly, you’d need to run Flower as well.)
    b. Is it failing to execute jobs? (This is generally because the task blows up, and is best monitored using something like Sentry.)
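
To make the two points above a bit more concrete, here is a rough Python sketch using only the standard library. It is not something Fly or Celery ships: `supervise` and `queue_is_growing` are hypothetical helper names, the restart policy (retry on a non-zero exit, stop on a clean exit) is an assumption, and how you collect the queue-size samples (from Flower or from your broker) is left out entirely.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=1.0):
    """Run cmd and restart it after a non-zero exit, up to max_restarts times.

    Returns the list of exit codes seen, in order. A zero exit code is
    treated as a deliberate shutdown, so no restart happens after it.
    """
    exit_codes = []
    while True:
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0 or len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(backoff_seconds)


def queue_is_growing(samples, min_samples=5):
    """Return True if the last min_samples queue sizes are strictly increasing.

    samples is a list of queue lengths taken at regular intervals; with
    fewer than min_samples readings there is not enough data to judge.
    """
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


if __name__ == "__main__":
    # Stand-in command for the demo. For a real worker you would pass
    # something like ["celery", "-A", "myproject", "worker"], where
    # "myproject" is a hypothetical app name.
    print(supervise([sys.executable, "-c", "pass"]))
    print(queue_is_growing([3, 5, 8, 13, 21]))
```

A strictly increasing window is a deliberately crude signal. In practice you would probably alert on a threshold or a growth rate instead, since a busy but healthy queue can also grow for a while.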

You can also monitor resource utilization on your Celery machines and alert on that. One way to do it is with a small telemetry stack such as superfly/fly-telemetry on GitHub ("Build an observability stack out of Fly logs+metrics streams").

Let me know if that covers your options for monitoring a non-service process like Celery.