Flask + Celery worker autostart/stop

I’m working on a Flask + Celery application and trying to troubleshoot problems with the Celery worker. Everything works initially when I launch the app with the following fly.toml:

app = "myapp"
primary_region = "sjc"

[processes]
  web = "gunicorn -b 0.0.0.0:8080 --worker-class eventlet -w 6 manage:app"
  worker = "celery --app tasks.async_celery_tasks.celery worker -c 4 --loglevel=info"

[http_service]
  processes = ["web"]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true

[checks]
  [checks.alive]
    type = "tcp"
    interval = "15s"
    timeout = "2s"
    grace_period = "5s"
    processes = ["web"]

After some time, Fly automatically stops the worker. When new requests come in, the worker is not started again automatically, so my tasks never get kicked off.

Two questions:

  1. What is the best way to start / stop Celery workers running as their own process (similar to the way Fly manages this for a web server)? I see this option for automatically starting Python workers independently of Celery, but it was unclear to me how you would do that at scale for multiple workers while also balancing compute. I’d also like to be able to use Celery rather than roll my own task worker.

  2. Is there any way to set up health checks for Celery workers as well? Or do I need to bake a web server into the Celery worker to satisfy the health checks? That feels a bit redundant, but I guess it might let me scale the workers independently of the main web server?

This is unfortunately non-trivial. I don’t think you can use auto start/stop the way we do for web servers, since that relies on incoming HTTP requests (at least, I can’t think of a way to do it; perhaps someone cleverer than me will figure something out). The first thing I would try is custom autoscaler behavior: I think you could override scale_up and scale_down to use the Machines API to add or remove workers. If you try this, please update and let us know how it goes. I am curious!
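
To make the scale_up / scale_down idea a bit more concrete, here is a rough, untested-against-production sketch of a helper that starts and stops existing worker Machines through the Machines API. It assumes the public endpoint `https://api.machines.dev/v1`, an API token you supply yourself, and a fixed list of worker Machine IDs that all begin stopped. `WorkerMachineScaler`, `build_request`, and the `send` parameter are hypothetical names made up for illustration, not part of Celery or Fly.

```python
import urllib.request

# Assumption: public Machines API base URL.
API_BASE = "https://api.machines.dev/v1"


def build_request(app, machine_id, action, token):
    """Build a POST request that starts or stops a single Machine."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{API_BASE}/apps/{app}/machines/{machine_id}/{action}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


class WorkerMachineScaler:
    """Hypothetical helper: start/stop pre-created worker Machines.

    Assumption: state is tracked locally and every Machine begins stopped.
    A real version would ask the API for each Machine's current state.
    """

    def __init__(self, app, machine_ids, token, send=urllib.request.urlopen):
        self.app = app
        self.token = token
        self.stopped = list(machine_ids)
        self.started = []
        self.send = send  # injectable so the logic can be tried offline

    def scale_up(self, n):
        """Start up to n stopped worker Machines."""
        for _ in range(min(n, len(self.stopped))):
            machine_id = self.stopped.pop()
            self.send(build_request(self.app, machine_id, "start", self.token))
            self.started.append(machine_id)

    def scale_down(self, n):
        """Stop up to n started worker Machines."""
        for _ in range(min(n, len(self.started))):
            machine_id = self.started.pop()
            self.send(build_request(self.app, machine_id, "stop", self.token))
            self.stopped.append(machine_id)
```

One way to wire this in would be to call these methods from overridden `scale_up` / `scale_down` on a subclass of Celery's `celery.worker.autoscale.Autoscaler`. Keep in mind that a worker that has already been stopped cannot start itself, so the start call has to come from something that is still running, for example the web process just before it enqueues a task.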

Right now, I don’t think there’s a good way of doing this other than baking in a web server, because the health checks are just HTTP requests. You could also consider simply removing the checks (the worker won’t start processing jobs until it’s ready, anyway), though obviously then you lose some visibility into stuff going wrong when you deploy.
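
If you do go the "bake in a web server" route, it does not have to be much. Below is a minimal sketch using only the Python standard library: a tiny HTTP server on a background thread that answers 200 while a health callback returns true and 503 otherwise. `start_health_server`, the port 8081, and the `is_healthy` callback are all assumptions for illustration; you would still need a check in fly.toml pointed at the worker process group and that port.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def start_health_server(port=8081, is_healthy=lambda: True):
    """Answer GET requests from a daemon thread; returns the server object.

    Hypothetical helper: responds 200 when is_healthy() is true, else 503.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            ok = is_healthy()
            body = b"ok" if ok else b"unavailable"
            self.send_response(200 if ok else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep health probes out of the worker log

    server = ThreadingHTTPServer(("0.0.0.0", port), Handler)
    # Daemon thread, so it never keeps the worker process alive on its own.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a Celery worker, one place to call `start_health_server()` might be a handler connected to the `celery.signals.worker_ready` signal, so the port only opens once the worker is actually ready to take tasks.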
