I’m new to Fly and exploring how to properly configure autoscaling for a region-based web‑queue‑worker architecture.
Here’s my goal:
I have a web service and a worker service.
We use Redis as the queue backend, with a separate queue per region named queue_name_{region} (for example, queue_name_bom, queue_name_sfo, etc.), each of which is supposed to be consumed by that region’s worker service.
The worker in each region should only process jobs from its region’s queue.
I’d like the autoscaler to scale workers per region, based on that region’s queue depth or workload.
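To make the setup concrete, here’s roughly what each region’s worker does today (a minimal sketch; the queue naming is from my setup above, and `FLY_REGION` is the env var Fly sets on each Machine — the Redis host and `process` are placeholders):

```python
import os

def queue_for_region(region: str) -> str:
    """Map a Fly region code to its Redis queue name."""
    return f"queue_name_{region}"

def main():
    # FLY_REGION is set automatically on Fly Machines.
    region = os.environ.get("FLY_REGION", "bom")
    queue = queue_for_region(region)
    # Hypothetical consumption loop using redis-py (not shown running here):
    # import redis
    # r = redis.Redis(host="my-redis.internal")
    # while True:
    #     _key, job = r.blpop(queue)  # blocks until a job arrives
    #     process(job)
    print(queue)

if __name__ == "__main__":
    main()
```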
What would be the best approach to achieve this in Fly?
Should I run one app per region (e.g., myapp-worker-bom, myapp-worker-sfo) and scale them independently?
Or can I manage multiple region workers under one app process group and still have Fly autoscale each region separately?
Any guidance, architecture examples, or best practices would be super helpful.
I believe your first approach would work well — an app per region. You could then use the Fly Autoscaler with the Scale multiple applications configuration, so each would be scaled according to its own metrics.
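As a rough sketch of what that could look like (assuming the `FAS_*` environment variables from the fly-autoscaler README — double-check the names and the wildcard/`$APP_NAME` behavior against the current docs, and note that `my-org`, the query, and the expression are all examples):

```toml
# fly.toml for the autoscaler app itself (sketch, not verified)
[env]
  FAS_ORG = "my-org"
  FAS_APP_NAME = "myapp-worker-*"          # wildcard to match every region's worker app
  FAS_STARTED_MACHINE_COUNT = "min(queue_depth / 10, 5)"
  FAS_PROMETHEUS_ADDRESS = "https://api.fly.io/prometheus/my-org"
  FAS_PROMETHEUS_METRIC_NAME = "queue_depth"
  FAS_PROMETHEUS_QUERY = "sum(queue_depth{app='$APP_NAME'})"
```

You’d export a per-region `queue_depth` metric from each app (or from the web app, labeled per region), and the autoscaler evaluates the expression separately for each matching app.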
I have a centralised queue, and create Fly Machines per job, each of which exits once the job is finished. Currently, created Machines land in the worker app’s default region(s), but the API permits a region to be specified.
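For the per-job Machine approach, the create-Machine request is where the region goes. A minimal sketch of the request body (the app name, image, and command are placeholders; `region` and `auto_destroy` are real Machines API fields — `auto_destroy` tears the Machine down once its process exits):

```python
import json

def machine_request(region: str, image: str, cmd: list) -> dict:
    """Build a Machines API create-Machine body for a one-shot worker."""
    return {
        "region": region,                 # e.g. "bom", "sfo"
        "config": {
            "image": image,
            "init": {"cmd": cmd},         # the job command to run
            "auto_destroy": True,         # destroy the Machine when it exits
        },
    }

payload = machine_request("bom", "registry.fly.io/myapp-worker:latest",
                          ["python", "job.py"])
# POST this as JSON to https://api.machines.dev/v1/apps/<app>/machines
# with an "Authorization: Bearer <token>" header.
print(json.dumps(payload, indent=2))
```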
This approach seems simple. Is there any limit on the number of Machines we can spin up at a time? My use case may involve running thousands of jobs at once.
Last I heard, the default limit was twenty-ish, but you can email Support ahead of time to get that lifted. Lately, I’ve actually seen an explicit bound displayed on the dashboard.
For thousands of Machines, I’d also suggest consulting the (excellent) new capacity API. Even the largest regions have only ~2000 free.
(This is when the new region-fallback feature comes in handy, I’m guessing.)
Finally, this is just my personal opinion, but at that scale you’ll probably want a formal Fly Support plan. Most of those include real-time architecture advice sessions, which typically repay the (relatively small) extra outlay on their own in cost savings and the like.
You can of course do a mix. For example, you could have a distributor task that hands out jobs: it spins up a Machine, claims five waiting jobs, and sends them over in one go, and the newly created Machine processes those five serially. Keep doing that and you get a mix of parallel and serial operation, letting you balance speed against cost.
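The distributor side of that batching idea can be sketched like this (an in-memory `deque` stands in for the real queue; with Redis you’d claim a batch with repeated `LPOP`s or a pipeline, and the batch size of five is just the example above):

```python
from collections import deque

BATCH_SIZE = 5  # jobs handed to each newly created Machine

def claim_batch(queue, n=BATCH_SIZE):
    """Claim up to n jobs from the queue (stand-in for Redis pops)."""
    batch = []
    while queue and len(batch) < n:
        batch.append(queue.popleft())
    return batch

# Simulate 12 queued jobs: the distributor spins up one Machine per
# batch, and each Machine works through its batch serially.
queue = deque(range(12))
machines = []
while queue:
    machines.append(claim_batch(queue))

print(len(machines))  # 12 jobs / 5 per batch -> 3 Machines
```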
(This may also help to keep a lid on an excessive number of machines, which can in their own right become hard to manage.)