Are there any recommended practices for running workers that process jobs from a queue?
Say we have BullMQ running on Fly.io. We also have a bunch of machines that subscribe to a queue and consume jobs from it (1 machine = 1 concurrent job). If there are no jobs in the queue, no machines should be running (scale to 0).
Now, how do you manage the starting of machines (and potentially the creation of new ones) based on jobs on the queue? Ideally without having to deal with the Machines API.
Say I have a cheap 24/7 machine exposed to the internet, taking requests and enqueueing the necessary jobs in BullMQ. Imagine a request comes in that requires 4 concurrent jobs. Is there a way to programmatically say “hey, I need 4 worker machines to start”? Each worker machine would then start, fetch one job from the queue, process it, and go back to sleep. Once again, without having to deal with the Machines API directly and keeping track of each machine’s ID, state, etc.
I did some testing with fly-replay responses: if a machine is busy processing a job, it returns a fly-replay: elsewhere. This worked fine with a few machines, but when I tried it at a bigger scale, things didn’t work out. Flycast would make a ton of requests, get a lot of fly-replay responses in a row, and ultimately stop insisting.
AFAIK, there’s no way to ensure that a request to the private Flycast address is routed to a stopped machine (so that it gets started). And even if there were, that still leaves the creation of new machines out of the question.
Is this possible somehow in Fly.io? Or am I getting the wrong idea here?
Have you tried setting soft_limit: 1, or even hard_limit: 1? I’m just guessing based on the documentation, but this should make Fly Proxy start more machines when all running machines are already handling a request/connection.
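If it helps, those limits live in the worker app’s fly.toml. A minimal sketch, assuming an HTTP service on port 8080 (values are illustrative, and newer flyctl versions may expect string values for auto_stop_machines):

```toml
[http_service]
  internal_port = 8080
  auto_stop_machines = true   # stop idle workers
  auto_start_machines = true  # let Fly Proxy wake stopped ones
  min_machines_running = 0    # allow scaling all the way down

  [http_service.concurrency]
    type = "requests"
    soft_limit = 1  # proxy prefers another machine once a worker has one request
    hard_limit = 1  # never route a second request to a busy worker
```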
Auto scaling in Fly Proxy seems to revolve around open connections/HTTP requests to the workers. Would it make sense for you to expose an HTTP API on the worker machines and keep the queue consumers on the 24/7 machine?
Another approach might be fly-autoscaler: Autoscale based on metrics · Fly Docs. You’d need to expose your queue length as a metric. It can’t scale to zero machines existing, but it seems it can scale to zero machines started; it needs an existing machine to copy when scaling up.
That isn’t recommended, as Fly et al. aren’t designed for 1:1 request-to-machine routing. What happens, as you’ve speculated, is that requests bounce with high frequency. It’s the same reason fly-replay doesn’t work here: when there’s no vacancy, you end up DoSing yourself.
What OP wants to do is possible on Fly, except for the 1:1 requirement. You can’t have a worker setup and not use it at all; that’s what queueing is for.
As far as I know, you have to use the Machines API to create new machines.
“You can’t have a worker setup and not use it at all”: why do you say that? All I asked for was a way to not have expensive workers running 24/7. I mean, with the amazing features of Firecracker VMs that can be ready to perform work in a matter of seconds, it feels like some kind of “auto-stop workers” setup would be easy to implement on Fly.io and certainly something people would use. Not everybody has enough volume of work to justify keeping a worker running 24/7.
I could easily implement my own architecture if there were a simple way in Fly.io to say “hey, wake up 5 VMs from a specific service group”. There’s currently no way to do that unless you issue 5 separate start commands via the Machines API, AFAIK.
Ultimately, job processing is a common pattern, and some jobs are compute-intensive enough to require several minutes of a single VM processing just one job. Going back to sleep once the work is done, only to be woken up when there’s more to do, makes a lot of sense.
That’s fine for background or scheduled jobs. I just can’t have a user wait for the next scheduled wake-up to get their result; it has to be on-demand. But it’s also too unpredictable and low-volume to be worth running 24/7.
I guess I just come from a managed and serverless background and I don’t have the resources and capacity to build a full Fly Machine orchestration system at the moment.
In order to do this effectively, you need to go from a pull mentality to a push mentality. The workers don’t pull from BullMQ; instead, you create a scheduler that checks BullMQ every second/minute/hour, whatever works. The scheduler can run alongside BullMQ.
If a job is found in the queue, the scheduler pulls it from the queue and then PUSHES it out to the worker machines.
The worker machines should have an HTTP interface for accepting requests, and they should be able to accept more than one request at a time. Set up your connection limits and Fly will spin up your workers when necessary and distribute the work over HTTP. Then, when there’s no work left to do, the worker machines will spin back down to 0.
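A minimal sketch of that push model in TypeScript, assuming a worker app named my-workers reachable over its private Flycast address with a /process endpoint (all names here are made up):

```ts
// Runs on the cheap 24/7 machine, next to BullMQ / Redis.
import { Worker } from "bullmq";

const connection = { host: "my-redis.internal", port: 6379 };

// Drain the queue and push each job to the worker app over HTTP.
// Fly Proxy load-balances across worker machines and, with
// auto_start_machines enabled, wakes stopped ones as limits are hit.
new Worker(
  "jobs",
  async (job) => {
    const res = await fetch("http://my-workers.flycast/process", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(job.data),
    });
    if (!res.ok) throw new Error(`worker responded with ${res.status}`);
    return await res.json();
  },
  { connection, concurrency: 8 } // several in-flight pushes at once
);
```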
You can use metrics-based autoscaling to spin workers up or down based on queue size. Metrics-based Autoscaling
It uses the Machines API under the hood but you don’t have to think too much about that.
In Python land I’d run Celery Flower next to my Celery workers to publish queue size metrics for the autoscaler to base its decisions on. I’m not sure how to publish queue size metrics with BullMQ, but I’m pretty sure there’s a way to do so.
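One way might be a tiny metrics endpoint built on prom-client and BullMQ’s getWaitingCount(); a sketch, with the queue and Redis names made up:

```ts
import http from "node:http";
import { Queue } from "bullmq";
import { Gauge, register } from "prom-client";

const queue = new Queue("jobs", {
  connection: { host: "my-redis.internal", port: 6379 },
});

// prom-client invokes collect() on every scrape, so the gauge always
// reflects the current number of waiting jobs.
new Gauge({
  name: "queue_length",
  help: "Jobs waiting in the BullMQ queue",
  async collect() {
    this.set(await queue.getWaitingCount());
  },
});

http
  .createServer(async (_req, res) => {
    res.setHeader("Content-Type", register.contentType);
    res.end(await register.metrics());
  })
  .listen(9091); // point the [metrics] section of fly.toml at this port
```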
Like @roadmr was saying, you can have one very cheap machine in your app publishing metrics (like queue_length), and a cheap Fly Autoscaler deployment configured to watch that metric and immediately start workers when queue_length > 0.
Either that, or you configure your worker app to receive jobs via push, so the Fly Proxy can wake the workers up, and they shut themselves down when they’re done processing jobs.
I want to start with something simpler, without involving the autoscaler, metrics and such, so I’ll give using the Machines API directly a try.
When using fly scale I can add a bunch of new machines to a specific service group very quickly and easily, without specifying any configuration. Is this not possible with the Machines API? I mean, the configuration is already stored within the app, right? But the API doesn’t seem to offer that: do I really need to specify a full config object for every machine creation request?
I think I’ve read somewhere that “fly scale count” will clone machines. Is that not possible to do with the API? I could not find a clone endpoint here: Fly Machines API
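The closest thing I can think of is reading an existing machine’s config via GET and POSTing a copy of it back, which would approximate a clone. A rough, untested sketch (the app name, token handling, and template machine ID are placeholders):

```ts
const API = "https://api.machines.dev/v1";
const APP = "my-workers";
const headers = {
  Authorization: `Bearer ${process.env.FLY_API_TOKEN}`,
  "Content-Type": "application/json",
};

// Approximate a clone: fetch a "template" machine and create a new
// machine with the same config (which includes the image reference).
async function cloneMachine(templateId: string) {
  const tmpl = await fetch(`${API}/apps/${APP}/machines/${templateId}`, {
    headers,
  }).then((r) => r.json());

  return fetch(`${API}/apps/${APP}/machines`, {
    method: "POST",
    headers,
    body: JSON.stringify({ region: tmpl.region, config: tmpl.config }),
  }).then((r) => r.json());
}
```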
Basically, I want to simply do the following (a rough sketch of the whole flow follows the list):
Have a cheap public web service taking requests 24/7
For each request, calculate the number of machines required to perform the work (my own logic).
Create the needed machines, wait until they are “started”. I could tolerate a cold start of 10s for a machine to be created and ready (I understand they are created within a few seconds?).
As soon as a machine is started, use their internal IP address to send a request to perform the job.
As soon as each machine is done with its work, it should be destroyed. I’m guessing each machine can call the Machines API with its own ID to request destruction; if not, I can do it from the cheap web service.
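Here’s the sketch of that flow as I understand it; the endpoint paths are from the Machines API docs, while the app name, image, port, and payload shape are placeholders:

```ts
const API = "https://api.machines.dev/v1/apps/my-workers";
const headers = {
  Authorization: `Bearer ${process.env.FLY_API_TOKEN}`,
  "Content-Type": "application/json",
};

async function runJob(payload: unknown) {
  // 1. Create a machine; config.image must point at the app's image.
  const machine = await fetch(`${API}/machines`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      config: {
        image: "registry.fly.io/my-workers:latest",
        guest: { cpu_kind: "shared", cpus: 1, memory_mb: 512 },
      },
    }),
  }).then((r) => r.json());

  // 2. Block until the machine reaches the "started" state.
  await fetch(`${API}/machines/${machine.id}/wait?state=started&timeout=30`, {
    headers,
  });

  // 3. Reach the machine directly over the private network
  //    (this code must itself run inside the Fly private network).
  await fetch(`http://${machine.id}.vm.my-workers.internal:8080/process`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });

  // 4. Tear the machine down once the job is done.
  await fetch(`${API}/machines/${machine.id}?force=true`, {
    method: "DELETE",
    headers,
  });
}
```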
If all this makes sense, how can I do this with the Machines API? Right now, I’m struggling with the config object required for creating new machines. config.image is required, but where is the Docker image of an app stored? Where can I see it? It’s not in the configuration tab.
Why are we required to provide a full config again? The /machines endpoints live under a Fly App, so they belong to the app and should already know about its image and configuration.
Oh, and I’ve just read that the Machines API is rate limited per action… So if I need to create 8 new machines, I can’t just make 8 parallel requests? I need to do them one at a time, waiting 1 second between each? Uhm…
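If that’s really the limit, I guess the workaround is serializing the create calls with a short pause; a sketch using the hypothetical cloneMachine() helper from above:

```ts
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

const TEMPLATE_MACHINE_ID = "3287xxxxxxxxxx"; // placeholder ID

for (let i = 0; i < 8; i++) {
  await cloneMachine(TEMPLATE_MACHINE_ID);
  await sleep(1000); // stay under one create action per second
}
```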
A first try at creating new machines through the API results in machines taking about 20-25 seconds to be created and started. It looks like pulling the image from the registry takes 20s. Is this expected? I’m pretty sure I read somewhere that creating a machine should take no more than 5 seconds and that starting a stopped machine should take under a second.
Pulling from the registry can be pretty slow. With small images it’s not really noticeable, but with very large images you will notice it. However, the host servers cache images, so if you are using the same image it should be much faster after the first machine on that host.