Are there any recommended practices for running workers that process jobs from a queue?
Say we have BullMQ running on Fly.io. We also have a bunch of machines that subscribe to a queue and consume jobs from it (1 machine = 1 concurrent job). If there are no jobs in the queue, no machines should be running (scale to 0).
Now, how do you manage the starting of machines (and potentially the creation of new ones) based on jobs on the queue? Ideally without having to deal with the Machines API.
Say I have a cheap 24/7 machine exposed to the internet, taking requests and enqueueing the necessary jobs in BullMQ. Imagine a request comes in that requires 4 concurrent jobs. Is there a way to programmatically say “hey, I need 4 worker machines to start”? Each worker machine would then start, fetch one job from the queue, process it, and go back to sleep. Once again, without having to deal with the Machines API directly and keeping track of each machine’s ID, state, etc.
I did some testing with fly-replay responses: if a machine is busy processing a job, it returns a fly-replay: elsewhere. This worked fine with a few machines, but when I tried it at a bigger scale, things didn’t work out. Flycast would make a ton of requests, get a lot of fly-replay responses in a row, and ultimately stop insisting.
AFAIK, there’s no way to ensure that a request to the private Flycast address is routed to a stopped machine (so that it gets started). And even if there were, that still leaves the creation of new machines out of the question.
Is this possible somehow in Fly.io? Or am I getting the wrong idea here?
Have you tried setting soft_limit: 1, or even hard_limit: 1? I’m just guessing based on the documentation, but this should make Fly Proxy start more machines when all running machines are already handling a request/connection.
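If it helps, those limits live in the worker app’s fly.toml. A minimal sketch, assuming an HTTP service on port 8080 (values are illustrative, and newer flyctl versions may expect string values for auto_stop_machines):

```toml
[http_service]
  internal_port = 8080
  auto_stop_machines = true   # stop idle workers
  auto_start_machines = true  # let Fly Proxy wake stopped ones
  min_machines_running = 0    # allow scaling all the way down

  [http_service.concurrency]
    type = "requests"
    soft_limit = 1  # proxy prefers another machine once a worker has one request
    hard_limit = 1  # never route a second request to a busy worker
```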
Auto scaling in Fly Proxy seems to revolve around open connections/HTTP requests to the workers. Would it make sense for you to expose an HTTP API on the worker machines and keep the queue consumers on the 24/7 machine?
Another approach might be fly-autoscaler: Autoscale based on metrics · Fly Docs. You’d need to expose your queue length as a metric. It can’t scale to zero machines existing, but it seems it can scale to zero machines started; it needs an existing machine to copy when scaling up.
That isn’t recommended, as Fly et al. aren’t designed for 1:1 request-to-machine routing. What happens, as you’ve speculated, is that requests bounce with high frequency. It’s the same reason fly-replay doesn’t work here: when there’s no vacancy, you end up DoSing yourself.
What OP wants to do is possible on Fly, except for the 1:1 requirement. You can’t have a worker setup and not use it at all; that’s what queueing is for.
As far as I know, you have to use the Machines API to create new machines.
“You can’t have a worker setup and not use it at all”: why do you say that? All I asked for was a way to not have expensive workers running 24/7. I mean, with the amazing features of Firecracker VMs that can be ready to perform work in a matter of seconds, it feels like some kind of “auto-stop workers” setup would be easy to implement on Fly.io and certainly something people would use. Not everybody has enough volume of work to justify keeping a worker running 24/7.
I could easily implement my own architecture if there were a simple way in Fly.io to say “hey, wake up 5 VMs from a specific service group”. There’s currently no way to do that unless you issue 5 separate start commands via the Machines API, AFAIK.
Ultimately, job processing is a common pattern, and some jobs are compute-intensive enough to require several minutes of a single VM processing just one job. Going back to sleep once the work is done, only to be woken up when there’s more to do, makes a lot of sense.
That’s fine for background or scheduled jobs. I just can’t have a user wait for the next scheduled wake-up to get their result; it has to be on-demand. But it’s also too unpredictable and low-volume to be worth running 24/7.
I guess I just come from a managed and serverless background and I don’t have the resources and capacity to build a full Fly Machine orchestration system at the moment.
In order to do this effectively, you need to go from a pull mentality to a push mentality. The workers don’t pull from BullMQ; instead, you create a scheduler that checks BullMQ every second/minute/hour, whatever works. The scheduler can run alongside BullMQ.
If a job is found in the queue, the scheduler pulls it from the queue and then PUSHES it out to the worker machines.
The worker machines should have an HTTP interface for accepting requests, and they should be able to accept more than one request at a time. Set up your connection limits and Fly will spin up your workers when necessary and distribute the work over HTTP. Then, when there’s no work left to do, the worker machines will spin back down to 0.
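A minimal sketch of that push model in TypeScript, assuming a worker app named my-workers reachable over its private Flycast address with a /process endpoint (all names here are made up):

```ts
// Runs on the cheap 24/7 machine, next to BullMQ / Redis.
import { Worker } from "bullmq";

const connection = { host: "my-redis.internal", port: 6379 };

// Drain the queue and push each job to the worker app over HTTP.
// Fly Proxy load-balances across worker machines and, with
// auto_start_machines enabled, wakes stopped ones as limits are hit.
new Worker(
  "jobs",
  async (job) => {
    const res = await fetch("http://my-workers.flycast/process", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(job.data),
    });
    if (!res.ok) throw new Error(`worker responded with ${res.status}`);
    return await res.json();
  },
  { connection, concurrency: 8 } // several in-flight pushes at once
);
```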
You can use metrics-based autoscaling to spin workers up or down based on queue size. Metrics-based Autoscaling
It uses the Machines API under the hood but you don’t have to think too much about that.
In Python land I’d run Celery Flower next to my Celery workers to publish queue size metrics for the autoscaler to base its decisions on. I’m not sure how to publish queue size metrics with BullMQ, but I’m pretty sure there’s a way to do so.
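One way might be a tiny metrics endpoint built on prom-client and BullMQ’s getWaitingCount(); a sketch, with the queue and Redis names made up:

```ts
import http from "node:http";
import { Queue } from "bullmq";
import { Gauge, register } from "prom-client";

const queue = new Queue("jobs", {
  connection: { host: "my-redis.internal", port: 6379 },
});

// prom-client invokes collect() on every scrape, so the gauge always
// reflects the current number of waiting jobs.
new Gauge({
  name: "queue_length",
  help: "Jobs waiting in the BullMQ queue",
  async collect() {
    this.set(await queue.getWaitingCount());
  },
});

http
  .createServer(async (_req, res) => {
    res.setHeader("Content-Type", register.contentType);
    res.end(await register.metrics());
  })
  .listen(9091); // point the [metrics] section of fly.toml at this port
```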
Like @roadmr was saying, you can have one very cheap machine in your app publishing metrics (like queue_length), and a cheap Fly Autoscaler deployment configured to watch that metric and immediately start workers when queue_length > 0.
Either that, or you configure your worker app to receive jobs via push, so the Fly Proxy can wake the workers up, and they shut themselves down when they’re done processing jobs.
I want to start with something simpler, without involving the autoscaler, metrics and such, so I’ll give using the Machines API directly a try.
When using fly scale I can add a bunch of new machines to a specific service group very quickly and easily, without specifying any configuration. Is this not possible with the Machines API? I mean, the configuration is already stored within the app, right? But the API doesn’t seem to offer that: do I really need to specify a full config object for every machine creation request?
I think I’ve read somewhere that “fly scale count” will clone machines. Is that not possible to do with the API? I could not find a clone endpoint here: Fly Machines API
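The closest thing I can think of is reading an existing machine’s config via GET and POSTing a copy of it back, which would approximate a clone. A rough, untested sketch (the app name, token handling, and template machine ID are placeholders):

```ts
const API = "https://api.machines.dev/v1";
const APP = "my-workers";
const headers = {
  Authorization: `Bearer ${process.env.FLY_API_TOKEN}`,
  "Content-Type": "application/json",
};

// Approximate a clone: fetch a "template" machine and create a new
// machine with the same config (which includes the image reference).
async function cloneMachine(templateId: string) {
  const tmpl = await fetch(`${API}/apps/${APP}/machines/${templateId}`, {
    headers,
  }).then((r) => r.json());

  return fetch(`${API}/apps/${APP}/machines`, {
    method: "POST",
    headers,
    body: JSON.stringify({ region: tmpl.region, config: tmpl.config }),
  }).then((r) => r.json());
}
```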
Basically, I want to simply do the following (a rough sketch of the whole flow follows the list):
Have a cheap public web service taking requests 24/7
For each request, calculate the number of machines required to perform the work (my own logic).
Create the needed machines, wait until they are “started”. I could tolerate a cold start of 10s for a machine to be created and ready (I understand they are created within a few seconds?).
As soon as a machine is started, use their internal IP address to send a request to perform the job.
As soon as each machine is done with its work, it should be destroyed. I’m guessing each machine can call the Machines API with its own ID to request destruction; if not, I can do it from the cheap web service.
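Here’s the sketch of that flow as I understand it; the endpoint paths are from the Machines API docs, while the app name, image, port, and payload shape are placeholders:

```ts
const API = "https://api.machines.dev/v1/apps/my-workers";
const headers = {
  Authorization: `Bearer ${process.env.FLY_API_TOKEN}`,
  "Content-Type": "application/json",
};

async function runJob(payload: unknown) {
  // 1. Create a machine; config.image must point at the app's image.
  const machine = await fetch(`${API}/machines`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      config: {
        image: "registry.fly.io/my-workers:latest",
        guest: { cpu_kind: "shared", cpus: 1, memory_mb: 512 },
      },
    }),
  }).then((r) => r.json());

  // 2. Block until the machine reaches the "started" state.
  await fetch(`${API}/machines/${machine.id}/wait?state=started&timeout=30`, {
    headers,
  });

  // 3. Reach the machine directly over the private network
  //    (this code must itself run inside the Fly private network).
  await fetch(`http://${machine.id}.vm.my-workers.internal:8080/process`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });

  // 4. Tear the machine down once the job is done.
  await fetch(`${API}/machines/${machine.id}?force=true`, {
    method: "DELETE",
    headers,
  });
}
```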
If all this makes sense, how can I do this with the Machines API? Right now, I’m struggling with the config object required for creating new machines. config.image is required, but where is the Docker image of an app stored? Where can I see it? It’s not in the configuration tab.
Why are we required to provide a full config again? The /machines endpoints live under a Fly App, so they belong to the app and should already know about its image and configuration.
Oh, and I’ve just read that the Machines API is rate limited per action… So if I need to create 8 new machines, I can’t just make 8 parallel requests? I need to do them one at a time, waiting 1 second between each? Uhm…
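If that’s really the limit, I guess the workaround is serializing the create calls with a short pause; a sketch using the hypothetical cloneMachine() helper from above:

```ts
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

const TEMPLATE_MACHINE_ID = "3287xxxxxxxxxx"; // placeholder ID

for (let i = 0; i < 8; i++) {
  await cloneMachine(TEMPLATE_MACHINE_ID);
  await sleep(1000); // stay under one create action per second
}
```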
A first try at creating new machines through the API results in machines taking about 20-25 seconds to be created and started. It looks like pulling the image from the registry takes 20s. Is this expected? I’m pretty sure I read somewhere that creating a machine should take no more than 5 seconds and that starting a stopped machine should take under a second.
Pulling from the registry can be pretty slow. With small images it’s not really noticeable, but with very large images you will notice it. However, the host servers cache images, so if you are using the same image it should be much faster after the first machine on that host.