I have a very simple Hono web server with a single endpoint that starts a long-running video rendering process using some Node.js libraries. This is a resource-intensive operation, so I want only one concurrent rendering per machine/instance. Each job can take up to 5 minutes to complete (basically awaiting a long-running async function).
I also want to avoid a queue system where subsequent requests need to wait before the previous ones are done.
Is there something built-in or best practice to achieve this with Fly.io and their Fly Proxy and autoscaling features?
My app is already set to shut itself down after 30 seconds without requests. How can I force the autoscaler to spin up a new machine if all the existing ones are busy? Would returning some kind of HTTP status (500? 503?) from busy instances make the autoscaler spawn a new one and send the request to it?
Hi empz,
A queue system is the recommended way to handle this. The Fly Proxy is not going to do a great job of assigning exactly one request to each of your machines, so using the correct tool for the job (a queue like Celery, Sidekiq, or BullMQ) is the way to go.
If you absolutely don’t want a queue, you can try setting your app’s hard/soft limits to 1, but you’ll run into the situation other people have: due to the distributed and eventually-consistent nature of request-load information, by the time a request gets routed from the edge proxy to a machine, that machine might already be serving another request, so the new one will have to wait (and it’ll eventually time out if your process takes around 5 minutes).
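For reference, those limits live in your app’s `fly.toml` under the concurrency settings. A minimal sketch of the limit-1 setup described above (values are illustrative; adjust `internal_port` and the rest to your app):

```toml
[http_service]
  internal_port = 8080
  auto_stop_machines = "stop"   # shut down idle machines
  auto_start_machines = true    # the proxy may start existing stopped machines
  min_machines_running = 0

  [http_service.concurrency]
    type = "requests"
    soft_limit = 1   # proxy prefers machines below this load
    hard_limit = 1   # proxy never sends more than this many concurrent requests
```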
If you set the hard limit a bit higher, so machines can at least receive and inspect requests while they’re doing the long-running work, a busy machine can recognize that it’s already serving a request and respond with “try elsewhere” (see here: Dynamic Request Routing · Fly Docs), and the proxy will go looking for a machine that actually wants to serve the request. If none does (i.e. all machines respond “try elsewhere”) and the proxy can’t start a new machine (note the proxy only starts existing machines; it will never create new ones), it’ll eventually time out and return a 503 to your user.
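A minimal sketch of that “try elsewhere” response, assuming a single in-process busy flag. The `fly-replay: elsewhere=true` header is the documented Dynamic Request Routing mechanism; the handler shape here is plain TypeScript rather than Hono-specific, so you’d wire it into your route yourself:

```typescript
// One rendering slot per machine.
let busy = false;

// Decide whether to accept a render request or ask the Fly Proxy
// to replay it on a different machine in the app.
function acceptOrReplay(): { status: number; headers: Record<string, string> } {
  if (busy) {
    // Any response carrying this header makes the proxy retry the
    // request on another machine instead of this one.
    return { status: 409, headers: { "fly-replay": "elsewhere=true" } };
  }
  busy = true;
  return { status: 200, headers: {} };
}

// Call this when the rendering job finishes (or fails).
function releaseSlot(): void {
  busy = false;
}
```

In a Hono handler you’d check `acceptOrReplay()` first, return early with the header if busy, and call `releaseSlot()` in a `finally` block around the render.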
You mean having the Fly app consume from the queue and then shut down once the queue is empty?
But in that case, how do you start a machine once a message is published into the queue?
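I suppose whatever publishes to the queue could first start a stopped Machine via the Machines REST API (`POST /v1/apps/{app}/machines/{machine_id}/start`). A sketch that just builds the request — app name, machine ID, and token are placeholders:

```typescript
// Build a start request for the Fly Machines API. The caller would send it
// with fetch(req.url, { method: req.method, headers: req.headers }).
function machineStartRequest(app: string, machineId: string, token: string) {
  return {
    url: `https://api.machines.dev/v1/apps/${app}/machines/${machineId}/start`,
    method: "POST" as const,
    headers: { Authorization: `Bearer ${token}` },
  };
}
```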
I think the “try elsewhere” response could work in my case, since another app of my own (outside of Fly.io) will be the only caller of this app. No end users will be calling it directly, so I can have a retry mechanism until I get a valid response. Not great, but it does the job, I believe.
Also, is there a maximum request duration allowed for public Fly.io web apps? If my processes might take 5 minutes, should I go for an async strategy where I return a response early, do the long process, and then call a webhook/callback when it’s done?
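Something like: accept the job, return 202 right away, and POST the result to a callback URL when rendering finishes. A sketch with the render and webhook steps injected as functions so nothing here is tied to a real API (all names are made up):

```typescript
// Early-return + callback pattern. `render` stands in for the long-running
// rendering work (~5 min) and `notify` for the webhook POST (e.g. a fetch).
async function runJob(
  jobId: string,
  render: () => Promise<string>,
  notify: (url: string, body: unknown) => Promise<void>,
  callbackUrl: string,
): Promise<{ status: number }> {
  // Kick off the work without awaiting it, so the HTTP response returns now.
  void (async () => {
    const resultUrl = await render();
    await notify(callbackUrl, { jobId, resultUrl });
  })();
  return { status: 202 }; // accepted, still processing
}
```

One caveat with this pattern on Fly: the machine must stay alive until the callback fires, so the 30-seconds-idle shutdown would need to account for in-flight jobs, not just open requests.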
I have a similar use case. I’m using Temporal; you can configure it to run a maximum of 1 workflow at a time.
You can search for it on this forum.
For my non-prod Temporal apps, I use the Fly API to start/suspend them manually when I need to wake one for some work. Once the job is done, I suspend it.
I have a cron job on GitHub actions that wakes it up every hour to process any scheduled jobs.
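The cron piece is just a scheduled workflow that runs `flyctl`. A sketch — the app name and machine ID are placeholders, and the token comes from a repo secret:

```yaml
name: wake-temporal
on:
  schedule:
    - cron: "0 * * * *"   # every hour
jobs:
  wake:
    runs-on: ubuntu-latest
    steps:
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl machine start MACHINE_ID --app my-temporal-app
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
```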