I have a very simple Hono web server with a single endpoint that starts a long-running video rendering process using some Node.js libraries. This is a resource-intensive operation, so I want only one concurrent rendering per machine/instance. Each job can take up to 5 minutes to complete (basically awaiting a long-running async function).
I also want to avoid a queue system where subsequent requests need to wait before the previous ones are done.
Is there something built-in or best practice to achieve this with Fly.io and their Fly Proxy and autoscaling features?
My app is already set to shut itself down after 30 seconds without requests. How can I force the autoscaler to spin up a new machine if all the existing ones are busy? Would returning some kind of HTTP status (500? 503?) from busy instances make the autoscaler spawn a new one and send the request to it?
Hi empz,
A queue system is the recommended way to handle this. The Fly Proxy is not going to do a great job of assigning exactly one request to each of your machines, so using the correct tool for the job (a queue like Celery, Sidekiq, or BullMQ) is the way to go.
If you absolutely don’t want a queue, you can try setting your app’s hard/soft limits to 1, but you’ll run into the situation other people have: due to the distributed and eventually-consistent nature of request-load information, by the time a request gets routed from the edge proxy to a machine, that machine might already be serving another request, so the new one will have to wait (and it’ll eventually time out if your process takes around 5 minutes).
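For reference, those limits live in your app’s `fly.toml` under the concurrency settings. A minimal sketch of the limit-1 setup described above (values are illustrative; adjust `internal_port` and the rest to your app):

```toml
[http_service]
  internal_port = 8080
  auto_stop_machines = "stop"   # shut down idle machines
  auto_start_machines = true    # the proxy may start existing stopped machines
  min_machines_running = 0

  [http_service.concurrency]
    type = "requests"
    soft_limit = 1   # proxy prefers machines below this load
    hard_limit = 1   # proxy never sends more than this many concurrent requests
```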
If you set the hard limit a bit higher, so machines can at least receive and inspect requests while they’re doing the long-running work, a busy machine can recognize that it’s already serving a request and respond with “try elsewhere” (see here: Dynamic Request Routing · Fly Docs), and the proxy will go looking for a machine that actually wants to serve the request. If none does (i.e. all machines respond “try elsewhere”) and the proxy can’t start a new machine (note the proxy only starts existing machines; it will never create new ones), it’ll eventually time out and return a 503 to your user.
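A minimal sketch of that “try elsewhere” response, assuming a single in-process busy flag. The `fly-replay: elsewhere=true` header is the documented Dynamic Request Routing mechanism; the handler shape here is plain TypeScript rather than Hono-specific, so you’d wire it into your route yourself:

```typescript
// One rendering slot per machine.
let busy = false;

// Decide whether to accept a render request or ask the Fly Proxy
// to replay it on a different machine in the app.
function acceptOrReplay(): { status: number; headers: Record<string, string> } {
  if (busy) {
    // Any response carrying this header makes the proxy retry the
    // request on another machine instead of this one.
    return { status: 409, headers: { "fly-replay": "elsewhere=true" } };
  }
  busy = true;
  return { status: 200, headers: {} };
}

// Call this when the rendering job finishes (or fails).
function releaseSlot(): void {
  busy = false;
}
```

In a Hono handler you’d check `acceptOrReplay()` first, return early with the header if busy, and call `releaseSlot()` in a `finally` block around the render.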
You mean having the Fly app consume from the queue and then shut down once the queue is empty?
But in that case, how do you start a machine once a message is published into the queue?
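I suppose whatever publishes to the queue could first start a stopped Machine via the Machines REST API (`POST /v1/apps/{app}/machines/{machine_id}/start`). A sketch that just builds the request — app name, machine ID, and token are placeholders:

```typescript
// Build a start request for the Fly Machines API. The caller would send it
// with fetch(req.url, { method: req.method, headers: req.headers }).
function machineStartRequest(app: string, machineId: string, token: string) {
  return {
    url: `https://api.machines.dev/v1/apps/${app}/machines/${machineId}/start`,
    method: "POST" as const,
    headers: { Authorization: `Bearer ${token}` },
  };
}
```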
I think the “try elsewhere” response could work in my case, since another app of my own (outside of Fly.io) will be the only caller of this app. No end users will be calling it directly, so I can have a retry mechanism until I get a valid response. Not great, but it does the job, I believe.
Also, is there a maximum request duration allowed for public Fly.io web apps? If my processes might take 5 minutes, should I go for an async strategy where I return a response early, do the long process, and then call a webhook/callback when it’s done?
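Something like: accept the job, return 202 right away, and POST the result to a callback URL when rendering finishes. A sketch with the render and webhook steps injected as functions so nothing here is tied to a real API (all names are made up):

```typescript
// Early-return + callback pattern. `render` stands in for the long-running
// rendering work (~5 min) and `notify` for the webhook POST (e.g. a fetch).
async function runJob(
  jobId: string,
  render: () => Promise<string>,
  notify: (url: string, body: unknown) => Promise<void>,
  callbackUrl: string,
): Promise<{ status: number }> {
  // Kick off the work without awaiting it, so the HTTP response returns now.
  void (async () => {
    const resultUrl = await render();
    await notify(callbackUrl, { jobId, resultUrl });
  })();
  return { status: 202 }; // accepted, still processing
}
```

One caveat with this pattern on Fly: the machine must stay alive until the callback fires, so the 30-seconds-idle shutdown would need to account for in-flight jobs, not just open requests.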
I have a similar use case. I’m using Temporal; you can configure it to run a maximum of 1 workflow at a time.
You can search for it on this forum.
For my non-prod Temporal apps, I use the Fly API to start/suspend them manually when I need to wake one for some work. Once the job is done, I suspend it.
I have a cron job on GitHub actions that wakes it up every hour to process any scheduled jobs.
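The cron piece is just a scheduled workflow that runs `flyctl`. A sketch — the app name and machine ID are placeholders, and the token comes from a repo secret:

```yaml
name: wake-temporal
on:
  schedule:
    - cron: "0 * * * *"   # every hour
jobs:
  wake:
    runs-on: ubuntu-latest
    steps:
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl machine start MACHINE_ID --app my-temporal-app
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
```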