I’ve recently encountered a challenge while deploying a FastAPI backend service on Fly.io. My goals are:
Machines should automatically start up when requests come in.
Machines should automatically stop or suspend when idle to save costs (and support auto-scaling nicely!).
Long-running jobs triggered by users should run reliably until completion.
Here’s the catch
Fly.io determines whether a machine is “idle” based on active HTTP connections. This works perfectly for short requests, but it becomes problematic when dealing with long-running background tasks (triggered by an HTTP request).
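A minimal sketch of the setup in question (simplified; the endpoint shape and names are illustrative):

```python
import asyncio
from fastapi import FastAPI

app = FastAPI()

# Keep references so the tasks aren't garbage-collected mid-run.
running_tasks: set[asyncio.Task] = set()

async def my_long_running_task(job_id: str) -> None:
    """Placeholder for a job that runs for many minutes."""
    for _ in range(100):
        await asyncio.sleep(60)  # one unit of work
        # ... write intermediate results to the database here ...

@app.post("/jobs/{job_id}")
async def start_job(job_id: str):
    task = asyncio.create_task(my_long_running_task(job_id))
    running_tasks.add(task)
    task.add_done_callback(running_tasks.discard)
    # The response returns immediately; the task keeps running.
    return {"status": "started", "job_id": job_id}
```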
where my_long_running_task() is iteratively writing its results to a database.
If the HTTP request returns immediately (e.g., after spawning an asyncio background task), Fly.io sees no active connections and stops the machine after a few minutes, interrupting ongoing tasks. On the other hand, since the frontend app may start several such runs at the same time, I would rather not keep each request open until its job completes.
My question for the community
How do you handle long-running tasks in FastAPI (or similar frameworks) on Fly.io, while still leveraging auto-scaling and auto-shutdown capabilities?
Are there elegant solutions or patterns that you’ve successfully used to balance cost efficiency with reliability?
I’m looking forward to your insights, workarounds, and recommendations!
You could have a machine start up on incoming requests, and then send itself periodic HTTP requests while its internal job is still running. Once the machine has finished its task, it could either stop itself via the API, or cease the periodic requests and let the autoscaler stop it in the normal fashion.
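A rough sketch of that self-ping loop, under a few assumptions: the ping has to travel through fly-proxy to count as activity, so it targets the app’s public hostname rather than localhost; FLY_APP_NAME and FLY_MACHINE_ID are set by the Fly runtime; and the /keep_alive endpoint plus its token header are inventions of this example (an endpoint sketch follows in a later reply):

```python
import asyncio
import os

import httpx  # any async HTTP client works here

APP_URL = f"https://{os.environ['FLY_APP_NAME']}.fly.dev"
PING_INTERVAL = 60  # seconds, comfortably under the proxy's idle window

async def keep_alive_until(done: asyncio.Event) -> None:
    headers = {
        "X-Keep-Alive-Token": os.environ["KEEP_ALIVE_TOKEN"],
        # Pin routing back to this very machine instead of a sibling.
        "fly-force-instance-id": os.environ["FLY_MACHINE_ID"],
    }
    async with httpx.AsyncClient(base_url=APP_URL, timeout=10) as client:
        while not done.is_set():
            try:
                await client.get("/keep_alive", headers=headers)
            except httpx.HTTPError:
                pass  # best effort; try again on the next tick
            await asyncio.sleep(PING_INTERVAL)
```

The job would set `done` when it finishes, after which fly-proxy sees the machine go idle and stops it in the normal fashion.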
Thank you so much for the very fast reply. This is indeed a workaround I also had in mind; it just feels a bit too “hacky”, because we either introduce a new FastAPI endpoint (e.g. /keep_alive) that needs an authentication dependency, or we call a non-existent endpoint and throw an error.
As an alternative, I also thought of having two machines (a FastAPI machine and a worker machine), where the first machine opens a connection to the second in the background and holds it until the job is done. But I am not sure whether this would actually keep both machines alive until the job completes, and it doesn’t feel like the “best practice” way of doing it either!
I wouldn’t worry about it being hacky in the first cut; just get it working, and improve it later if the solution bothers you. The new endpoint doesn’t have to be authenticated as such; just use a hardwired string that the machine itself knows about. Keeping a machine alive is a low-security issue.
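A minimal sketch of such an endpoint, assuming the shared string lives in an environment variable (KEEP_ALIVE_TOKEN is a name made up for this example):

```python
import os

from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

# A string only the machine itself knows; deliberately lightweight,
# since keeping a machine alive is a low-security concern.
KEEP_ALIVE_TOKEN = os.environ.get("KEEP_ALIVE_TOKEN", "change-me")

@app.get("/keep_alive")
async def keep_alive(x_keep_alive_token: str = Header(default="")) -> dict:
    if x_keep_alive_token != KEEP_ALIVE_TOKEN:
        # 404 rather than 401/403, so the endpoint stays invisible.
        raise HTTPException(status_code=404)
    return {"alive": True}
```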
A machine to monitor other machines is fine, and I have that arrangement myself. But in your case I’d say it was overkill.
Another idea is to use Fly’s auto-wake-up system, but disable the automatic spin-down. Does Fly support that configuration? If so, a machine stopping or deleting itself would be very simple, and you’d not need a keep-alive device.
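For reference, fly.toml does expose these as independent knobs, so something along these lines should work (a sketch; check the current docs for the exact accepted values):

```toml
[http_service]
  internal_port = 8080
  auto_start_machines = true   # proxy wakes a stopped machine on request
  auto_stop_machines = "off"   # never stop automatically; the app exits itself
  min_machines_running = 0
```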
Ok, thank you very much, I’ll go with the keep-myself-alive option!
If I may ask just out of personal interest, for what kind of use case do you use a machine that monitors other machines? A very brief answer is fully sufficient!
@Berthold Apologies if I missed a requirement of your architecture, but I’d say this is the way to go. When your app’s main process halts, the Machine it’s running on shuts down. If your app can decide for itself when it’s done and shut itself off, then you can dispense with the fly-proxy concurrency-based autostop.
As the application could be called multiple times in the async setup, this means we would count the number of running jobs in a global variable and call process.exit(0) once there is no active job anymore?!
Yes. The only thing I would do differently is that when the count reaches zero, I would call setTimeout to schedule a shutdown of the process, and call clearTimeout if a new job comes in before the shutdown actually occurs.
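In the asyncio world of the original question, a rough Python equivalent might look like this (names are illustrative; `loop.call_later` plays the role of setTimeout and `.cancel()` that of clearTimeout):

```python
import asyncio
import os
import signal

SHUTDOWN_GRACE = 120  # seconds at zero jobs before the process exits

active_jobs = 0
_shutdown_handle: asyncio.TimerHandle | None = None

def job_started() -> None:
    global active_jobs, _shutdown_handle
    active_jobs += 1
    if _shutdown_handle is not None:  # a job arrived in time:
        _shutdown_handle.cancel()     # the clearTimeout equivalent
        _shutdown_handle = None

def job_finished() -> None:
    global active_jobs, _shutdown_handle
    active_jobs -= 1
    if active_jobs == 0:
        # Schedule the shutdown instead of exiting immediately
        # (the setTimeout equivalent).
        loop = asyncio.get_running_loop()
        _shutdown_handle = loop.call_later(SHUTDOWN_GRACE, _exit)

def _exit() -> None:
    # SIGTERM lets the server shut down gracefully; once the main
    # process exits, the Machine stops.
    os.kill(os.getpid(), signal.SIGTERM)
```

Each background task would call job_started() when it begins and job_finished() in a finally block, so a crash in one job still lets the machine wind down.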