I'm very new to all of this server stuff, and my understanding isn't very accurate.
I currently have a FastAPI app up and running on the default scale-to-zero machines. I submit a job from my frontend, which hits my endpoint (/submit-claim); inside that endpoint I create a background task, so my frontend receives the success response back immediately while the task runs.
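Roughly, the endpoint looks like this (simplified; the body of handle_claim_submission is where the actual work happens):

```python
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

async def handle_claim_submission(claim: dict) -> None:
    ...  # the actual claim processing happens in here

@app.post("/submit-claim")
async def submit_claim(claim: dict, background_tasks: BackgroundTasks):
    # schedule the task and return; the frontend gets this response
    # before the task has finished
    background_tasks.add_task(handle_claim_submission, claim)
    return {"status": "accepted"}
```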
The issue I was facing was that while the background task was running, I was blocked from making any other requests to my backend, such as navigating to another page which needed to call my server to retrieve the user's latest claims. I had to wait for the full background task to finish before I could make any more requests. I solved this by setting the number of uvicorn workers to 4, which seemed to allow each worker to handle a different request.
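(For reference, that just meant starting the server with the workers flag, assuming an entrypoint of main:app:)

```
uvicorn main:app --workers 4
```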
My server will potentially need to receive thousands of claims per day, which would mean lots of background tasks running: maybe not all at once, but at least a few processes at a time, ideally.
My question is: what is the standard way to go about scaling this out? Is setting the number of workers the solution, and if so, how many should I set it to? Are there better ways of handling the large number of requests that will be coming in constantly? For example, if our client wanted to send through 10,000 submissions to the endpoint all at once, how would I handle that, knowing that I'd need to make sure all 10,000 get processed?
You didn’t show your code so I’m just guessing, but it sounds like your handle_claim_submission function is defined as async but runs non-asyncio code inside it (either CPU-bound operations/calculations or blocking, non-async calls), and this blocks the asyncio event loop.
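For example, something like this (hypothetical, since I can't see your code) would block the whole event loop while it runs:

```python
import time

async def handle_claim_submission(claim: dict) -> None:
    # declared async, but the body never awaits anything: while this runs,
    # nothing else on this worker's event loop gets a turn
    time.sleep(30)  # stand-in for requests.get(...), heavy math, a sync DB call, etc.
```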
Yes, everything in that answer still applies. The key thing here is that your function (for which you shared the signature but not the actual code, so I don’t know what it’s doing) is likely doing something that’s either CPU-bound or non-async, as I explained earlier. You’re in the best position to determine what your code is doing; I can’t read minds or your screen.
One of the options is “rewrite your task to not be async” - you said this is not possible, so you have to look at the other solutions.
You already tried #1, which works but limits the number of concurrent requests you can serve to the number of uvicorn workers you have (and if your code is CPU-bound, it doesn’t make sense to increase the number of workers beyond your machine’s number of cores).
You can try the other solutions, which move task execution to a thread, freeing up the main asyncio event loop. This will work if your tasks simply run non-asyncio, blocking code (e.g. reading files, or using a library like requests which is non-async), but if your code is CPU-bound and just does heavy, slow calculations, this again won’t help, as you’re limited by your CPU’s number of cores at all times.
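The simplest version of that is asyncio.to_thread (Python 3.9+); a sketch, with the blocking work stubbed out:

```python
import asyncio
import time

def process_claim_blocking(claim: dict) -> None:
    # plain sync function doing the blocking work
    time.sleep(30)  # e.g. requests, file I/O, a sync DB driver

async def handle_claim_submission(claim: dict) -> None:
    # run the blocking work in a thread; the event loop stays free to
    # serve other requests in the meantime
    await asyncio.to_thread(process_claim_blocking, claim)
```

(Relatedly, if you pass a plain `def` function to FastAPI's `background_tasks.add_task`, Starlette already runs it in a threadpool for you, so just dropping the `async` from the task function may be enough.)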
So I would recommend:
If your code is not CPU-bound, it can probably be refactored to be fully async; and if not, you can move it to a thread, which might get you better results by unblocking the event loop. As an example, if you are using requests to fetch or send data via HTTP to other services, switch to httpx, which is fully async-ready. If you’re doing file operations, those all have asyncio-equivalent versions. Ditto for databases: sqlite3 can be replaced by aiosqlite, and there’s an async-compatible Postgres driver (asyncpg).
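To illustrate the requests-to-httpx swap (the URL is just a placeholder):

```python
import httpx

async def handle_claim_submission(claim: dict) -> None:
    # await the network I/O instead of blocking the event loop on it
    async with httpx.AsyncClient() as client:
        response = await client.post("https://example.com/claims", json=claim)
        response.raise_for_status()
```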
If your code is CPU-bound, you’re better off moving execution of those tasks to a dedicated worker with more CPU capacity, so your FastAPI application can serve more requests (and just quickly enqueue tasks for the worker to perform). For this I would recommend Celery, which is battle-tested (and also mentioned in the Stack Overflow thread). There’s a guide on using Celery with Django on Fly.io which might help even if it doesn’t apply directly to FastAPI; it explains the concepts and moving pieces of a work-queue setup.
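A minimal sketch of that setup, assuming a Redis broker (the broker URL and task/module names are placeholders):

```python
# tasks.py -- start a worker with: celery -A tasks worker
from celery import Celery

celery_app = Celery("claims", broker="redis://localhost:6379/0")

@celery_app.task
def process_claim(claim: dict) -> None:
    ...  # the heavy, CPU-bound work runs here, on the worker machine
```

```python
# in the FastAPI app: enqueue the job and return immediately
from fastapi import FastAPI
from tasks import process_claim

app = FastAPI()

@app.post("/submit-claim")
async def submit_claim(claim: dict):
    process_claim.delay(claim)  # pushing onto the queue is near-instant
    return {"status": "queued"}
```

With a queue in the middle, your 10,000-submissions-at-once scenario becomes 10,000 quick enqueue operations; the workers then drain the queue at whatever rate their CPUs allow, and every job stays in the queue until a worker picks it up.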
It really all comes down to what your code inside the handle_claim_submission function is doing; that’s what will determine the best solution.