Newcomer here, absolutely loving Fly so far; it's saved me a lot of hassle.
I have a Node.js API that waits for a request, then spawns a BIG process that basically takes up the whole machine's capacity.
After idling the machine stops automatically, so that's brilliant.
Right now the only supported concurrency types are requests and connections. Can I somehow stop a machine from receiving more API requests while that "big" process is running? Some sort of hard limit on capacity? Or will I have to create a separate API that handles this logic (manually creating/destroying/using machines, etc.)?
The process takes so long that the request would take hours to resolve. That's why I can't afford to keep an HTTP request open; I immediately respond with OK as soon as my API receives a valid request.
Bluntly put: I want the "load balancing" to respect a limit of one active "BIG" process per machine.
Also, a side question about scaling: is there a way to automatically scale up the number of machines when the current ones are almost at capacity? Otherwise, I'll do that manually as well.
What I'd start with is fly-replay. If your process knows it can't handle a request, you can return fly-replay: elsewhere=true. This will make our proxy send the request to another Machine.
If you have a bunch of machines that are busy at any given time, that might not work as well as you want. What I might do there is take advantage of healthchecks AND replay. If you make your Machine fail a health check, we won't send requests to it until it passes again.
Healthchecks might take something like 15s to kick in, so replay will cover the gap while the check catches up, and then our proxy will stop considering that Machine for requests.
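For the check itself, a fly.toml fragment along these lines would work; the interval and path values are illustrative, and your app would need a /health endpoint that returns a non-2xx status while the big job runs:

```toml
[[http_service.checks]]
  grace_period = "10s"
  interval = "15s"
  timeout = "2s"
  method = "GET"
  path = "/health"
```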
If you can split up the quick part of the request and the heavy CPU part of the request, you can also consider just running two types of machines: small ones to "route" and big ones as "workers". Then you can set the hard limit on the big ones to 1.
In your fly.toml config you can control concurrency for your Machines:

```toml
[http_service.concurrency]
  type = "requests"
  soft_limit = 200
  hard_limit = 250
```
If you set hard_limit = 1, we will only allow one request at a time to that VM. Does that make sense?
If 10 requests come in at a time and only one of them dominates a machine, you wouldn't want that limit applied to all of them. But if you split your machines into two apps and only send heavy requests to a "worker" app, you can use that concurrency limit to make sure workers aren't getting overloaded.
Yes indeed, I understand, but that's the problem I meant when I said:
The process takes so long that the request would take hours to resolve. That's why I can't afford to keep an HTTP request open; I immediately respond with OK as soon as my API receives a valid request.
So I cannot "lock them" with a hard_limit = 1, because there won't be an HTTP request open for that long… That's the problem I ran into, which forces me to use the "hacky" (but smart) fly-replay and health check method you proposed.
Also, let's say I eventually want to manage the machines myself anyway (for better performance, etc.). Could I do that? For example, have a custom-written queue that starts (via API calls) and stops machines when necessary?
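For reference, that kind of self-managed queue is possible with Fly's Machines REST API, which exposes create/start/stop endpoints under api.machines.dev authenticated with a Fly API token. A sketch in Node (18+, for the global fetch); the app name, Machine ID, and token are placeholders:

```javascript
// Sketch of driving the Fly Machines REST API from a custom queue.
// App name, Machine ID, and token below are placeholders.
const API = "https://api.machines.dev/v1";

// Build the URL for a Machine, optionally with an action like "start" or "stop".
function machineUrl(app, machineId, action) {
  const base = `${API}/apps/${app}/machines/${machineId}`;
  return action ? `${base}/${action}` : base;
}

// Start (wake) an existing stopped Machine.
async function startMachine(app, machineId, token) {
  const resp = await fetch(machineUrl(app, machineId, "start"), {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!resp.ok) throw new Error(`start failed: HTTP ${resp.status}`);
  return resp.json();
}
```

A queue could call startMachine(...) when work arrives, and hit the corresponding "stop" action when a worker reports completion.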
For some reason my HTTP check keeps failing with connection refused when I deploy. Also, I don't know why, but the name is an incorrect servicecheck-00-http-8080 (where the 8080 is incorrect).