Newcomer here, absolutely loving Fly so far; it's saved me a lot of hassle.
I have a Node.js API that waits for a request, then spawns a BIG process that basically takes up the whole machine's capacity.
After idling the machine stops automatically, so that's brilliant.
Right now the only supported concurrency types are requests and connections. Can I somehow stop a machine from receiving more API requests while that "big" process is running? Some sort of hard limit on capacity? Or will I have to create a separate API that handles this logic (manually creating/destroying/using machines, etc.)?
The process takes so long that the request would take hours to resolve. That's why I can't afford to keep an HTTP request open; I immediately respond with OK as soon as my API receives a valid request.
Bluntly put: I want the "load balancing" to respect a limit of one active "BIG" process per machine.
Also, a side question about scaling: is there a way to automatically scale up the number of machines when the current ones are almost at capacity? Otherwise, I'll do that manually as well.
What I'd start with is fly-replay. If your process knows it can't handle a request, you can return fly-replay: elsewhere=true. This will make our proxy send the request to another Machine.
If you have a bunch of machines that are busy at any given time, that might not work as well as you want. What I might do there is take advantage of healthchecks AND replay. If you make your Machine fail a health check, we won't send requests to it until it passes again.
Healthchecks might take something like 15s to kick in, so replay will cover the gap while the check catches up, and then our proxy will stop considering that Machine for requests.
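For the check itself, a fly.toml fragment along these lines would work; the interval and path values are illustrative, and your app would need a /health endpoint that returns a non-2xx status while the big job runs:

```toml
[[http_service.checks]]
  grace_period = "10s"
  interval = "15s"
  timeout = "2s"
  method = "GET"
  path = "/health"
```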
If you can split up the quick part of the request and the heavy CPU part of the request, you can also consider just running two types of machines: small ones to "route" and big ones as "workers". Then you can set the hard limit on the big ones to 1.
In your fly.toml config you can control concurrency for your Machines:

```toml
[http_service.concurrency]
  type = "requests"
  soft_limit = 200
  hard_limit = 250
```
If you set hard_limit = 1, we will only allow one request at a time to that VM. Does that make sense?
If 10 requests come in at a time and only one of them dominates a machine, you wouldn't want that limit applied to all of them. But if you split your machines into two apps and only send heavy requests to a "worker" app, you can use that concurrency limit to make sure workers aren't getting overloaded.
Yes indeed, I understand, but that's the problem I meant when I said:
The process takes so long that the request would take hours to resolve. That's why I can't afford to keep an HTTP request open; I immediately respond with OK as soon as my API receives a valid request.
So I cannot "lock them" with a hard_limit = 1, because there won't be an HTTP request open for that long… That's the problem I ran into, which forces me to use the "hacky" (but smart) fly-replay and health check method you proposed.
Also, let's say I eventually want to manage the machines myself anyway (for better performance, etc.). Could I do that? For example, have a custom-written queue that starts (via API calls) and stops machines when necessary?
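For reference, that kind of self-managed queue is possible with Fly's Machines REST API, which exposes create/start/stop endpoints under api.machines.dev authenticated with a Fly API token. A sketch in Node (18+, for the global fetch); the app name, Machine ID, and token are placeholders:

```javascript
// Sketch of driving the Fly Machines REST API from a custom queue.
// App name, Machine ID, and token below are placeholders.
const API = "https://api.machines.dev/v1";

// Build the URL for a Machine, optionally with an action like "start" or "stop".
function machineUrl(app, machineId, action) {
  const base = `${API}/apps/${app}/machines/${machineId}`;
  return action ? `${base}/${action}` : base;
}

// Start (wake) an existing stopped Machine.
async function startMachine(app, machineId, token) {
  const resp = await fetch(machineUrl(app, machineId, "start"), {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!resp.ok) throw new Error(`start failed: HTTP ${resp.status}`);
  return resp.json();
}
```

A queue could call startMachine(...) when work arrives, and hit the corresponding "stop" action when a worker reports completion.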
For some reason my HTTP check keeps failing with connection refused when I deploy. Also, I don't know why, but the name is an incorrect servicecheck-00-http-8080 (where the 8080 is incorrect).