Advanced concurrency, scaling & load balancing

Newcomer here, absolutely loving Fly so far, saved me a lot of hassle :smile:

I have a Node.js application: an API that waits for a request and then spawns a BIG process that basically takes up the machine's entire capacity.

After idling the machine stops automatically, so that’s brilliant.

Right now the only supported concurrency types are requests and connections. Can I somehow prevent a machine from receiving more API requests while that "big" process is running? Some sort of hard limit on capacity? Or will I have to build a separate API that handles this logic (manually creating/destroying/using machines, etc.)?

The process takes so long that the request would take hours to resolve. That's why I can't afford to keep an HTTP request open; I immediately respond with OK when my API receives a valid request.

Bluntly put: I want the 'load balancing' to respect a limit of 1 active "BIG" process per machine.

Also, a side question about scaling: is there a way to automatically scale up the number of machines when the current ones are almost fully utilized? Otherwise, I'll do that manually as well :slight_smile:.

You have a couple of options!

What I'd start with is fly-replay. If your process knows it can't handle a request, you can respond with the header fly-replay: elsewhere=true. This makes our proxy send the request to another Machine.

If you have a bunch of machines that are busy at any given time, that might not work as well as you want. What I might do there is take advantage of health checks AND replay. If you make your Machine fail a health check, we won't send requests to it until it passes again.

Health checks might take ~15s to kick in, so replay will cover the gap while that catches up, and then our proxy will stop considering the Machine for requests.

If you can split up the quick part of the request and the heavy CPU part of the request, you can also consider just running two types of machines. Small ones to "route" and big ones as "workers". Then you can set the hard limit on the big ones to 1.


THIS IS AMAZING, thank you so much Kurt! Absolutely great insights and options, didn’t even know half of them existed :sweat_smile:

I’ll probably go with the fly-replay & health check, but just for clarification, what do you mean exactly when you say

Small ones to "route" and big ones as "workers". Then you can set the hard limit on the big ones to 1.

What hard limit are we talking about here exactly? Honestly I don’t think my brain 100% follows with this method so far, could you clarify this?

In your fly.toml config you can control concurrency to your machines:

  [http_service.concurrency]
    type = "requests"
    soft_limit = 200
    hard_limit = 250

If you set hard_limit = 1, we will only allow one request at a time to that VM. Does that make sense?

If 10 requests come in at once and only one of them dominates a machine, you wouldn't want that. But if you split your machines into two apps and only send heavy requests to a "worker" app, you can use that concurrency limit to make sure the workers don't get overloaded.


Yes, I understand, but that's exactly the problem I meant when I said:

The process takes such a long time, the request would take hours to resolve. That’s why I cannot afford to keep an HTTP request open, so I immediately respond with OK when my API gets a correct request.

So I can't 'lock them' with hard_limit = 1 because there won't be an HTTP request open for that long… That's the problem I stumbled upon, which forces me to use the 'hacky' (but smart) fly-replay and health check method you proposed.

Ok, excellent. Let us know if the fly-replay/health check combo works well!


Will do!

Do you have any insights on the scaling side question too, btw?

Also, suppose in the end I'd want to manage the machines myself anyway (for better performance, etc.). Could I do that? For example, a custom-written queue that starts (via API calls) and stops machines when necessary?
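The queue idea could be sketched like this: a class that runs jobs one at a time, starting a Machine per job and stopping it afterwards. `startMachine`/`stopMachine` here are stand-ins for whatever Fly Machines API calls you'd make; the names and shape are illustrative, not a real client library:

```javascript
// Hypothetical job queue: one BIG job runs at a time, each on its own Machine.
class MachineQueue {
  constructor(startMachine, stopMachine) {
    this.startMachine = startMachine; // async () => machineId
    this.stopMachine = stopMachine;   // async (machineId) => void
    this.jobs = [];
    this.draining = false;
  }

  // Add a job (an async function receiving a machine id) and kick off
  // processing if it is not already running.
  enqueue(job) {
    this.jobs.push(job);
    if (!this.draining) this.drainPromise = this.drain();
    return this.drainPromise;
  }

  async drain() {
    this.draining = true;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift();
      const id = await this.startMachine();
      try {
        await job(id); // run the BIG process against that Machine
      } finally {
        await this.stopMachine(id); // stop it so you stop paying for it
      }
    }
    this.draining = false;
  }
}
```

In a real implementation the start/stop callbacks would hit the Fly Machines API with your app name and token, and you'd want persistence so queued jobs survive a restart of the queue process itself.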

:white_check_mark: SOLVED by using [checks] instead

[checks]
  [checks.api_v1_health]
    grace_period = "10s"
    interval = "15s"
    method = "get"
    path = "/api/v1"
    port = 3000
    timeout = "10s"
    type = "http"

For some reason my HTTP check keeps failing with connection refused when I deploy. Also, I don't know why, but the check name is an incorrect servicecheck-00-http-8080 (the 8080 is incorrect).

[build]

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = 'stop'
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']

[[services.http_checks]]
    interval = 10000
    grace_period = "5s"
    method = "get"
    path = "/api/v1"
    protocol = "http"
    timeout = 2000

[[services.ports]]
    handlers = ["http"]
    port = 80
    internal_port = 3000

I can access /api/v1 perfectly fine when I surf to it.
