Concurrent connections and scaling of servers

Trying to figure out how best to configure the http_service.concurrency setting.

Right now I have type = "connections" (although I am changing this to "requests", since these are web servers), with a soft limit of 100 and a hard limit of 200.

Here is the config:

```toml
[http_service]
  internal_port = 4000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 2
  processes = ["app"]

  [http_service.concurrency]
    type = "requests"
    hard_limit = 200
    soft_limit = 100
```

So I would assume that if my app stays consistently below 100 concurrent requests, it would eventually scale down to 2 machines. I have 6 total, but instead I am consistently seeing ~4 running. When I look at the concurrency graph, I never see any one machine go over 100. Shouldn't that mean my app would scale down to 2 machines and chill there until I start hitting the limits?

Thanks!

For reference, this post from @merlin last year got a good answer about the "scaling up" behavior of hard_limit vs. soft_limit.

I’m afraid I didn’t find an equally lucid description of the “autoscale down” algorithm.
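For what it's worth, here is the rough mental model I've been working from. To be clear, this is my own assumption about how soft_limit and hard_limit could drive per-machine routing decisions, not Fly's documented algorithm:

```python
# A rough mental model (my own assumption, NOT Fly's documented algorithm)
# of how soft_limit and hard_limit might drive per-machine decisions.

def scale_decision(in_flight, soft_limit=100, hard_limit=200):
    """Suggest an action for one machine given its concurrent request count."""
    if in_flight >= hard_limit:
        return "refuse"      # at hard_limit: stop routing new requests here
    if in_flight >= soft_limit:
        return "scale-up"    # over soft_limit: prefer another (or a new) machine
    return "stop-candidate"  # under soft_limit: eligible to be scaled down

# If every machine sits under the soft limit, you'd expect the pool to
# drain toward min_machines_running over time:
loads = [40, 65, 80, 30]
print([scale_decision(n) for n in loads])
```

Under that model, all four machines above would be stop candidates, which is exactly why the observed ~4 running machines is surprising.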


@w8emv ah, so it looks like it has to do with long-lived connections like WebSockets; those may not show up in the graph as clearly. That is my best guess based on the description in the response you linked.
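If that guess is right, the mechanism would look something like this: a machine only becomes eligible for auto-stop once it has zero in-flight work, so a single long-lived WebSocket pins its machine up even though the concurrency graph looks quiet. A tiny sketch of that assumption (hypothetical function and machine names, not Fly's actual proxy code):

```python
# Sketch of an ASSUMED mechanism: auto_stop_machines only stops a machine
# once it is fully idle, so one open WebSocket keeps it running indefinitely.

def can_auto_stop(in_flight):
    """A machine is a stop candidate only when it has no in-flight work."""
    return in_flight == 0

# m3 and m4 each hold an open socket, so they never become stoppable,
# even though all four machines are far below soft_limit = 100:
machines = {"m1": 0, "m2": 0, "m3": 1, "m4": 2}
stoppable = [name for name, load in machines.items() if can_auto_stop(load)]
print(stoppable)
```

That would explain seeing ~4 machines running while every machine's concurrency stays well under the soft limit.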
