Auto Scaling - at what threshold does it scale up?

I’m deploying an application to serve as a token auth / routing platform, and in testing the auto scaler seems to add a node, mostly as expected.

What I am curious about is: By what metric does the auto-scaler actually scale? Is it on a threshold of CPU usage, HTTP response code error rates, etc?

Can the thresholds be tuned?

The application I am working on can be relatively bursty, depending on external conditions, and I would prefer it to scale up sooner than it currently does so that I can reduce the number of 502 responses. In testing, I start getting gateway-unavailable responses before the next node starts to provision, and these errors seem to begin at about 100% CPU usage.

Ideally, more nodes would spin up sooner rather than later. I don’t mind the extra cost of having an extra node or two around to avoid customer complaints, and the cost gets passed through anyway.

From my understanding, the scaling and re-routing are based on the soft_limit:

  [services.concurrency]
    hard_limit = 160
    soft_limit = 100
    type = "connections"

I believe there are two types, requests and connections, but I can’t seem to find them in the documentation.
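If anyone wants to try it, the requests variant would presumably look like this — note the type value is an assumption on my part, since it isn’t documented:

  [services.concurrency]
    hard_limit = 160
    soft_limit = 100
    type = "requests"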

Interesting, I’ll give that a shot and follow up here.

Following up here, I’ve got concurrency set pretty low to get it to auto scale.

  [services.concurrency]
    hard_limit = 10
    soft_limit = 6

For others in the future:
The app is PHP-FPM based, with 3–6 workers. With 1 dedicated vCPU, I seem to exhaust CPU before reaching 12 concurrent connections, with a sustained peak of about 440 requests per second per vCPU. The auto-scaling values seem to be linear with vCPU count, so these may be a good starting point for you (your app will obviously differ).
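For concreteness, the pool sizing I’m describing looks roughly like this in the FPM pool config (www.conf) — the values are illustrative, not prescriptive:

  ; www.conf — pool sizing sketch (illustrative values)
  ; A small static pool keeps workers from over-subscribing 1 vCPU.
  pm = static
  pm.max_children = 6

With pm = static, FPM keeps a fixed number of workers running, which makes CPU saturation more predictable while you tune the concurrency limits.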

More workers are possible, and do work, but I tend to exhaust the CPU before the workers saturate well, and that slows down the TTLB (time to last byte) for clients.

Maybe try type = requests?

It’s not documented, but hopefully it works better for PHP.

Setting type = "requests" didn’t seem to do anything noticeable.

However, tuning the number of workers in the FPM pool did seem to make the scaling behaviour more apparent, which was a good finding overall.

I wonder what difference there is between connections and requests… They seem analogous (from a reverse-proxy point of view).

Requests = HTTP requests?

Connections = concurrent TCP connections?

That’s how I interpreted them :sweat_smile: