Auto-scaling based on response time?

Hello,

Reading about how auto-scaling currently takes a while to kick in because of the time needed to gather metrics (Autoscale doesn't seem to launch new instances - #5 by kurt) … I was thinking … what would be neat IMO is if it were instead based on time spent waiting for a request to be served.

For example, if the response does not come within e.g. 2000 ms (set in config), then the instance is under too much load, and a new instance should be started.
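Very roughly, I'm imagining something like this sketch (all names and thresholds are made up for illustration, and I'm using a p95 over a recent window rather than a single request, which is just one way to do it):

```go
package autoscale

import (
	"sort"
	"time"
)

// responseTimeout is the hypothetical per-app config value described above:
// if requests take longer than this to be served, the instance is
// considered overloaded.
const responseTimeout = 2000 * time.Millisecond

// shouldScaleUp looks at the response times observed over the last window
// and asks for a new instance when the slower requests (p95 here, as one
// possible choice) exceed the configured threshold.
func shouldScaleUp(recent []time.Duration) bool {
	if len(recent) == 0 {
		return false
	}
	sorted := append([]time.Duration(nil), recent...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	p95 := sorted[len(sorted)*95/100]
	return p95 > responseTimeout
}
```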

That is perhaps easier for a user than using concurrent connections, since personally I’ve found it hard to pick good values. If the hard limit is set too low, requests that could have been handled are dropped (until auto-scaling happens ~60 s later). But set it too high and the instance runs out of resources, so requests also fail.

Using response time would (maybe) help with those issues. Perhaps it would cause others. Just a random thought for the to-do list :slight_smile:

It’s an interesting idea, but it has a dangerous scenario: if the long response times are caused not by the application instance itself but by, e.g., an overloaded database, adding more instances will only make it worse.


We have some ideas for how to implement this. In theory, there are heuristics you could use to test the effect of adding VMs. It should be possible to try adding an extra VM if response times get slow or queues back up and then see if it helped.

It will likely be a while before we get to this, though. Once we can responsively launch new VMs at request time, I think we’ll learn a lot about how to actually make better scaling choices. This stuff is fun to tune but we’d rather you all don’t have to think about it.
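One way to read the “try adding a VM and see if it helped” heuristic is a simple probe-and-rollback loop, something like the sketch below. This is just my interpretation, not how Fly actually implements anything; the interfaces, timings, and the 10% improvement cutoff are invented:

```go
package autoscale

import "time"

// Metrics is a stand-in for whatever latency signal the scheduler
// can already observe for an app in a region.
type Metrics interface {
	P95Latency() time.Duration
}

// Scaler is a stand-in for whatever actually adds and removes VMs.
type Scaler interface {
	AddVM() error
	RemoveVM() error
}

// probeScaleUp tries adding one VM when latency is bad, waits for the
// change to take effect, and rolls it back if latency didn't improve
// meaningfully (e.g. because the bottleneck is really the database).
func probeScaleUp(m Metrics, s Scaler, slow time.Duration) error {
	before := m.P95Latency()
	if before < slow {
		return nil // nothing to do
	}
	if err := s.AddVM(); err != nil {
		return err
	}
	time.Sleep(30 * time.Second) // let traffic rebalance; value is arbitrary
	after := m.P95Latency()
	if after > before*9/10 {
		// Less than ~10% improvement: the extra VM isn't helping,
		// so undo it rather than pile more load onto a shared bottleneck.
		return s.RemoveVM()
	}
	return nil
}
```

That kind of check would also cover the overloaded-database case mentioned earlier, since the extra VM gets removed when it doesn’t move the latency.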


I suppose the ideal system would first establish a strong correlation between CPU/RAM/connection counts and response time before using one or more of them as the trigger.
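For example (my own sketch, with invented thresholds), you could compute the Pearson correlation between each candidate metric and response time over the same trailing window, and only let well-correlated metrics drive scaling:

```go
package autoscale

import "math"

// pearson returns the correlation coefficient between two equally sized
// samples, e.g. per-interval CPU usage vs. response time over one window.
func pearson(x, y []float64) float64 {
	n := float64(len(x))
	var sx, sy, sxx, syy, sxy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
		sxx += x[i] * x[i]
		syy += y[i] * y[i]
		sxy += x[i] * y[i]
	}
	den := math.Sqrt(n*sxx-sx*sx) * math.Sqrt(n*syy-sy*sy)
	if den == 0 {
		return 0
	}
	return (n*sxy - sx*sy) / den
}

// usableTrigger reports whether a metric tracks response time closely
// enough to be trusted as a scaling trigger (0.8 is an arbitrary cutoff).
func usableTrigger(metric, responseTimes []float64) bool {
	return pearson(metric, responseTimes) > 0.8
}
```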


Thanks, good points from you all. It’s hard to get right for all cases.

For me, I think the ideal would be to consider each region independently, summing max disk usage, max memory, total CPU used, total ingress, and total egress across all instances in the region over the past 5 s, and then have a high watermark and a low watermark for each metric.

If the high watermark is crossed, scale up by 4x; if the low watermark is crossed, scale down by removing ceil(25%) of the instances each step. I.e. 1 → 4 → 3 → 2 → 1 (rough sketch below).

This considers load balancing across the provisioned instances a separate problem.
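Something like this, to make the watermark idea above concrete (metric names, percentages, and watermark values are all placeholders of my own, not anything that exists today):

```go
package autoscale

import "math"

// RegionMetrics aggregates the signals above across all instances in one
// region over the last 5 s window, each normalised to 0..1 of capacity.
type RegionMetrics struct {
	MaxDiskPct, MaxMemPct, CPUPct, IngressPct, EgressPct float64
}

const (
	highWatermark = 0.80 // any metric above this: scale up
	lowWatermark  = 0.30 // all metrics below this: scale down
)

// nextInstanceCount applies the 4x-up / remove-ceil-25%-down rule,
// giving the 1 -> 4 -> 3 -> 2 -> 1 sequence from the post.
func nextInstanceCount(current int, m RegionMetrics) int {
	metrics := []float64{m.MaxDiskPct, m.MaxMemPct, m.CPUPct, m.IngressPct, m.EgressPct}
	anyHigh, allLow := false, true
	for _, v := range metrics {
		if v > highWatermark {
			anyHigh = true
		}
		if v > lowWatermark {
			allLow = false
		}
	}
	switch {
	case anyHigh:
		return current * 4
	case allLow && current > 1:
		return current - int(math.Ceil(float64(current)*0.25))
	default:
		return current
	}
}
```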

Fancy bonus points for adapting the provisioning profile of each app so it can trade off CPU vs memory vs egress: “you appear egress-limited, so you get lots of smaller nodes instead of lots of medium-sized ones”, or vice versa, “we’ll swap out these instances for a single bigger one”.