autoscale max instances

Kikobeats · May 3, 2021, 12:40pm

Hello,

I’m wondering whether there is a limit on the maximum number of VMs that can run in an app.

$ flyctl autoscale balanced min=18 max=100 -a microlink-api
     Scale Mode: Balanced
      Min Count: 18
      Max Count: 50

It looks like there is a soft limit of 50. Is there any particular reason for that?

If so, would it be possible to raise the value?

My application has both the hard and soft concurrency limits set to 1, meaning one request per VM, to maximize parallelism (like AWS Lambda).

kurt · May 3, 2021, 3:39pm

We have that limit to prevent crypto mining, mostly. But in general it’s better to run bigger VMs than more of them. This is because:

Our autoscaling is not designed for single request concurrency.
Most apps are so slow to boot, it’s better to run VMs that can handle 10-15 concurrent requests minimum.

For reference, one dedicated-cpu-1x instance is guaranteed about 20x the compute of a shared-cpu-1x. So if you’re running single request concurrency on a shared-cpu-1x, it’s better to change it to 20 requests on dedicated-cpu-1x.

Kikobeats · May 3, 2021, 11:58pm

Thanks for the clarification!

I’m running autoscale this way because every request has to spawn a Chromium process, and the process can’t be shared between requests. This is a very particular case, but the thing is, the approach is working well for me!

When you say autoscale is not designed for single concurrency, I suspect it’s because a new VM isn’t ready to be used right away, since it needs to pull the image, pass health checks, etc.

Taking the design limitations into account, what would be a good minimum concurrency value?

For example, do you think it makes sense to set soft to 1 and hard to 2 to avoid the VM cold start?

charsleysa · May 4, 2021, 2:35am

As far as I know, fly.io does not keep warm instances during scaling. Once it kills an instance, it’s gone and doesn’t get reused.

AWS Lambda’s reuse of VMs was more of an implementation detail than a specified feature, which is why they released a minimum provisioned feature, so you can keep a minimum number of warm VMs.

If you only want to process 1 request concurrently per VM then you will definitely run into scaling issues. If you can modify your VM to have multiple Chromium processes then the autoscaling would work much better. Having a soft limit of 5 and hard limit of 10 would probably be the minimum for autoscaling to work well.

michael · May 4, 2021, 3:36am

This is how I’d approach it too. A queue of requests that are dispatched to 1+ VMs each with a pool of Chromium processes. We’ll support scaling on custom metrics before too long, scaling Chromium VMs by queue depth sounds rad.
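To make the queue-plus-pool idea concrete, here is a minimal sketch of one VM draining a request queue with a fixed pool of workers, where each worker stands in for one Chromium process. Everything named here (`POOL_SIZE`, `render`, `run_pool`) is hypothetical, and `render` only sleeps instead of driving a real browser.

```python
import asyncio

POOL_SIZE = 4  # hypothetical: number of Chromium processes kept per VM


async def render(url: str) -> str:
    """Stand-in for driving one Chromium process; not a real browser call."""
    await asyncio.sleep(0.01)
    return f"rendered {url}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker models one Chromium process handling one URL at a time.
    while True:
        url = await queue.get()
        try:
            results[url] = await render(url)
        finally:
            queue.task_done()


async def run_pool(urls: list[str], pool_size: int = POOL_SIZE) -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(pool_size)]
    for url in urls:
        queue.put_nowait(url)
    await queue.join()  # block until every queued URL has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(10)]
    print(len(asyncio.run(run_pool(urls))))
```

In a setup like this, `queue.qsize()` would be the natural "queue depth" number to scale on, once scaling on custom metrics is available.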

Kikobeats · May 4, 2021, 11:42am

This is probably very specific to running a background process like Chromium, where the CPU/memory consumed by the process is not predictable and doesn’t scale linearly.

Let’s say, for example, you want to keep as many Chromium processes as there are vCPUs available.

Normally a Chromium process takes a target URL as input, and behavior can vary widely depending on it (some URLs are fast, other URLs have a lot of scripts, etc.).

That kind of corner case can affect the performance of in-flight requests, so to minimize performance issues it’s preferable to scale horizontally rather than vertically.

What I want to understand (and that’s probably a question for @kurt) is: What happens with the incoming requests when all current VM instances reach the hard limit?

If the incoming request needs to wait until a new VM spawns, that’s my big problem with this infrastructure setup.

charsleysa · May 5, 2021, 3:53am

From my understanding, if all the VMs have concurrently hit the hard limit, incoming requests will get dropped with a 502 response.

kurt · May 5, 2021, 3:14pm

That’s roughly correct. When VMs are at their hard limit and new connections show up, we queue them for a while waiting for a VM to come available. If that doesn’t happen in a reasonable amount of time, we serve a 502.
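The behaviour described above can be modelled with a small simulation: a request waits for a free slot, and if none frees up before a timeout it gets a 502. This is only a sketch of the described behaviour, not Fly's proxy code. The timeout value is an assumption (the thread only says "a reasonable amount of time"), and `handle` and `simulate` are hypothetical names.

```python
import asyncio


async def handle(slots: asyncio.Semaphore, work_seconds: float, queue_timeout: float) -> int:
    """Return 200 if a slot became free in time, otherwise 502."""
    try:
        # Queue the request: wait for any VM to have spare capacity.
        await asyncio.wait_for(slots.acquire(), timeout=queue_timeout)
    except asyncio.TimeoutError:
        return 502
    try:
        await asyncio.sleep(work_seconds)  # stand-in for the actual request work
        return 200
    finally:
        slots.release()


async def simulate(requests: int, vms: int, hard_limit: int,
                   work_seconds: float, queue_timeout: float) -> list:
    # Total capacity is the number of VMs times the per-VM hard limit.
    slots = asyncio.Semaphore(vms * hard_limit)
    return await asyncio.gather(
        *(handle(slots, work_seconds, queue_timeout) for _ in range(requests))
    )


if __name__ == "__main__":
    # 3 requests, 2 VMs with hard limit 1, slow work, short queue timeout.
    print(asyncio.run(simulate(3, vms=2, hard_limit=1, work_seconds=0.2, queue_timeout=0.05)))
```

With a hard limit of 1, any burst larger than the current VM count depends entirely on the queue wait being longer than the time it takes for a slot to free up or a new VM to boot.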

ignoramous · October 12, 2021, 8:16pm

20x is some jump (when the price jump is just 15x: $2/m to $31/m). Are the relative cpu profiles of the various vm sizes documented anywhere?
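A quick back-of-the-envelope check of those numbers, using only the figures quoted in this thread ($2/month, $31/month, "about 20x" compute). The function names are made up for illustration, and actual pricing and CPU guarantees may differ from what was quoted here.

```python
SHARED_PRICE = 2.0      # USD/month for shared-cpu-1x, as quoted in this thread
DEDICATED_PRICE = 31.0  # USD/month for dedicated-cpu-1x, as quoted in this thread
COMPUTE_RATIO = 20.0    # dedicated vs. shared compute, "about 20x" per this thread


def price_ratio() -> float:
    """How many times more expensive the dedicated size is."""
    return DEDICATED_PRICE / SHARED_PRICE


def compute_per_dollar_gain() -> float:
    """Compute per dollar on dedicated relative to shared."""
    return COMPUTE_RATIO / price_ratio()


def cost_per_request_slot(price: float, concurrency: int) -> float:
    """Monthly cost of one concurrent request slot on a VM."""
    return price / concurrency


if __name__ == "__main__":
    print(price_ratio())
    print(round(compute_per_dollar_gain(), 2))
    print(cost_per_request_slot(SHARED_PRICE, 1))
    print(cost_per_request_slot(DEDICATED_PRICE, 20))
```

On these figures the price ratio is 15.5x, so the dedicated size would give roughly 1.3x the compute per dollar, assuming the 20x figure holds.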
