Autoscale doesn't seem to work with hard_limit = 1 and soft_limit = 1

Hi,

My web service only handles one request at a time, by design.

In my config I have:

[services.concurrency]
  hard_limit = 1
  soft_limit = 1
  type = "connections"

I have run flyctl autoscale standard min=1 max=10, but whenever I send multiple requests to the service, autoscaling doesn’t seem to happen, and requests beyond the ones handled by the initial VM time out. What do I need to do differently?

Edit:

I also see “lhr” listed under both the region pool and backup regions when running flyctl regions list. Could this be a bug?

Ah, this is a quirk of how we do autoscaling. It’s metrics-based, so there’s a bit of lag when we add nodes, which means a limit of 1 won’t spin up a new VM in time.

If you give me your app name we can opt you in to new autoscaling that’s quite a bit quicker to respond. It will still take ~15s to boot a new VM but you might have fewer timeouts.

Sounds good, I have sent you a message with the name, thanks!

Ok you should be all set! It’ll at least scale quicker now.

I tried initiating 8 requests using curl from my laptop: 2 were processed by the initial VM, 6 timed out, and no new VMs were launched (judging by flyctl status). Any idea why they aren’t launching?

You’ll probably need to stay over the soft limit for >15s. The simplest test is one long-lived connection plus repeated attempts at a second connection until it scales.

Also I want to reiterate that we’re not good at single connection scaling (yet). Our autoscaling is designed for traffic that ramps up and VMs that can handle ~25 concurrent requests. The new autoscaling is quicker to respond but still not fast enough for what you have configured.
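
To illustrate why a short burst of curl requests may not trigger scaling, here is a minimal sketch of a "sustained over the soft limit" check. It is not Fly's actual autoscaler logic: the function name, the sample format, and treating the 15s figure as a strict continuous window are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def sustained_over_limit(
    samples: Iterable[Tuple[float, int]],
    soft_limit: int,
    window_s: float = 15.0,  # assumed from the ">15s" figure in the thread
) -> bool:
    """Return True if concurrency stays above soft_limit for more than
    window_s seconds of consecutive samples.

    samples: (timestamp_seconds, observed_concurrency) pairs in time order.
    """
    over_since = None  # timestamp at which the current over-limit run began
    for timestamp, concurrency in samples:
        if concurrency > soft_limit:
            if over_since is None:
                over_since = timestamp
            if timestamp - over_since > window_s:
                return True
        else:
            # Dropping back to the limit resets the run.
            over_since = None
    return False


# A 3-second burst that exceeds the limit, then falls back.
short_burst = [(0, 2), (1, 2), (2, 2), (3, 1), (4, 1)]
# One long-lived connection plus a second attempt, held for 20 seconds.
long_test = [(t, 2) for t in range(0, 21)]

print(sustained_over_limit(short_burst, soft_limit=1))
print(sustained_over_limit(long_test, soft_limit=1))
```

Under these assumptions, only the long test counts as sustained load, which matches the suggestion to hold one connection open and keep retrying a second one.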

Hi @kurt, stumbled upon this discussion while facing the same problem as @matthewrobertbell. However, we have soft limits at 20 and hard limits at 25. Been sending multiple requests (60+) simultaneously and the servers don’t seem to scale up.

PS: Running tests with k6 and have consistently kept requests above 25 for over 10 mins, still no luck scaling

Hi @viraj! There might be a few reasons for this:

  1. We just shipped (like today) a new autoscaler that is much more responsive. I opted your apps into it.
  2. Your apps are technically using connection concurrency. If you’re sending multiple requests over shared connections with k6, it won’t affect the scaling numbers.

For #2, I would suggest adding type = "requests" to the concurrency block in your fly.toml.

Will you run your test again and see if those things improve? With new autoscaling you should see fly status show new instances within about 30s of a test starting.
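
The difference between the two concurrency types can be shown with a toy model. This is only a sketch: the Connection class and measured_concurrency function are hypothetical names, and the assumption that a load tester multiplexes many in-flight requests over a few shared connections is for illustration, not a description of Fly's proxy internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    """One open client connection and the requests currently in flight on it."""
    in_flight_requests: int


def measured_concurrency(connections: List[Connection], kind: str) -> int:
    """Return the load figure a scaler would see for a given concurrency type."""
    if kind == "connections":
        # Each open connection counts once, however busy it is.
        return len(connections)
    if kind == "requests":
        # Every in-flight request counts, regardless of which connection carries it.
        return sum(c.in_flight_requests for c in connections)
    raise ValueError(f"unknown concurrency type: {kind!r}")


# Assumed scenario: 60 in-flight requests shared across 3 connections.
load_test = [Connection(in_flight_requests=20) for _ in range(3)]

print(measured_concurrency(load_test, "connections"))  # 3
print(measured_concurrency(load_test, "requests"))     # 60
```

In this model a soft limit of 20 is never approached when counting connections, while counting requests puts the same traffic well above it.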

Hi @kurt, thanks for getting back. Redeployed with type="requests" and ran the tests again.

I’m able to see the hard limit warning in the logs. However, the server doesn’t seem to autoscale.


PS: This is the concurrency config I’m using:
[image: screenshot of the concurrency config]

Ok I looked at the metrics and this seems to be a bug scaling from one to two.

  1. It peaked at 25 concurrent requests. This was because it was hitting the hard limit, most likely.
  2. The scaling metric actually said “hey, we need 1.25 VMs for this load”. We obviously can’t add 0.25 VMs so it didn’t add any.

I just tweaked your app to use a ceil function there. This should help if you want to test again.
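
The arithmetic behind that bug can be sketched as follows. The formula (concurrent requests divided by the soft limit) is an assumption that happens to reproduce the 1.25 figure from a peak of 25 requests and a soft limit of 20; the function name and parameters are hypothetical, not the real scaler.

```python
import math


def desired_vms(concurrent_requests: int, soft_limit: int, round_up: bool) -> int:
    """Turn a fractional VM requirement into a whole number of VMs.

    Assumes the requirement is simply concurrent_requests / soft_limit.
    """
    required = concurrent_requests / soft_limit  # 25 / 20 = 1.25
    return math.ceil(required) if round_up else math.floor(required)


# Rounding down: 1.25 VMs becomes 1, so nothing is added.
print(desired_vms(25, 20, round_up=False))  # 1
# Rounding up with ceil: 1.25 VMs becomes 2, so a second VM is launched.
print(desired_vms(25, 20, round_up=True))   # 2
```

Because the hard limit caps the observed load at 25, the requirement could never reach 2.0 on a single VM without rounding up.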

Hey @kurt, that did help. Most of the tests performed afterwards had better results and lower error rates. A couple of follow-up questions:

  1. Can we define parameters based on which scaling takes place?
  2. When we release new apps, will the new scaling be automatically applied to that too?

There aren’t any scaling parameters to define yet. We’d like to let you specify a metrics query to scale on, but as you can see from our ceil issue that’s actually kind of hard!

That ceil bug is fixed, and all new apps should behave the same as yours now.

Thanks for sorting out the bug.
Re the metrics query: what I am looking for is autoscaling based on CPU and/or memory load. Say, if load is above 80%, scale up, and scale down when it drops below.
One concern I have with the current implementation is that a VM takes a long time to spin up, and during that period some requests error out. Ideally, I imagine that could be avoided by defining thresholds for scaling up servers.

We have plans to make VMs boot a lot faster. We’re getting there and have already started testing this new scheduler with the new remote builders.