How to enable autoscaling?

I am trying to test Fly’s autoscaling by deploying a simple app and load testing it.

Steps taken:

  1. Deploy a simple Elixir/Phoenix application, using the stock Dockerfile and fly.toml Fly creates for it. (I’ve also tried the same with a simple Ruby on Rails application)
  2. Change [services.concurrency] hard_limit to 250, leave soft_limit at 20.
  3. fly scale count 1. I run in one region, ams, as 95% of traffic of our real app will also be from the Netherlands. ( I have also tried running in two regions, ams and cfr, with the same, i.e. ‘no’, results.)
  4. fly autoscale balanced min=1 max=8
  5. Run Fly’s burn tool to hit the app’s homepage many times for multiple minutes. I’ve tried in-between the soft and hard limit, twice the hard limit, thrice the hard limit.

I would now expect autoscaling to occur, but no new VMs are started, even as “warn: hard limit reached” messages come in in the fly logs.

Am I missing a crucial step somewhere?

This all looks correct to me (and thank you for detailing the work you’ve done so far). Would you mind confirming a few things?

Specifically:

  • The output of fly autoscale status; this will show us what flyctl thinks the autoscale settings currently are

  • The output of fly releases ; this will confirm that the app’s config was definitely updated.

  • the specific flags/values that you’re using with burn. As you’re likely aware, autoscaling decisions are made based on a few minutes’ worth of traffic – so it’s useful to know exactly how long you’re running your tests for.

  • as much as your fly.toml as you’re comfortable sharing, to get a better idea of how your app is configured to receive traffic. On a related note, you might try the setting type ="requests" for your [services.concurrency] block.

Thanks for your reply!

The info you asked for:

$ fly autoscale show
     Scale Mode: Balanced
      Min Count: 1
      Max Count: 8
$ fly scale show
VM Resources for uptime-monitor-backend
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
          Count: 1
 Max Per Region: Not set
$ fly releases
VERSION STABLE  TYPE            STATUS          DESCRIPTION                     USER                    DATE                 
v10     true    scale           succeeded       Update autoscaling config       <REDACTED>           5h42m ago           
v9      false   scale           failed          Scale VM count: ["app, 1"]      <REDACTED>           5h42m ago           
v8      true    scale           succeeded       Update autoscaling config       <REDACTED>           5h48m ago           
v7      false   scale           cancelled       updating region configuration                           5h49m ago           
v6      false   scale           cancelled       updating region configuration                           5h49m ago           
v5      false   scale           cancelled       updating region configuration                           5h49m ago           
v4      true    scale           succeeded       Update autoscaling config       <REDACTED>           5h54m ago           
v3      true    release         succeeded       Deploy image                    <REDACTED>           5h57m ago           
v2      true    rollback        succeeded       Reverting to version 0                                  6h14m ago           
v1      false   release         failed          Deploy image                    <REDACTED>           6h15m ago           
v0      true    release         succeeded       Deploy image                    <REDACTED>       2022-02-06T19:20:07Z

For burn, I’ve most recently used ./burn -c 400 -d 120s https://uptime-monitor-backend.fly.dev/ --verbose --resume-tls.
Maybe two minutes is not long enough for autoscaling to kick in?

I have also recently (on another app, but with a similar setup) tried to use locust, scaling the number of users up to 600 by adding 2 more per second (e.g. reaching the peak after ten minutes, although the soft_limit would be passed after the first ten seconds), with similarly no autoscaling happening.

I have tried both type = "connections" and type = "requests".


The topic you pointed to (this one) did not actually seem to mention anywhere how much time is taken for autoscaling to kick in.
I did find this post by @kurt mentioning “it needs to be maxed out 30-60s”. But maybe that information is outdated?

The topic you pointed to (this one) did not actually seem to mention anywhere how much time is taken for autoscaling to kick in.
I did find this post by kurt mentioning “it needs to be maxed out 30-60s”. But maybe that information is outdated?

Ah, my mistake-- I’ll edit my post to make that correction :slight_smile:

I’ve been attempting to reproduce this issue, so having this information is super useful-- thanks for the quick response!

1 Like

Maybe two minutes is not long enough for autoscaling to kick in?

If you haven’t already, I’d definitely suggest trying for a bit longer than two minutes as a quick troubleshooting step.

Generally speaking, the auto-scaling feature is a definitely bit slow to react, and difficult to induce via loadtest. For example, it requires that the concurrent connections limit be consistently exceeded for all an app’s VMs for over a minute while the app is running. Platform-wise, there are a lot of moving parts which add latency, too.

We’re looking to improve autoscaling, so a major overhaul is in the works for (hopefully!) sometime later this year. It’s in its early stages, but you may want to check out this thread to follow our work on providing more VM-level features!