How does the scaling work in detail?

We started playing around with Fly to solve some scaling issues we currently have, and I have been trying to understand the autoscaling feature. If I understood correctly, the platform should spawn additional instance(s), up to the max limit, at the latest once the hard_limit is reached?

What about down-scaling?

I triggered an upscale with fly autoscale set max=2, which spawned another instance, and then set the max back to 1.
That was ~5h ago, but the 2nd instance is still alive. Is there a configurable reaction time for up- and downscaling?

Or am I wrong here and scaling is only for moving between datacenters?


As regards the hard_limit: yes, if that number is hit, that should trigger a new VM to be created. If you look at your app metrics in the Fly dashboard, you should see a graph of concurrent requests, so a new VM would be created when it goes above the set limit.

What complicates autoscaling is that there are two modes: standard and balanced. The VM distribution also depends on another variable: how many regions your app is set to run in. You can see those with fly regions list. You can see more on this page, and if you scroll down there are various commands to set the options depending on how you want it to work.

Upscaling should happen quickly (as soon as Fly spots the increased load), but for downscaling I’m not sure. As far as I’m aware, you can’t specify a time or rule for when that happens. 5 hours sounds wrong, unless the load justifies it and/or the minimum is now two.

I see autoscaling is being reworked but hopefully someone from Fly can assist:

greg, thanks for your explanations!

Ah yes, this was also part of my testing, but in the end I reduced my primary regions to only fra and added some nearby regions to the backup pool:

Region Pool:
Backup Region:

After setting min=2, an instance was spawned in a backup region, which I understand to be part of Fly's distributed scaling. But what I don’t get is why this instance is still alive (13h! :smiley:) when there is absolutely no reason to A) keep another instance at all and B) keep it in a backup region.

% fly autoscale show
     Scale Mode: Standard
      Min Count: 1
      Max Count: 5

Oh thanks, good hint. But fully sleeping :zzz:

We actually scale based on soft_limit. What concurrency limits do you have set?

Also, what does fly scale show report? If you set a count manually, it disables autoscaling.
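
The concurrency limits asked about here are normally set in the app's fly.toml, under the services concurrency section. As a rough sketch, the helper below prints just those two settings from a config file. The function name concurrency_limits is made up for illustration, and the default path fly.toml is an assumption about where the config lives.

```shell
#!/bin/sh
# Hypothetical helper: print the soft_limit / hard_limit lines from a fly.toml.
# The first argument is the config path; it defaults to ./fly.toml (an assumption).
concurrency_limits() {
  grep -E '^[[:space:]]*(soft_limit|hard_limit)[[:space:]]*=' "${1:-fly.toml}"
}

# Usage, from the app directory:
#   concurrency_limits
#   concurrency_limits path/to/fly.toml
```

If nothing is printed, the file does not set these limits explicitly.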

Currently showing:

VM Resources for xx
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
          Count: 2
 Max Per Region: Not set

But I have not set a specific count=2 before. And as you can see above in the fly autoscale show output, it is definitely enabled.
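
To compare the two outputs quoted in this thread without reading them by eye, a small sketch like the following could pull out the running count and the configured minimum. It assumes the "Label: value" layout shown above, which may differ between flyctl versions, and the function names running_count and min_count are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: read CLI output on stdin and print one numeric field.
# They rely only on the "Label: value" layout quoted in this thread.

# Print the value of the "Count:" line from `fly scale show` output.
running_count() {
  awk -F': *' '/^[[:space:]]*Count:/ { print $2 }'
}

# Print the value of the "Min Count:" line from `fly autoscale show` output.
min_count() {
  awk -F': *' '/^[[:space:]]*Min Count:/ { print $2 }'
}

# Usage:
#   fly scale show | running_count
#   fly autoscale show | min_count
```

With the outputs posted here, the running count is above the configured minimum even though there is no traffic, which is the mismatch being described.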

And by the way, my 2nd instance is still alive, without any traffic, 3 days after it was created.


I may have a case to reproduce this:

fly autoscale standard min=2 max=5
# Wait some time
fly autoscale set min=1
# Here I expect the 2nd instance from the 1st command to be retired after
# a couple of minutes without traffic. But that never happens.

fly scale count 0 # or 1, does not matter
fly autoscale standard min=1 max=5

# Now it's correct, I only have 1 instance

So it seems that reducing min in an already existing autoscale setup does not trigger a downscale.