How does the scaling work in detail?

Regarding the hard_limit: yes, if that number is hit, it should trigger a new VM to be created. If you look at your app metrics in the Fly dashboard, you should see a graph of concurrent requests, so a new VM would be created if that graph goes above the set limit.
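To make the idea concrete, here's a rough Python model of "add a VM when concurrency goes above the limit". This is a hypothetical sketch, not Fly's actual scaling code. The function name and the min_count/max_count parameters are my own assumptions, meant to stand in for the minimum and maximum VM counts you set for autoscaling.

```python
import math


def desired_vm_count(total_concurrency: int, hard_limit: int,
                     min_count: int, max_count: int) -> int:
    """Hypothetical model, NOT Fly's real algorithm: run enough VMs that no
    single VM is pushed above hard_limit, clamped to [min_count, max_count]."""
    if hard_limit <= 0:
        raise ValueError("hard_limit must be positive")
    # One VM per hard_limit's worth of concurrent requests, rounded up,
    # so going even one request above the limit asks for another VM.
    needed = math.ceil(total_concurrency / hard_limit)
    # Never drop below the configured minimum or exceed the maximum.
    return max(min_count, min(max_count, needed))


print(desired_vm_count(30, 25, 1, 5))  # 2: above the limit, so a second VM
print(desired_vm_count(0, 25, 2, 5))   # 2: a minimum of two keeps two VMs even at zero load
```

The clamp to min_count is also why a minimum of two would keep two VMs running even when the load drops away.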

What complicates autoscaling is that there are two modes: standard and balanced. On top of that, the VM distribution depends on another variable: how many regions your app is set to run in, which you can see with fly regions list. There's more on this page, and if you scroll down there are various commands to set the options depending on how you want it to work:
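As a rough illustration of why the region count matters, here's a toy Python sketch that spreads a given number of VMs as evenly as possible across your regions. It's a simplification under my own assumptions: it doesn't model the difference between standard and balanced, it isn't Fly's placement logic, and the function name and region codes are just examples.

```python
def spread_evenly(vm_count: int, regions: list[str]) -> dict[str, int]:
    """Toy even spread of vm_count VMs across regions (hypothetical,
    not how Fly actually places VMs)."""
    if not regions:
        raise ValueError("need at least one region")
    # Every region gets `base` VMs; the first `extra` regions get one more.
    base, extra = divmod(vm_count, len(regions))
    return {r: base + (1 if i < extra else 0) for i, r in enumerate(regions)}


# With two VMs and three regions, one region ends up with none.
print(spread_evenly(2, ["lhr", "iad", "syd"]))
```

So the same VM count can look quite different depending on how many regions are in the list.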

https://fly.io/docs/reference/scaling/#autoscaling

Upscaling should happen quickly (as soon as Fly spots the increased load), but I'm not sure about downscaling. As far as I'm aware, you can't specify a time or rule for when that happens. Five hours sounds wrong, unless the load justifies it and/or the minimum count is now two.

I see autoscaling is being reworked, but hopefully someone from Fly can assist.
