I am trying to test Fly’s autoscaling by deploying a simple app and load testing it.
Steps taken:
Deploy a simple Elixir/Phoenix application, using the stock Dockerfile and fly.toml Fly creates for it. (I’ve also tried the same with a simple Ruby on Rails application.)
Change [services.concurrency] hard_limit to 250 and leave soft_limit at 20 (see the config sketch after these steps).
fly scale count 1. I run in one region, ams, as 95% of our real app’s traffic will also come from the Netherlands. (I have also tried running in two regions, ams and cfr, with the same, i.e. ‘no’, results.)
fly autoscale balanced min=1 max=8
Run Fly’s burn tool to hit the app’s homepage many times for multiple minutes. I’ve tried concurrency levels in between the soft and hard limits, at twice the hard limit, and at three times the hard limit.
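For reference, the relevant part of the fly.toml then looks roughly like this (everything outside the concurrency block is left as flyctl generated it, so the exact file may differ):

  [[services]]
    # ... generated settings unchanged ...

    [services.concurrency]
      type = "connections"  # the default
      hard_limit = 250      # raised per step 2
      soft_limit = 20       # left unchanged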
I would now expect autoscaling to occur, but no new VMs are started, even as “warn: hard limit reached” messages appear in the fly logs.
Could you share as much of your fly.toml as you’re comfortable sharing, to get a better idea of how your app is configured to receive traffic? On a related note, you might try the setting type = "requests" for your [services.concurrency] block.
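That would look something like this (only the type line changes; keep whatever limits you already have, e.g. the 250/20 you mentioned):

  [services.concurrency]
    type = "requests"   # count in-flight HTTP requests rather than TCP connections
    hard_limit = 250
    soft_limit = 20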
$ fly autoscale show
Scale Mode: Balanced
Min Count: 1
Max Count: 8
$ fly scale show
VM Resources for uptime-monitor-backend
VM Size: shared-cpu-1x
VM Memory: 256 MB
Count: 1
Max Per Region: Not set
$ fly releases
VERSION STABLE TYPE STATUS DESCRIPTION USER DATE
v10 true scale succeeded Update autoscaling config <REDACTED> 5h42m ago
v9 false scale failed Scale VM count: ["app, 1"] <REDACTED> 5h42m ago
v8 true scale succeeded Update autoscaling config <REDACTED> 5h48m ago
v7 false scale cancelled updating region configuration 5h49m ago
v6 false scale cancelled updating region configuration 5h49m ago
v5 false scale cancelled updating region configuration 5h49m ago
v4 true scale succeeded Update autoscaling config <REDACTED> 5h54m ago
v3 true release succeeded Deploy image <REDACTED> 5h57m ago
v2 true rollback succeeded Reverting to version 0 6h14m ago
v1 false release failed Deploy image <REDACTED> 6h15m ago
v0 true release succeeded Deploy image <REDACTED> 2022-02-06T19:20:07Z
For burn, I’ve most recently used ./burn -c 400 -d 120s https://uptime-monitor-backend.fly.dev/ --verbose --resume-tls.
Maybe two minutes is not long enough for autoscaling to kick in?
I have also recently (on another app, but with a similar setup) tried using locust, scaling the number of users up to 600 by adding 2 more per second (reaching the peak after ten minutes, although the soft_limit would already be passed within the first ten seconds), with similarly no autoscaling happening.
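A minimal locust setup for that kind of ramp looks roughly like the sketch below (file, class, and host names are illustrative, not the actual ones used):

  # locustfile.py -- each simulated user repeatedly GETs the homepage
  from locust import HttpUser, task, constant

  class HomepageUser(HttpUser):
      wait_time = constant(1)  # pause one second between requests per user

      @task
      def homepage(self):
          self.client.get("/")

  # Run headless, ramping 2 users per second up to a peak of 600, e.g.:
  #   locust -f locustfile.py --headless -u 600 -r 2 -H https://<other-app>.fly.dev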
I have tried both type = "connections" and type = "requests".
The topic you pointed to (this one) did not actually seem to mention anywhere how long it takes for autoscaling to kick in.
I did find this post by @kurt mentioning “it needs to be maxed out 30-60s”. But maybe that information is outdated?
Ah, my mistake-- I’ll edit my post to make that correction
I’ve been attempting to reproduce this issue, so having this information is super useful-- thanks for the quick response!
If you haven’t already, I’d definitely suggest trying for a bit longer than two minutes as a quick troubleshooting step.
Generally speaking, the auto-scaling feature is definitely a bit slow to react, and difficult to induce via load test. For example, it requires that the concurrent connections limit be consistently exceeded for all of an app’s VMs for over a minute while the app is running. Platform-wise, there are a lot of moving parts which add latency, too.
We’re looking to improve autoscaling, so a major overhaul is in the works for (hopefully!) sometime later this year. It’s in its early stages, but you may want to check out this thread to follow our work on providing more VM-level features!
Hello @qqwy! Just letting you know that we tracked down an internal bug in the autoscaling service that was causing scaling events not to be triggered properly; sorry for the inconvenience. It should be working correctly now, so please give it another try!