Doing a load test for 5 minutes does not trigger autoscaling even with multiple warnings from the proxy:
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
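If you save the logs during a test, a quick tally shows which regions are pinned at the limit. A sketch against a two-line sample of the excerpt above (in practice you'd pipe `fly logs` output instead of a saved file):

```shell
# Count "connections hard limit" warnings per region ($3 is the region code).
cat <<'EOF' > proxy.log
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
EOF
grep 'connections hard limit' proxy.log | awk '{print $3}' | sort | uniq -c
```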
I went through the previous threads about autoscale not scaling, and my configuration doesn’t seem to have any of the issues they describe. Anyone have any ideas?
$ fly autoscale show
Scale Mode: Balanced
Min Count: 2
Max Count: 7
$ fly scale show
VM Resources for testmysql
VM Size: shared-cpu-1x
VM Memory: 512 MB
Count: 2
Max Per Region: Not set
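For context, the connection limits the proxy enforces come from the `[services.concurrency]` section of `fly.toml`. A section matching the 25-connection hard limit in the logs might look like this (the `soft_limit` value and `type` here are assumptions, not taken from the thread):

```toml
[services.concurrency]
  type = "connections"  # counts open TCP connections, not HTTP requests
  hard_limit = 25       # proxy refuses new connections past this
  soft_limit = 20       # assumed; the autoscaler treats the instance as busy here
```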
Just to make sure I wasn’t crazy, I created yet another app to duplicate the behaviour. I ran the following commands and only did a single deploy. Same behaviour.
fly scale memory 512
fly regions add ams
fly regions add mia
fly autoscale balanced min=2 max=7
fly deploy -i registry.fly.io/blabla:0.1
I think I found an internal bug possibly causing this issue, and I’ve updated your instance definitions with a tentative fix. Could you do another load test on the running instances (without re-deploying) and let me know if it works this time?
The 2nd app is scaling correctly. New instances take about 60-90 seconds to come up, and scale-down takes about 10 minutes. Is that the expected behaviour?
testmysql is borked in some strange fashion. The logs show deployments being attempted at a rapid pace to scale up, but no instances actually get created.
From the 2nd app:
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
55ac6af4 app 11 mia run running 1 total, 1 passing 0 3m21s ago
700bd4ca app 11 mia run running 1 total, 1 passing 0 5m31s ago
eda38388 app 11 sea run running 1 total, 1 passing 0 7m14s ago
fc4eb3a9 app 11 sea run running 1 total, 1 passing 0 8m31s ago
a323536a app 11 ams run running 1 total, 1 passing 0 1h48m ago
04d693f9 app 11 ams run running 1 total, 1 passing 0 1h48m ago
Thanks for the additional testing! I’ll work on rolling out the fix more permanently.
The autoscaling service is designed to scale up quickly and scale down slowly with a scale-down lag of 10 minutes, so yes that’s the expected behavior.
I think the remaining problem with testmysql may be a separate bug in our deployment system: it looks like you had launched two apps that both referenced the same container image, deleted one of them, and now the remaining app (testmysql) can’t locate its image in the registry. We’ll look into this issue as well, but for now re-deploying the app should get you unstuck.
FYI, there’s a race condition with autoscaling that may be tricky to reproduce.
Essentially, the TCP connection health check can fail because traffic gets routed to the app before the TCP check passes. The VM then hits the hard connection limit and can never become healthy, and autoscaling stops scaling at that point.
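A minimal sketch of that failure mode (a toy model with assumed mechanics, not Fly’s actual proxy code):

```python
# Sketch of the reported race: the proxy routes load to a fresh instance
# before its first TCP health check runs. If client connections fill the
# hard limit first, the health check itself can no longer connect, the
# instance is never marked healthy, and autoscaling stops.

HARD_LIMIT = 25  # matches the warnings in the logs above

class Instance:
    def __init__(self):
        self.connections = 0
        self.healthy = False

    def accept(self):
        """One connection attempt; refused at the hard limit."""
        if self.connections >= HARD_LIMIT:
            return False
        self.connections += 1
        return True

    def health_check(self):
        """The TCP check is just another connection attempt."""
        if self.accept():
            self.connections -= 1  # the check closes its connection
            self.healthy = True
            return True
        return False

inst = Instance()
for _ in range(HARD_LIMIT):   # load arrives before the first check
    inst.accept()

print(inst.health_check())    # False: the check can't get a connection
print(inst.healthy)           # False: stuck unhealthy, no further scale-up
```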
Hello there,
I have the same issue and have no idea how to trigger autoscaling.
We set the hard limit to 50 and the soft limit to 25,
and ran autoscale balanced min=2 max=6.
When we test with 200 requests, only about 80% of them succeed; the rest are dropped, and autoscaling is not triggered.
Even though autoscaling isn’t without problems… first verify whether your limits are set against (TCP) connections or (HTTP) requests (docs); it’s a critical distinction, since one TCP connection may pipeline several HTTP requests.
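A toy calculation of why that distinction matters, using the numbers from this thread (200 requests, hard limit 50) and an assumed keep-alive reuse factor:

```python
# With keep-alive, many HTTP requests share one TCP connection, so a
# connection-based limit can sit well below its threshold while a
# request-based limit on the same traffic trips immediately.

requests_sent = 200
requests_per_connection = 10   # keep-alive reuse factor (assumed)
hard_limit = 50

open_connections = requests_sent // requests_per_connection
print(open_connections)                # 20 sockets
print(open_connections < hard_limit)   # True: a connection limit of 50 never trips
print(requests_sent > hard_limit)      # True: a request limit of 50 trips at once
```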
Also try the new Machines platform, which gives more control over how instances are spun up and down in response to demand. A big (documented) limitation right now is that the Machines platform doesn’t handle placing 2 or more machines in the same region the way it should (also see).
We have tried both.
When using connections, we get warnings like sin [warn]Instance reached connections hard limit of 50,
and when we use requests, we get no warning, but requests are not accepted.