Doing a load test for 5 minutes does not trigger autoscaling, even with multiple warnings from the proxy:
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[05525023] sea [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
2022-07-25T18:32:43Z proxy[eb2ee7d6] ams [warn]Instance reached connections hard limit of 25
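For context, the "hard limit of 25" in those warnings comes from the app's concurrency settings in fly.toml. A hypothetical excerpt (the section and field names are from Fly's services config; the values are guesses chosen to match the logs):

```toml
# Hypothetical fly.toml excerpt -- values assumed to match the log output above.
[services.concurrency]
  type = "connections"
  hard_limit = 25   # proxy refuses new connections to an instance past this
  soft_limit = 20   # proxy prefers other instances past this; scaling decisions key off load like this
```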
I went through the previous threads about autoscale not scaling, and my configuration doesn’t seem to have any of the issues they describe. Anyone have any ideas?
$ fly autoscale show
Scale Mode: Balanced
Min Count: 2
Max Count: 7
$ fly scale show
VM Resources for testmysql
VM Size: shared-cpu-1x
VM Memory: 512 MB
Count: 2
Max Per Region: Not set
Just to make sure I wasn’t crazy, I created yet another app to duplicate the behaviour. I ran the following commands and did only a single deploy. Same behaviour.
fly scale memory 512
fly regions add ams
fly regions add mia
fly autoscale balanced min=2 max=7
fly deploy -i registry.fly.io/blabla:0.1
I think I found an internal bug possibly causing this issue, and I’ve updated your instance definitions with a tentative fix. Could you do another load test on the running instances (without re-deploying) and let me know if it works this time?
The 2nd app is scaling correctly. New instances take about 60-90 seconds to come up; scale-down takes about 10 minutes. Is that the expected behaviour?
testmysql is borked in some strange fashion. The logs show scale-up deployments being attempted at a very quick pace, but no instances actually get created.
From the 2nd app:
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
55ac6af4 app 11 mia run running 1 total, 1 passing 0 3m21s ago
700bd4ca app 11 mia run running 1 total, 1 passing 0 5m31s ago
eda38388 app 11 sea run running 1 total, 1 passing 0 7m14s ago
fc4eb3a9 app 11 sea run running 1 total, 1 passing 0 8m31s ago
a323536a app 11 ams run running 1 total, 1 passing 0 1h48m ago
04d693f9 app 11 ams run running 1 total, 1 passing 0 1h48m ago
Thanks for the additional testing! I’ll work on rolling out the fix more permanently.
The autoscaling service is designed to scale up quickly and scale down slowly, with a scale-down lag of 10 minutes, so yes, that’s the expected behavior.
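That fast-up/slow-down behavior is a form of hysteresis. A toy model (not Fly's actual implementation; the 10-minute lag and min/max counts are taken from this thread):

```python
SCALE_DOWN_LAG = 10 * 60  # seconds the load must stay low before scaling down


class ToyAutoscaler:
    """Toy model of fast-scale-up / slow-scale-down autoscaling."""

    def __init__(self, min_count=2, max_count=7):
        self.min_count = min_count
        self.max_count = max_count
        self.count = min_count
        self.low_since = None  # timestamp when load first dropped below threshold

    def tick(self, overloaded: bool, now: float) -> int:
        if overloaded:
            # Scale up immediately, capped at max_count.
            self.count = min(self.count + 1, self.max_count)
            self.low_since = None
        else:
            # Scale down only after load has stayed low for the full lag.
            if self.low_since is None:
                self.low_since = now
            elif now - self.low_since >= SCALE_DOWN_LAG:
                self.count = max(self.count - 1, self.min_count)
                self.low_since = now
        return self.count
```

A brief spike adds an instance right away, but the count only drops once the load has been quiet for the whole 10-minute window, which matches the ~10-minute scale-down you observed.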
I think the remaining problem with testmysql may be a separate bug in our deployment system: it looks like you had launched two apps both referencing the same container image, deleted one of them, and now the remaining app (testmysql) can’t locate its image in the registry. We’ll look into this issue as well, but for now, re-deploying the app should get you unstuck.
FYI, there’s a race condition with autoscaling that may be tricky to reproduce.
Essentially, the TCP health check can fail because traffic gets routed to the app before the check passes. The VM hits the hard connection limit and can never become healthy at that point, so autoscaling stops scaling.
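The race described above can be modeled roughly like this (a toy sketch, assuming the TCP health check itself needs one of the same connection slots as real traffic):

```python
HARD_LIMIT = 25  # matches the "hard limit of 25" in the proxy warnings


class ToyInstance:
    """Toy model of an instance whose TCP health check competes with
    real traffic for the same connection slots."""

    def __init__(self):
        self.connections = 0
        self.healthy = False

    def accept(self) -> bool:
        # Once at the hard limit, no new connection gets through.
        if self.connections >= HARD_LIMIT:
            return False
        self.connections += 1
        return True

    def health_check(self) -> bool:
        # The check opens (and immediately closes) one TCP connection.
        if self.accept():
            self.connections -= 1
            self.healthy = True
        else:
            self.healthy = False
        return self.healthy


# The race: load-test traffic fills every slot before the first check runs,
# so the check never gets a slot and the instance is never marked healthy.
inst = ToyInstance()
for _ in range(HARD_LIMIT):
    inst.accept()
```

Once stuck like this, `inst.health_check()` keeps returning False even though the instance is serving traffic, which is why autoscaling stops considering it.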