Hey, I can’t find any docs on this - does fly actually downscale when doing autoscaling? And if so, what are the conditions for it to do so?
Scaling and Autoscaling · Fly Docs talks about scaling up, but makes no mention of scaling down.
Yes, Fly autoscales up and down depending on the `soft_limit` entry defined in the `[[services]]` section: Do unused fly.io instances automatically shut down? (+ other region questions) - #2 by jerome
Autoscale up triggers in a minute or two in response to sustained traffic above the `soft_limit` threshold. Autoscale down is exactly its inverse, I’d imagine.
Btw, Fly engs are apparently working on an improved autoscale solution: Autoscale doesn't work - #6 by kurt
I’ve been testing in the meantime. The service has a soft limit of 2, a hard limit of 5, type = “requests”, and standard autoscaling with min=1 max=5. I forced it to scale up with some load, and it hasn’t scaled down for… 46 minutes as of writing this, with 0 requests coming in.
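For anyone trying to reproduce this, the setup described above corresponds roughly to the following `fly.toml` fragment. It is a minimal sketch, not the actual config from the test app: `internal_port` and `protocol` are assumed placeholder values, and the min=1/max=5 bounds are autoscale settings applied outside this file.

```toml
[[services]]
  internal_port = 8080   # assumption: whatever port the hello world server listens on
  protocol = "tcp"       # assumption: not stated in the post

  [services.concurrency]
    type = "requests"    # count concurrent requests rather than connections
    soft_limit = 2       # autoscaling reacts to sustained load above this
    hard_limit = 5       # upper bound on concurrent requests per instance
```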
`fly autoscale` aims to scale up more quickly than it aims to scale down (1-2 min vs. 10 min)
Because of how these conditions are measured and applied, it’s a bit harder to trigger synthetically than you’d expect, and generally works more reliably for higher soft_limits (20 is the default).
Incidentally, we’re working on swapping out our scheduler; this move will help us improve autoscaling as well.
I’m curious why mine hasn’t downscaled for an hour now then. If it helps, deployment ID is 394bc243-7a43-c10f-e065-9a941dcb9242
(it’s just a basic hello world server with a delay) and the traffic was induced using the `oha` tool (oha — Rust application // Lib.rs), firing off 10 reqs/second (`oha -q 10 -z 5m <url>` - though I didn’t run it for the full 5m)
Really good question! And as it turns out, there was a bug with downscaling if there are 0 requests. We’ve manually applied a fix. Could you let us know if it works?
Sure! I’m running a separate test right now with a sustained load, but I’ll give that a go shortly and get back to you.
@eli Heyo! Doesn’t seem to be scaling back down.
I saw it start to do so in that previous deploy (the one with the ID mentioned prior), but it seemed generally inconsistent/buggy (scaled up in under 10s!), so I made a new one and no dice - it’s been about 15 mins. Current deployment is a0bba304-5711-81b6-41b3-8b6002742667
Thanks for getting back to us so quickly!
> I made a new one and no dice
So this part makes sense – the fix in question was specific to the first app. It’s interesting to hear that it scaled up so quickly in addition to scaling down. We can look into this a bit more.
It’s quite possible the strange behavior is something other than the change we made, too.
Ah okay! I can keep this one up now if you need further testing. I can also send over the demo app + config if it’d help.
Hey there – I don’t have any new info on this bug just yet, but I did want to let you know that there’s no need to keep your app around for testing since we’ve pinpointed the issue pretty well. Thank you so much for offering!
I’ll be sure to update this thread with new info as soon as we have it.