I’ve spent this morning getting the fly autoscaler to work. I have a bunch of questions around how to configure it! So here goes:
It seems like the fly auto-scaler runs independently of the built-in auto start/stop functionality. Should I worry about them bumping into each other? Any rules of thumb to follow? I feel like I’ve had some instances where a machine remains in the “off” state, which might be caused by the auto-scaler and autostop both making changes.
Looking at the code, it seems like the autoscaler will forcibly kill machines when set to create+destroy mode. Is kill_timeout respected when this happens?
When machines are stopped, is ephemeral disk wiped? (Is there any reason not to create/destroy, rather than stop/start? It seems cheaper, money-wise, to destroy, so is startup speed the deciding factor?)
Any suggestions for how to use multiple metrics? For example, “Scale when disk is 70% full or queue depth greater than 10”?
Will the auto-scaler reap machines in the off state when it’s in “destroy” mode?
Does the auto-scaler first try to turn on machines, and then try to create new ones? Or is it exclusively one or the other?
Looking at the code, it seems like the autoscaler will forcibly kill machines when set to create+destroy mode. Is kill_timeout respected when this happens?
Yes, kill_signal is respected.
When machines are stopped, is ephemeral disk wiped?
The autoscaler is independent of the Fly Proxy autostart/autostop but they typically have different use cases. If you’re using the built-in autostart/stop then you’re doing request-based scaling whereas with the fly-autoscaler it’s more about metrics-based scaling.
Startup speed is typically the deciding factor. Create/destroy may take a second or two whereas start/stop is usually in the hundreds of milliseconds.
The Expr language has some useful functions, such as min() and max(), that can be used to combine metrics. For example, if you had two separate queues you could take the greater of the two:
max(queue_depth_1, queue_depth_2)
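To make the effect of that expression concrete, here is a rough Python sketch of the same idea (not the autoscaler's actual code). It takes the deeper of the two queues, converts it to a machine count, and clamps the result. The function name, `jobs_per_machine`, and the `min_count`/`max_count` bounds are assumptions made up for illustration.

```python
import math


def target_machines(queue_depth_1: int, queue_depth_2: int,
                    jobs_per_machine: int = 10,
                    min_count: int = 1, max_count: int = 10) -> int:
    """Hypothetical helper: size the app for whichever queue is deeper."""
    # Same idea as max(queue_depth_1, queue_depth_2) in the expression.
    busiest = max(queue_depth_1, queue_depth_2)
    # Assumed sizing rule: one machine per `jobs_per_machine` queued jobs.
    wanted = math.ceil(busiest / jobs_per_machine)
    # Clamp to the assumed lower and upper bounds.
    return max(min_count, min(wanted, max_count))


if __name__ == "__main__":
    print(target_machines(35, 12))
```

With these assumed defaults, the deeper queue alone drives the result, so a spike in either queue is enough to scale up.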
We’ve debated the best way to implement “scale when above threshold”. For some other autoscalers, it’s essentially just a boolean value to scale up when above a threshold but that’s not very flexible.
The easiest approach is probably to add the current value so you can use that when scaling. e.g.
That’s not currently available but it should be easy to add. I created an issue to track it (#31).
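As a sketch of what threshold-style scaling could look like once the current machine count is exposed, here is the "disk above 70% or queue depth above 10" rule from the question written in Python. This is purely illustrative: the function and parameter names are hypothetical, the autoscaler does not provide the current count today, and stepping up by one machine is an assumed policy.

```python
def next_machine_count(current: int, disk_used_pct: float, queue_depth: int,
                       disk_threshold: float = 70.0,
                       queue_threshold: int = 10) -> int:
    """Hypothetical rule: add one machine while either metric is over its threshold."""
    over_threshold = (disk_used_pct > disk_threshold
                      or queue_depth > queue_threshold)
    # Assumed policy: step up by one when over, otherwise hold steady.
    return current + 1 if over_threshold else current


if __name__ == "__main__":
    print(next_machine_count(current=3, disk_used_pct=82.5, queue_depth=4))
```

A boolean "above threshold" signal on its own only says whether to scale, not by how much, which is why having the current count available matters here.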
Yes, it will destroy any machines when the number of created machines is below the minimum created machine threshold (if set).
It currently just creates new ones unless you also set the started machine count. It’s not currently designed to work with autostop so it assumes that if machines are created then they are also running.