Auto-scaler + auto start/stop interplay?

poundifdef · June 12, 2024, 6:14pm

I’ve spent this morning getting the fly autoscaler to work. I have a bunch of questions around how to configure it! So here goes:

1. It seems like the fly auto-scaler runs independently of the built-in auto start/stop functionality. Should I worry about them bumping into each other? Any rules of thumb to follow? I’ve had some instances where a machine remained in the “off” state, which might be caused by the auto-scaler and autostop both making changes.
2. Looking at the code, it seems like the autoscaler will forcibly kill machines when set to create+destroy mode. Is kill_timeout respected when this happens?
3. When machines are stopped, is ephemeral disk wiped? (Is there any reason not to create/destroy, rather than stop/start? It seems cheaper, money-wise, to destroy, so is startup speed the deciding factor?)
4. Any suggestions for how to use multiple metrics? For example, “Scale when disk is 70% full or queue depth greater than 10”?
5. Will the auto-scaler reap machines in the off state when it’s in “destroy” mode?
6. Does the auto-scaler first try to turn on machines, and then try to create new ones? Or is it exclusively one or the other?

Thank you!

poundifdef · June 13, 2024, 3:15pm

Answering some of my own questions:

Looking at the code, it seems like the autoscaler will forcibly kill machines when set to create+destroy mode. Is kill_timeout respected when this happens?

Yes, kill_signal is respected.

When machines are stopped, is ephemeral disk wiped?

Yes

benbjohnson · June 14, 2024, 3:16pm

The autoscaler is independent of the Fly Proxy autostart/autostop but they typically have different use cases. If you’re using the built-in autostart/stop then you’re doing request-based scaling whereas with the fly-autoscaler it’s more about metrics-based scaling.

Startup speed is typically the deciding factor. Create/destroy may take a second or two whereas start/stop is usually in the hundreds of milliseconds.

The Expr language has some useful functions around min() and max() that can be used to combine metrics. For example, if you had two separate queues you could take the greater of the two:

max(queue_depth_1, queue_depth_2)
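
To make the "greater of the two" idea concrete, here is a rough Python sketch of the same calculation (not the autoscaler's own code). The `jobs_per_machine` ratio and the min/max bounds are assumptions made up for illustration; only the `max()` over the two queue depths comes from the expression above.

```python
import math


def desired_machines(queue_depth_1: int, queue_depth_2: int,
                     jobs_per_machine: int = 10,
                     min_machines: int = 1,
                     max_machines: int = 10) -> int:
    """Size the fleet for the busier of two queues."""
    # Same idea as the expression above: take the greater of the two metrics.
    busiest = max(queue_depth_1, queue_depth_2)
    # Hypothetical sizing rule: one machine per `jobs_per_machine` queued jobs.
    wanted = math.ceil(busiest / jobs_per_machine)
    # Clamp so an empty queue or a sudden spike stays within bounds.
    return max(min_machines, min(wanted, max_machines))


if __name__ == "__main__":
    print(desired_machines(5, 42))
```

With these made-up defaults, the busier queue alone drives the result, so a quiet second queue never pulls the machine count down.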

We’ve debated the best way to implement “scale when above threshold”. For some other autoscalers, it’s essentially just a boolean value to scale up when above a threshold but that’s not very flexible.

The easiest approach is probably to expose the current started machine count as a variable so you can use it in the scaling expression, e.g.

started_machine_count + (disk_full_pct > 70 ? 1 : 0)

That’s not currently available but it should be easy to add. I created an issue to track it (#31).
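
For the original "disk is 70% full or queue depth greater than 10" example, the same pattern could look something like the Python sketch below. It only mirrors the logic of the expression above. The function name, the `queue_depth` metric, and the `max_machines` cap are assumptions for illustration, and none of this is an existing autoscaler feature.

```python
def scale_up_on_threshold(started_machine_count: int,
                          disk_full_pct: float,
                          queue_depth: int,
                          max_machines: int = 10) -> int:
    """Return the desired started-machine count for one evaluation."""
    # Either metric crossing its threshold asks for one extra machine.
    breached = disk_full_pct > 70 or queue_depth > 10
    desired = started_machine_count + (1 if breached else 0)
    # Hypothetical cap so repeated evaluations cannot grow without bound.
    return min(desired, max_machines)


if __name__ == "__main__":
    print(scale_up_on_threshold(3, disk_full_pct=82.0, queue_depth=0))
```

Note that this only ever steps up by one per evaluation and never steps down, so a real setup would presumably need a matching rule for scaling back in.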

Yes, it will destroy any machines when the number of created machines is below the minimum created machine threshold (if set).

It currently just creates new ones unless you also set the started machine count. It’s not currently designed to work with autostop so it assumes that if machines are created then they are also running.
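
As a mental model only, the behaviour described above can be sketched as two independent reconciliations, one for created machines and one for started machines. This is a simplified Python illustration, not the autoscaler's implementation: `plan_actions` and its parameters are hypothetical names, and it ignores details such as regions or the fact that only created machines can be started.

```python
from typing import List, Optional


def plan_actions(created: int, started: int,
                 target_created: Optional[int] = None,
                 target_started: Optional[int] = None) -> List[str]:
    """List the actions needed to move the current counts to the targets."""
    actions: List[str] = []
    # Created-count target: handled purely by create/destroy.
    if target_created is not None:
        diff = target_created - created
        if diff > 0:
            actions.append(f"create {diff}")
        elif diff < 0:
            actions.append(f"destroy {-diff}")
    # Started-count target: handled purely by start/stop.
    if target_started is not None:
        diff = target_started - started
        if diff > 0:
            actions.append(f"start {diff}")
        elif diff < 0:
            actions.append(f"stop {-diff}")
    return actions


if __name__ == "__main__":
    print(plan_actions(created=2, started=2, target_created=4))
```

In this model, setting only a created-machine target never produces a start or stop action, which matches the point that stopped machines are not woken up unless a started machine count is also configured.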

system · June 21, 2024, 3:17pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
