Currently the autoscaler is based on the number of connections.
Do you have any plans to support alternative means of autoscaling? It would be awesome if there were options to scale horizontally (more instances) and vertically (more CPU/MEM).
We vaguely plan to bake this in. The current autoscaler is actually not what most people want, so we’re starting by tackling that. What they usually want is for machines to start on demand when there are a certain number of connections.
Scaling with different strategies is something I think people might end up building with the machines API. You can run a machine that wakes up every so often, and then starts other machines for you. Or run one that continuously polls metrics and starts and stops other things.
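Something like this sketch, for instance: a tiny reconciler that maps a backlog metric to a desired worker count and works out which machines to start or stop. All names, pool sizes, and thresholds here are made up for illustration; the actual metric query and the Machines API calls are left as stubs.

```python
import math

# Hypothetical sizing rule: one worker per 50 queued items, clamped to a range.
def desired_workers(backlog, per_worker=50, lo=1, hi=5):
    return max(lo, min(hi, math.ceil(backlog / per_worker)))

def reconcile(backlog, running):
    """Return (to_start, to_stop) machine IDs, given the current backlog
    and the list of machine IDs currently running (the rest are stopped)."""
    pool = [f"worker-{i}" for i in range(5)]  # all machines this app owns
    want = desired_workers(backlog)
    running = sorted(running)
    to_start = [m for m in pool if m not in running][: max(0, want - len(running))]
    to_stop = running[want:]  # stop any surplus
    return to_start, to_stop
```

The polling machine would call `reconcile` on a timer and translate `to_start`/`to_stop` into Machines API start/stop requests.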
I experimented with a Machine that didn’t run a server, just a process (i.e., no [[services]]). I don’t remember being able to start it with the fly m start command; it was torn down every time. But if I understand you right, this use case is supported (even though it doesn’t / didn’t work…)?
Autoscale, believe it or not, is/was one of Fly’s most interesting features, yet it sadly isn’t getting the eng love it deserves.
In fact, the Fly proxy needs to observe and react in time to an app’s (or a process’s, or a machine’s) rising and falling request/connection counts relative to soft_limit and hard_limit, which in my experience it doesn’t quite do. I imagine it’s a hard problem (and relates to autoscaling in some sense), no doubt… but one that needs solving.
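For anyone following along, those limits are set per service in fly.toml under the concurrency section; the values below are just illustrative:

```toml
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    type = "connections"  # or "requests"
    soft_limit = 20       # proxy prefers other instances above this
    hard_limit = 25       # proxy stops sending traffic above this
```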
I was about to post a question regarding this. When we’re dealing with worker apps that have no public connection to the outside world, the requirements for scaling them have nothing to do with the number of connections or requests.
A stream-processing worker, for instance, might need to scale up if the number of unprocessed messages in the stream suddenly increases. It seems the only way to get close to this today is to write a worker app that periodically checks these conditions and uses the Machines API to scale up or down, though that seems like a lot of trouble.
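If anyone does go that route, one detail worth getting right is hysteresis: scale up at a much higher backlog than you scale down at, so the loop doesn’t flap. A sketch with made-up thresholds:

```python
SCALE_UP_AT = 1000   # queued messages before adding a worker
SCALE_DOWN_AT = 100  # queued messages before removing one

def decide(backlog, workers, min_workers=1, max_workers=5):
    """Return 'up', 'down', or 'hold'. The gap between the two
    thresholds keeps the loop from oscillating on a noisy metric."""
    if backlog > SCALE_UP_AT and workers < max_workers:
        return "up"
    if backlog < SCALE_DOWN_AT and workers > min_workers:
        return "down"
    return "hold"
```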
Since custom metrics are already baked in, I think that it would make sense to use them. Maybe being able to define thresholds based on custom metrics.
Yes, we hear you on this, I think the use case you described is very clear. Autoscaling based on prometheus metrics is something we’d like to eventually support in the platform.
Yeah, that’s the current do-it-yourself approach that works today. It’s not ideal, but if you want to go this direction here’s a pointer to a Bash-script example of this that could help you get started:
I was actually thinking that since I already have a worker app running with access to these metrics (since it produces them), I might try calling the Machines API directly: the app scaling itself.
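For what it’s worth, starting a machine through the public Machines API is just an authenticated POST, so the in-app call can be tiny. A stdlib-only sketch; the app name, machine ID, and token are placeholders:

```python
import urllib.request

API = "https://api.machines.dev/v1"  # public Machines API base URL

def start_machine_request(app, machine_id, token):
    """Build the start-machine request (POST with bearer auth)."""
    return urllib.request.Request(
        f"{API}/apps/{app}/machines/{machine_id}/start",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

# Actually sending it needs a real app, machine ID, and FLY_API_TOKEN:
# urllib.request.urlopen(start_machine_request("my-app", "148ed...", token))
```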
It’s not ideal, for sure, but until custom metrics are supported, it might be an option.