My understanding is that apps v2 backed by Machines doesn’t support autoscaling yet. Is there a way to implement my own scale up and scale down to zero based on the requests coming into the app’s load balancer in the meantime?
The new Machines still export metrics to the fly-managed Prometheus instance. Perhaps you could poll that and trigger scaling changes based on what it reports for the app?
Even if you did that, the proxy still has to load balance traffic. I am not sure whether Fly’s proxy is a capable load balancer for Machines; in our case it has been erratic since the beginning, and that hasn’t changed.
One of the more annoying issues we see: Machines are started up to serve just one request, taken down by our code after a predefined timeout, then immediately started up again and sent a single request once more! This has cost implications, but I’ve heard nothing from Fly despite complaining about it over email and on the forums. Over time I expect things to improve, as most of this is in preview.
Also, if you start up more than one Machine of a single app in a region, I am not sure what kind of load balancing to expect. It isn’t documented anywhere (that I know of).
Right now, the proxy is only built to handle the 0-to-1 scaling case. It’s not designed to autoscale across multiple Machines. It kinda works, but that’s an unsupported use case at the moment.
When a request or connection comes in, the proxy runs through its load balancing logic, ensures the Machine is started, then forwards the user on.
Scaling down is just a matter of exiting, though. If you can teach your process to exit when it’s likely to be idle, you’ll get “scale to zero”. But again, it’s not designed to work with multiple machines.
Is it possible to run our own reverse proxy, like Nginx, to handle autoscaling on each of the Fly apps?
The way I’m thinking of implementing this, just for scale-down (will tackle scale-up later), is:
- query Prometheus metrics for response count every 5 minutes to get apps which are actively receiving requests (however, this isn’t foolproof — I think it only works if the Machines are running HTTP servers)
- for “active” apps where the response count > 0, do not scale down. For all other apps, if the number of running Machines > 0, stop them
This will not work for some of our users, who will be using Machines to serve WebSockets instead of plain HTTP — I don’t think Fly emits any metrics we could use to build a hacky autoscaler for those.
I think for scale down, you can probably just do it from within the Machine. Just wire your app up to detect “idle” and exit with status code 0.
We have a tiny go proxy we use for demos that does this here: GitHub - superfly/tired-proxy: An http proxy that's just too tired and eventually shuts down
Yeah! Exiting the process from the application layer works well when I have access to the application logic and can detect when it is idle.
However, I also deploy some of my users’ containers and I don’t have access to their source code, which makes this a bit trickier.
Oh got it! An outside supervisor sounds reasonable. You could try injecting your own code into their Docker image. That tired-proxy project runs as a supervisor: you start it, pass your custom command in, and it exits when it’s idle.
I had considered using tired-proxy, but it seems to handle just HTTP requests. If a user is opening up, say, a WebSocket, or is running a gRPC or GraphQL server, tired-proxy would not work, right?
I am a little bit confused. I thought v2 supports autoscaling? as described in this link: Apps v2 Autoscaling. Did I miss something?
While you can automatically start and stop Machines for a V2 app, it’s not technically “autoscaling”, because we don’t create and destroy Machines. Auto start and stop will stop and start existing Machines based on traffic/load. You can learn all about this feature here: