I’ve been moving away from Google Cloud as much as possible. Using PG for pub sub, B2 for storage, etc.
But there’s this audio encoding service I still need to run on Google Cloud Run. It’s configured so that each instance processes a single request at a time (since audio/video encoding is CPU heavy). This allows for multiple concurrent encodings, and it scales down to zero when there’s no work.
Of course I know Fly has an autoscaling feature, but IIRC @kurt mentioned that the service that monitors the number of concurrent requests is kinda slow to respond.
Instead of scaling on requests, I could have some sort of job queue, but I have no idea how I would add/remove Fly worker VMs.
Or maybe I could have a region with, say, 5 VMs permanently active, but how could I easily distribute the work to idle VMs?
You can run 5 VMs with a hard limit of 1 and it should do what you want. We’ll only send one request to each of those VMs at any given time.
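To make the "hard limit of 1" idea concrete, here is a toy model of the routing rule in Python. It is only a sketch of the behaviour described above, not Fly's proxy: the `HardLimitRouter` class and its method names are made up for illustration, and it deliberately says nothing about what happens when every VM is busy.

```python
from typing import Dict, Iterable, Optional


class HardLimitRouter:
    """Toy router: never gives a VM more than `hard_limit` requests at once.

    Hypothetical illustration only; this is not how Fly's proxy is implemented.
    """

    def __init__(self, vm_ids: Iterable[str], hard_limit: int = 1) -> None:
        self.hard_limit = hard_limit
        # Number of requests currently being handled by each VM.
        self.in_flight: Dict[str, int] = {vm: 0 for vm in vm_ids}

    def acquire(self) -> Optional[str]:
        """Reserve a slot on the first VM below the limit, or return None."""
        for vm, count in self.in_flight.items():
            if count < self.hard_limit:
                self.in_flight[vm] = count + 1
                return vm
        return None  # every VM is at its hard limit

    def release(self, vm: str) -> None:
        """Free the slot once the VM has finished its request."""
        if self.in_flight[vm] == 0:
            raise ValueError(f"{vm} has no request in flight")
        self.in_flight[vm] -= 1
```

With two VMs and a hard limit of 1, the first two `acquire()` calls each return a different VM, and a third returns `None` until one of them is released.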
We have some better options for this coming soon. There aren’t many docs, but you can use the fly machines plumbing to do a lot of FaaS type setups.
Here’s a proof-of-concept proxy that runs on Fly and starts machines when requests come in. These VMs are responsible for stopping themselves when they’re idle. It works really well: superfly/machine-proxy on GitHub (PoC HTTP proxy for scale-to-zero apps via the Fly machines API).
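The "VMs stop themselves when idle" part could look roughly like this inside the worker process. This is a minimal sketch under my own assumptions, not code from machine-proxy: `IdleTracker`, the timeout value, and the injectable `clock` parameter are all hypothetical names chosen for the example.

```python
import time
from typing import Callable


class IdleTracker:
    """Tracks request activity so a worker can decide when to shut itself down.

    Hypothetical helper for illustration; not taken from superfly/machine-proxy.
    """

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout  # seconds of inactivity before stopping
        self._clock = clock               # injectable so it can be tested
        self._in_flight = 0
        self._last_activity = clock()

    def request_started(self) -> None:
        self._in_flight += 1
        self._last_activity = self._clock()

    def request_finished(self) -> None:
        self._in_flight -= 1
        self._last_activity = self._clock()

    def should_stop(self) -> bool:
        """True only when nothing is running and the timeout has elapsed."""
        if self._in_flight > 0:
            return False
        return self._clock() - self._last_activity >= self.idle_timeout
```

A worker would call `request_started()` and `request_finished()` around each encoding job, check `should_stop()` on a timer, and exit the process when it returns true so the VM stops.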
Thanks @kurt !
So what happens when there are more requests than available VMs? Would the requests be kept on hold by Fly until there’s an available VM?
I’m guessing I wouldn’t be able to use autoscaling or you’d have mentioned it as a solution.