I’ve been moving away from Google Cloud as much as possible: Postgres for pub/sub, B2 for storage, etc.
But there’s this audio encoding service I still need to run on Google Cloud Run. It’s configured so that each instance processes a single request at a time (since audio/video encoding is CPU heavy). This allows for multiple concurrent encodings, and it scales down to zero when there’s nothing to do.
Of course I know Fly has an autoscaling feature, but IIRC @kurt mentioned that the service that monitors the number of concurrent requests is kinda slow to respond.
Instead of scaling on requests, I could have some sort of job queue, but I have no idea how I could add/remove Fly worker VMs.
Or maybe I could have a region with, say, 5 VMs permanently active, but how could I easily distribute the work to idle VMs?
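
To make the second idea concrete, here is a rough sketch of what I have in mind: instead of something pushing work to whichever VM looks idle, each worker pulls its next job from a shared queue once it is free, so busy workers never get handed anything. This is only a local simulation under my own assumptions: threads stand in for the 5 VMs, an in-memory `queue.Queue` stands in for the real queue (which would probably be a Postgres table, with workers claiming rows via something like `SELECT ... FOR UPDATE SKIP LOCKED`), and `encode` and `run_pool` are made-up names.

```python
import queue
import threading
import time


def encode(job):
    """Hypothetical stand-in for the CPU-heavy audio encoding step."""
    time.sleep(0.01)


def run_pool(jobs, num_workers=5):
    """Process jobs on a fixed pool of workers; return {job: worker_id}."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    assignments = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                # Only a worker that has finished its previous job reaches
                # this line, so work always goes to an idle worker.
                job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            encode(job)
            with lock:
                assignments[job] = worker_id

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignments


if __name__ == "__main__":
    result = run_pool([f"track-{i}.wav" for i in range(12)])
    for job, worker_id in sorted(result.items()):
        print(f"{job} encoded by worker {worker_id}")
```

With this shape each worker still handles exactly one encoding at a time, like the Cloud Run setup, but nothing needs to know which VMs are idle. What it doesn't answer is how to add or remove the VMs themselves.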