I’m trying to see if fly.io can be a good alternative to Google’s Cloud Run, and I wanted to ask whether these two features are available on fly.io:
1. Is it possible to have a concurrency of 1 per container? In other words, each container would handle a single request at a time; I need this because the task cannot be parallelized within a single container.
2. Is it possible to scale to a large number of containers (e.g. 1000 instances)?
The quick answer to each of your questions is “yes,” with a caveat on #2.
1. Yes. Instances on fly.io are Firecracker VMs built from your container image. You can set a hard limit to restrict each VM to a single in-flight request; see the App Configuration (fly.toml) docs.
2. Yes, you can scale to a large number of VMs. However, our autoscaling probably won’t do what you want: it only runs every 15s, which is very slow if you need to add an instance in response to each request.
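For the first point, the limit goes in the concurrency block of the service definition in fly.toml. This is a minimal sketch rather than a complete config: the `internal_port` value is an assumption about your app, and setting `soft_limit` equal to `hard_limit` is just one reasonable choice.

```toml
[[services]]
  internal_port = 8080   # assumption: the port your container listens on
  protocol = "tcp"

  [services.concurrency]
    type = "requests"    # count in-flight HTTP requests rather than TCP connections
    hard_limit = 1       # never send more than one request at a time to a VM
    soft_limit = 1       # treat a VM as full as soon as it has one request
```

With `hard_limit = 1`, a request is not routed to a VM that is already handling one, so the number of running VMs caps how many requests you can serve at once.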
So if I understand correctly, I can work around the autoscaling delay by writing a proxy that takes requests and spawns containers on fly.io? When using the APIs, what would be the expected delay for a new container? I could probably work with a delay of ~5s.
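Roughly what I have in mind for the proxy's bookkeeping, as a sketch only: `spawn` is a placeholder for whatever fly.io API call creates an instance (I don't know its real name or signature), and the class and method names are made up for illustration.

```python
from collections import deque
from typing import Callable, Deque, Set


class OneRequestPerInstancePool:
    """Hands out instances so that each one serves a single request at a time."""

    def __init__(self, spawn: Callable[[], str], max_instances: int = 1000) -> None:
        self._spawn = spawn          # hypothetical: wraps the real "create instance" call
        self._max = max_instances    # upper bound on instances this proxy will create
        self._idle: Deque[str] = deque()
        self._busy: Set[str] = set()

    def acquire(self) -> str:
        """Return an idle instance, spawning a new one if none is free."""
        if self._idle:
            instance = self._idle.popleft()
        elif len(self._busy) < self._max:
            # No idle instance means every existing instance is busy,
            # so the busy count equals the total instance count here.
            instance = self._spawn()
        else:
            raise RuntimeError("instance limit reached")
        self._busy.add(instance)
        return instance

    def release(self, instance: str) -> None:
        """Mark an instance as free once its request has finished."""
        self._busy.remove(instance)
        self._idle.append(instance)
```

The proxy would call `acquire()` before forwarding a request and `release()` once the response comes back, so the spawn delay is only paid when every existing instance is busy.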