When creating a new app, the generated config contains these lines.
```toml
[services.concurrency]
  hard_limit = 25
  soft_limit = 20
  type = "connections"
```
Does that mean that my app under load can spin up 24 additional VMs? I don’t want to find my credit card empty when that happens. How can I make sure the app stays on one VM within the free tier?
Nope. From my understanding, this is for autoscaling. It’s saying that if you have 20 simultaneous connections, then the app will spin up another VM. You define the min/max VM count via the flyctl autoscale command.
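As a sketch (double-check against `flyctl help` for your flyctl version, since the autoscale commands have changed over time), pinning an app to a single VM might look like:

```shell
# Turn autoscaling off so the fixed scale count takes effect again
flyctl autoscale disable

# Pin the app to exactly one VM
flyctl scale count 1
```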
It’s a bit confusing, because when I first started learning Fly I assumed that scale count and autoscale work in tandem. But this is not true – if you enable autoscale, then the scale count gets disabled, and vice versa. Moreover, the [services.concurrency] section in the toml is redundant if autoscale is not enabled.
Any of the Fly employees, please correct me if I’m wrong! The documentation doesn’t seem to be clear on this tidbit (perhaps it’s been updated since I last read it, though).
Do you also do this connection queueing when autoscale is disabled?
Also, I am using the tls and http connection handlers. Web browsers use keep-alive – so from my understanding, the Fly load balancer deals with all these persistent connections coming in from users’ web browsers, and then forwards the HTTP requests to my app instances? When the concurrent connections from the Fly load balancer to an app instance exceed the soft limit specified in [services.concurrency], Fly will spin up another instance?
In other words, if I have a simple REST HTTP GET call that returns some data, 20 of these calls will have to happen simultaneously (soft_limit of 20) before fly will spin up another instance? Am I understanding this correctly?
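The soft/hard limit logic described above can be sketched as a toy decision function. This is not Fly’s actual proxy code – the names and the three-way outcome are my own illustration of how the two thresholds interact:

```python
# Toy sketch of soft/hard concurrency limits (not Fly's real implementation).
SOFT_LIMIT = 20  # above this, the instance signals that a scale-up may be needed
HARD_LIMIT = 25  # at or above this, no new work is routed to the instance

def classify(current_connections: int) -> str:
    """Decide how a balancer might treat one instance at a given load."""
    if current_connections >= HARD_LIMIT:
        return "refuse"    # queue the connection or route it elsewhere
    if current_connections >= SOFT_LIMIT:
        return "scale-up"  # still accepts work, but hints the autoscaler
    return "ok"
```

So with the defaults above, the 20th simultaneous connection is the first one that crosses the soft limit, and the 25th is where new work stops being routed to that instance.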
Also, does the fly load balancer open persistent connections with my app instances?
Hi @jerome, thanks for the reply! If connections from the Fly load balancer to our app are not persistent, and my app is just doing simple HTTP REST handling, won’t type = "requests" have the same behaviour?
Is the measurement done at the app, or at the load balancer? E.g. do 20 simultaneous connections from the load balancer to the app cause a scale-up, or 20 simultaneous connections to the load balancer?
Using type = "requests" makes connections persistent. We only keep them idle for ~2 seconds though, because some apps have very aggressive keep-alive timeouts and it might conflict. Creating new connections is fast anyway.
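For reference, switching to request-based concurrency is just a change to the type in fly.toml (a config sketch, reusing the same limits as the generated defaults above):

```toml
[services.concurrency]
  type = "requests"   # count in-flight HTTP requests instead of TCP connections
  soft_limit = 20
  hard_limit = 25
```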
At the app level. That’s where the limits are enforced. We use the edge connection numbers to determine which region we should create new instances in.