services.concurrency for free tier

When creating a new app, the generated config contains these lines.

    hard_limit = 25
    soft_limit = 20
    type = "connections"

Does that mean that my app, under load, can spin up 24 additional VMs? I don’t want to find my credit card drained when this happens. How do I make sure that the app stays in one VM within the free tier?

Nope. This is for autoscale from my understanding. It’s saying that if you have 20 simultaneous connections, then the app will spin up another VM. You define the min/max VMs via the autoscale flyctl command.

It’s a bit confusing because when I first started learning Fly, I assumed that both scale count and autoscale work in tandem. But this is not true – if you enable autoscale, then the scale count gets disabled, and vice-versa. Moreover, the [services.concurrency] in the toml is redundant if autoscale is not enabled.

Any of the Fly employees, please correct me if I’m wrong! The documentation doesn’t seem to be clear on this tidbit (perhaps it’s been updated since I last read it, though).

We use [services.concurrency] to make load balancing decisions. We’ll send requests first to app instances with a load below their soft limit.

Once the hard limit is reached, we do not send any more connections to the app. We’ll queue them for a little bit and retry to find a suitable app instance (one that hasn’t reached its hard limit).
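
To tie the two limits back to the generated config, here is the same block with comments. This is only an annotated sketch based on the description above: the values are the generated defaults from the first post, and the comments paraphrase this thread rather than official documentation.

```toml
# Annotated copy of the generated block (values from the first post).
[services.concurrency]
  type = "connections"  # load is measured in concurrent connections
  soft_limit = 20       # instances below this load are sent traffic first
  hard_limit = 25       # at this load, no more connections are sent to the instance;
                        # they are queued briefly and retried on another instance
```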

I don’t see autoscale option in fly.toml. Is it turned on by default?

It’s not enabled by default, but you can configure it with the flyctl autoscale command.

Thanks for the clarification!

Do you also do this connection queueing when autoscale is disabled?

Also, I am using the tls and http connection handlers. Web browsers use keep-alive, so from my understanding, the fly load balancer deals with all these persistent connections coming in from users’ web browsers, and then forwards the HTTP requests to my app instances? When the number of concurrent connections from the fly load balancer to an app instance exceeds the soft limit specified in [services.concurrency], then fly will spin up another instance?

In other words, if I have a simple REST HTTP GET call that returns some data, 20 of these calls will have to happen simultaneously (soft_limit of 20) before fly will spin up another instance? Am I understanding this correctly?

Also, does the fly load balancer open persistent connections with my app instances?

@jake you’re mostly understanding correctly :slight_smile: however, the reality is:

  • we can’t scale fast enough (at the moment) for situations where you need 1 more connection to be handled
  • user connections are indeed persistent, but not connections from us to your app

Our edges will handle a lot more than that. As long as your app handles them fast enough, it’s not a problem.

If you want “requests” concurrency, you can set type = "requests" in the [services.concurrency] block. By default we do connections concurrency which best fits a world of TCP services or websockets.
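
As a rough sketch, switching to requests concurrency would look like this in fly.toml. The limit values are simply carried over from the generated config earlier in the thread and are not a recommendation; the comment is a paraphrase of the description above.

```toml
[services.concurrency]
  type = "requests"  # limits apply to concurrent requests rather than connections
  soft_limit = 20    # carried over from the generated config
  hard_limit = 25    # carried over from the generated config
```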

Again, autoscaling is imperfect, but it’s correct that Fly will only consider scaling your app if it has > soft limit connections for some duration (a few seconds).

Hi @jerome, thanks for the reply! If connections from the fly load balancer to our app are not persistent, and my app is just doing simple HTTP REST handling, won’t type = "requests" have the same behaviour?

Is the measurement done at the app, or is it done at the load balancer? For example, do 20 simultaneous connections from the load balancer to the app trigger a scale, or 20 simultaneous connections to the load balancer?


Using type = "requests" makes connections persistent. We only keep them idle for ~2 seconds though because some apps have very aggressive keep alive timeouts and it might conflict. Creating new connections is fast anyway.

At the app level. That’s where the limits are enforced. We use the edge connection numbers to determine which region we should create new instances in.

Thank you!