services.concurrency for free tier

anatoli_devops · August 13, 2021, 3:16pm

When creating a new app, the generated config contains these lines.

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

Does that mean that my app can spin up 24 additional VMs under load? I don’t want to find my credit card empty when this happens. How can I make sure that the app stays on one VM within the free tier?

jake · August 13, 2021, 3:19pm

Nope. This is for autoscale from my understanding. It’s saying that if you have 20 simultaneous connections, then the app will spin up another VM. You define the min/max VMs via the autoscale flyctl command.

It’s a bit confusing because when I first started learning Fly, I assumed that both scale count and autoscale work in tandem. But this is not true – if you enable autoscale, then the scale count gets disabled, and vice-versa. Moreover, the [services.concurrency] in the toml is redundant if autoscale is not enabled.

Any Fly employees, please correct me if I’m wrong! The documentation doesn’t seem to be clear on this tidbit (perhaps it’s been updated since I last read it, though).

jerome · August 13, 2021, 3:25pm

We use this [services.concurrency] for making load balancing decisions. We’ll send requests to app instances with a load below their soft limit first.

Once the hard limit is reached, we do not send any more connections to the app. We’ll queue them for a little bit and retry to find a suitable app instance (one that hasn’t reached its hard limit).
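A minimal sketch of the routing rule described above, written as an illustration and not as Fly's actual implementation. The `Instance` class, the `pick_instance` function, and the tie-breaking choice (least loaded first) are assumptions made for the example; only the soft-limit-first and hard-limit-cutoff behaviour comes from the reply.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical model of one app VM; not Fly's real data structure.
    name: str
    load: int             # current concurrent connections (or requests)
    soft_limit: int = 20  # values from the generated fly.toml
    hard_limit: int = 25


def pick_instance(instances: List[Instance]) -> Optional[Instance]:
    """Return the instance to route to, or None if all are at the hard limit."""
    # Prefer instances that are still below their soft limit.
    below_soft = [i for i in instances if i.load < i.soft_limit]
    if below_soft:
        # Assumption: break ties by picking the least loaded instance.
        return min(below_soft, key=lambda i: i.load)
    # Otherwise fall back to instances that have not reached the hard limit.
    below_hard = [i for i in instances if i.load < i.hard_limit]
    if below_hard:
        return min(below_hard, key=lambda i: i.load)
    # Every instance is full: the caller would queue the connection and retry.
    return None
```

Under these assumptions, a single instance at 25 connections yields `None`, which corresponds to the "queue for a little bit and retry" case. Nothing in this rule starts a new VM.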

anatoli_devops · August 13, 2021, 3:35pm

I don’t see an autoscale option in fly.toml. Is it turned on by default?

michael · August 13, 2021, 3:56pm

It’s not enabled by default, but you can configure it with the flyctl autoscale command.

jake · August 15, 2021, 3:37am

Thanks for the clarification!

Do you also do this connection queueing when autoscale is disabled?

Also, I am using the tls and http connection handlers. Web browsers use keep-alive, so from my understanding, the fly load balancer deals with all these persistent connections coming in from users’ web browsers, and then forwards the HTTP requests to my app instances? When the concurrent connections from the fly load balancer to an app instance exceed the soft limit specified in [services.concurrency], then fly will spin up another instance?

In other words, if I have a simple REST HTTP GET call that returns some data, 20 of these calls will have to happen simultaneously (soft_limit of 20) before fly will spin up another instance? Am I understanding this correctly?

Also, does the fly load balancer open persistent connections with my app instances?

jerome · August 16, 2021, 11:51am

@jake you’re mostly understanding correctly. However, the reality is:

we can’t scale fast enough (at the moment) for situations where you need just 1 more connection to be handled
user connections are indeed persistent, but connections from us to your app are not

Our edges will handle a lot more than that. As long as your app handles them fast enough, it’s not a problem.

If you want “requests” concurrency, you can set type = "requests" in the [services.concurrency] block. By default we do connections concurrency which best fits a world of TCP services or websockets.

Again, autoscaling is imperfect, but it’s correct that Fly will only consider scaling your app if it has > soft limit connections for some duration (a few seconds).
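The "above the soft limit for some duration" condition can be sketched roughly as follows. This is an illustration only: the function name, the sample format, and the 5-second window are assumptions (the reply only says "a few seconds"), and the real autoscaler may well work differently.

```python
def should_consider_scaling(samples, soft_limit=20, sustain_seconds=5.0):
    """Return True if load stayed above soft_limit for sustain_seconds.

    samples: chronological list of (timestamp_seconds, load) tuples.
    sustain_seconds is a hypothetical value, not a documented setting.
    """
    over_since = None  # timestamp at which load first exceeded the soft limit
    for ts, load in samples:
        if load > soft_limit:
            if over_since is None:
                over_since = ts
            if ts - over_since >= sustain_seconds:
                return True
        else:
            # Load dropped back to or below the soft limit: reset the window.
            over_since = None
    return False
```

A short burst that drops back under the soft limit resets the window, which matches the point that needing one extra connection for a moment is not enough to trigger scaling.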

jake · August 16, 2021, 1:58pm

Hi @jerome, thanks for the reply! If connections from the fly load balancer to our app are not persistent, and my app is just doing simple HTTP REST handling, won’t type = "requests" have the same behaviour?

Is the measurement done at the app, or is it done at the load balancer? E.g. do 20 simultaneous connections from the load balancer to the app cause a scale, or 20 simultaneous connections to the load balancer?

Thanks!

jerome · August 16, 2021, 2:01pm

Using type = "requests" makes connections persistent. We only keep them idle for ~2 seconds though because some apps have very aggressive keep alive timeouts and it might conflict. Creating new connections is fast anyway.

At the app level. That’s where the limits are enforced. We use the edge connection numbers to determine which region we should create new instances in.
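To illustrate the connection reuse mentioned for type = "requests", here is a small sketch of an idle pool with a roughly 2-second timeout. The `IdlePool` class and its methods are hypothetical and only model the idea; they are not Fly's proxy code.

```python
import time

IDLE_TIMEOUT = 2.0  # seconds; taken from the "~2 seconds" in the reply above


class IdlePool:
    """Hypothetical pool of idle proxy-to-app connections."""

    def __init__(self, idle_timeout=IDLE_TIMEOUT, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self._idle = []    # list of (connection_id, time_it_became_idle)
        self._next_id = 0

    def acquire(self):
        """Return (connection_id, reused)."""
        now = self.clock()
        # Discard connections that have been idle for the timeout or longer.
        self._idle = [(c, t) for c, t in self._idle
                      if now - t < self.idle_timeout]
        if self._idle:
            conn, _ = self._idle.pop()
            return conn, True
        # No usable idle connection: open a new one (cheap, per the reply).
        self._next_id += 1
        return self._next_id, False

    def release(self, conn):
        """Mark a connection as idle once its request has finished."""
        self._idle.append((conn, self.clock()))
```

With requests arriving less than 2 seconds apart the same connection is reused; after a longer gap a new one is opened.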

jake · August 16, 2021, 3:04pm

Thank you!
