Hi… There’s at least one subtlety: you’ve mixed http_service with services when attempting to refer to the same internal_port, and this tends to confuse the Fly.io infrastructure.
Aside: The bounds that you showed may have just been for testing, but, if not, I’d suggest consulting the “Guidelines for concurrency settings”. In particular…
If the soft and hard limit are too close, then there might not be enough “time” for the proxy to decide to load balance and the result could be multiple retries.
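For what it's worth, here is a minimal sketch of what I mean, assuming a single HTTP service on port 8080 (the app name, port, and limits are placeholders, not recommendations):

```toml
# fly.toml -- point either http_service *or* a services block at a given
# internal_port, not both.
app = "your-app"              # placeholder
primary_region = "sjc"

[http_service]
  internal_port = 8080        # placeholder
  force_https = true
  auto_start_machines = true  # let the proxy start stopped machines under load
  auto_stop_machines = true
  min_machines_running = 1

  [http_service.concurrency]
    type = "connections"      # or "requests"
    soft_limit = 80           # keep some headroom between soft and hard
    hard_limit = 100
```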
Thanks @mayailurus and @khuezy. fly scale count 2 doesn't seem to be a toml configuration. How can we configure the app to autoscale when it hits the concurrent TCP connections threshold?
I think what is being said is that if you issue that command on the console, your app will remember this ceiling count for the remainder of its lifetime.
Makes sense @halfer. We do have a standby machine:
```
➜ interview git:(sid/trybasicflyscaling) ✗ fly scale show
VM Resources for app:
Groups
NAME  COUNT  KIND         CPUS  MEMORY    REGIONS
app   2      performance  8     16384 MB  sjc(2)
```
OK. Well, I don’t know the answer to your question, as the mysterious inner workings of the concurrency device are, well, mysterious.
But it is solvable another way. You could add a simple mechanism to read load metrics from each machine (e.g. via top or uptime). These get sent (or pulled) to another small app that runs the scale command, up or down, based on the prevailing conditions. This is a bit more faff than tweaking a config file, but you'll have a much better handle on how it all works.
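A very rough sketch of the "small app" half, assuming each machine exposes its 1-minute load average at a hypothetical /load endpoint and flyctl is installed wherever this runs (the app name, URLs, and thresholds are all made up):

```python
import subprocess
import urllib.request

APP = "your-app"                      # placeholder app name
LOAD_URLS = [                         # hypothetical per-machine endpoints
    "http://machine-1.internal:8081/load",
    "http://machine-2.internal:8081/load",
]
SCALE_UP_AT = 4.0    # illustrative 1-minute load-average thresholds
SCALE_DOWN_AT = 1.0

def read_load(url: str) -> float:
    # Each machine is assumed to return its load average as plain text.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return float(resp.read().decode())

def decide(current_count: int, loads: list[float]) -> int:
    # Naive policy: react to the busiest machine.
    peak = max(loads)
    if peak > SCALE_UP_AT:
        return current_count + 1
    if peak < SCALE_DOWN_AT and current_count > 1:
        return current_count - 1
    return current_count

if __name__ == "__main__":
    loads = [read_load(url) for url in LOAD_URLS]
    target = decide(len(LOAD_URLS), loads)
    if target != len(LOAD_URLS):
        # Hand the actual change off to flyctl; check `fly scale count --help`
        # for the exact flags your version supports.
        subprocess.run(["fly", "scale", "count", str(target), "-a", APP], check=True)
```

Run it from cron (or a tiny scheduler app) every minute or so and you have a crude autoscaler that you fully control.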
I checked your metrics and your maximum concurrency (requests the machine was serving at a given point) was 2. Your soft_limit is 90 concurrent requests; the second machine won't spin up until the first one is at the soft limit.
Thanks @roadmr, yeah we're playing around with those limits. We did get it to scale up after sending more TCP connections, but it seems to be non-deterministic. Note that we're using type = "connections", not requests. Currently on the instance I see:
```
root@7811096a526708:/app# ss -s
Total: 500
TCP:   396 (estab 97, closed 296, orphaned 0, timewait 296)

Transport  Total  IP   IPv6
RAW        0      0    0
UDP        95     28   67
TCP        100    80   20
INET       195    108  87
FRAG       0      0    0
```
Which is over both our soft and hard limits of 90 and 100, but it doesn't scale up.
The large discrepancy between the number of connections you mention (80-100) and what I see in the http load / concurrency metric (2 at most) makes me think your connections are not originating from external clients hitting your app via the proxy.
Can you tell us more about your TCP connections? What originates them?
Yeah, good point. So we have a Python FastAPI server running on the app which spins up Python subprocesses. These subprocesses initiate WebSocket/WebRTC connections with external services such as Cartesia (a text-to-speech provider) and others. Each process has about 5 active TCP connections, as you can see in the ss -s output above. I'm not sure if they would go via the Fly proxy; I'd think so?
> what I see in the http load / concurrency metric (2 at most)
No, they don’t. The Fly proxy is a load-balancing proxy that only handles incoming connections (in your case, http requests to your app). Outgoing connections originating in your machines only go through a few layers of nftables routing on the way out and don’t in any way influence machine auto-start/stop. Read this for more information:
tl;dr: if you hit your app (https://your-app.fly.dev, for example) with 100 concurrent connections, you should see your second machine start up.
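If it helps to reproduce that, here's a quick sketch that opens concurrent requests from your laptop, assuming Python with the httpx package installed (tools like hey or wrk would do the same job; the URL is the placeholder from above):

```python
import asyncio
import httpx

URL = "https://your-app.fly.dev/"   # placeholder from above
CONCURRENCY = 100                   # aim above your soft_limit

async def one_request(client: httpx.AsyncClient) -> int:
    # A slow-ish endpoint keeps connections open long enough for the proxy
    # to actually observe the concurrency.
    resp = await client.get(URL, timeout=30)
    return resp.status_code

async def main() -> None:
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(
            *(one_request(client) for _ in range(CONCURRENCY)),
            return_exceptions=True,
        )
    print(results)

asyncio.run(main())
```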
Go to Sign in to Your Account · Fly, hit "Metrics" on the left-side menu; this should take you to Grafana. Select the app you want to see and make sure you're in the "Fly App" dashboard. You want to look for the "App Concurrency" panel.
There’s a newer, heavier-weight mechanism that you can deploy to scale based on more general concepts of load, but you would need to define and report a “current number of subprocesses” metric yourself, etc.
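Reporting such a metric is the lighter half of that; here's a rough sketch assuming the prometheus_client package and hypothetical hooks in your dispatcher (whatever consumes the metric and issues the scale commands is the heavier piece):

```python
from prometheus_client import Gauge, start_http_server

# Hypothetical gauge for the scaler to poll; the metric name is made up.
RUNNING_SUBPROCESSES = Gauge(
    "app_running_subprocesses",
    "Number of worker subprocesses currently alive",
)

def on_subprocess_started() -> None:
    RUNNING_SUBPROCESSES.inc()

def on_subprocess_exited() -> None:
    RUNNING_SUBPROCESSES.dec()

# Expose /metrics on a separate port for the scaler to scrape.
start_http_server(9091)
```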
Since you already have a dispatcher in place, I think it would be easier to just modify FastAPI to keep track of the number of running subprocesses internally and then respond with Fly-Replay: elsewhere=true when that count breaches the desired threshold…
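Purely as a sketch, with the threshold and the counter standing in for whatever your dispatcher already tracks:

```python
from fastapi import FastAPI, Request
from fastapi.responses import Response

app = FastAPI()
MAX_SUBPROCESSES = 8          # illustrative threshold
running_subprocesses = 0      # stand-in: your dispatcher would maintain this

@app.middleware("http")
async def replay_when_busy(request: Request, call_next):
    # When this machine is already at capacity, ask the Fly proxy to replay
    # the request on a different machine instead of handling it here.
    if running_subprocesses >= MAX_SUBPROCESSES:
        return Response(headers={"fly-replay": "elsewhere=true"})
    return await call_next(request)
```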