Hi… There’s at least one subtlety: you’ve mixed http_service with services when attempting to refer to the same internal_port, and this tends to confuse the Fly.io infrastructure.
Aside: The bounds that you showed may have just been for testing, but, if not, I’d suggest consulting the “Guidelines for concurrency settings”. In particular…
If the soft and hard limit are too close, then there might not be enough “time” for the proxy to decide to load balance and the result could be multiple retries.
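For what it's worth, here is a minimal sketch of what I mean, assuming a single HTTP service on port 8080 (the app name, port, and limits are placeholders, not recommendations):

```toml
# fly.toml -- point either http_service *or* a services block at a given
# internal_port, not both.
app = "your-app"              # placeholder
primary_region = "sjc"

[http_service]
  internal_port = 8080        # placeholder
  force_https = true
  auto_start_machines = true  # let the proxy start stopped machines under load
  auto_stop_machines = true
  min_machines_running = 1

  [http_service.concurrency]
    type = "connections"      # or "requests"
    soft_limit = 80           # keep some headroom between soft and hard
    hard_limit = 100
```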
Thanks @mayailurus and @khuezy. fly scale count 2 doesn't seem to be a toml configuration. How can we configure the app to autoscale when it hits the concurrent TCP connections threshold?
I think what is being said is that if you issue that command on the console, your app will remember this ceiling count for the remainder of its lifetime.
Makes sense @halfer. We do have a standby machine:
```
➜ interview git:(sid/trybasicflyscaling) ✗ fly scale show
VM Resources for app:
Groups
NAME  COUNT  KIND         CPUS  MEMORY    REGIONS
app   2      performance  8     16384 MB  sjc(2)
```
OK. Well, I don’t know the answer to your question, as the mysterious inner workings of the concurrency device are, well, mysterious.
But it is solvable another way. You could add a simple mechanism to read load metrics from each machine (e.g. via top or uptime). These get sent (or pulled) to another small app that runs the scale command, up or down, based on the prevailing conditions. This is a bit more faff than tweaking a config file, but you'll have a much better handle on how it all works.
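A very rough sketch of the "small app" half, assuming each machine exposes its 1-minute load average at a hypothetical /load endpoint and flyctl is installed wherever this runs (the app name, URLs, and thresholds are all made up):

```python
import subprocess
import urllib.request

APP = "your-app"                      # placeholder app name
LOAD_URLS = [                         # hypothetical per-machine endpoints
    "http://machine-1.internal:8081/load",
    "http://machine-2.internal:8081/load",
]
SCALE_UP_AT = 4.0    # illustrative 1-minute load-average thresholds
SCALE_DOWN_AT = 1.0

def read_load(url: str) -> float:
    # Each machine is assumed to return its load average as plain text.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return float(resp.read().decode())

def decide(current_count: int, loads: list[float]) -> int:
    # Naive policy: react to the busiest machine.
    peak = max(loads)
    if peak > SCALE_UP_AT:
        return current_count + 1
    if peak < SCALE_DOWN_AT and current_count > 1:
        return current_count - 1
    return current_count

if __name__ == "__main__":
    loads = [read_load(url) for url in LOAD_URLS]
    target = decide(len(LOAD_URLS), loads)
    if target != len(LOAD_URLS):
        # Hand the actual change off to flyctl; check `fly scale count --help`
        # for the exact flags your version supports.
        subprocess.run(["fly", "scale", "count", str(target), "-a", APP], check=True)
```

Run it from cron (or a tiny scheduler app) every minute or so and you have a crude autoscaler that you fully control.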
I checked your metrics and your maximum concurrency (requests the machine was serving at a given point) was 2. Your soft_limit is 90 concurrent requests; the second machine won't spin up until the first one is at the soft limit.
Thanks @roadmr, yeah we're playing around with those limits. We did get it to scale up after sending more TCP connections, but it seems to be non-deterministic. Note that we're using type = "connections", not requests. Currently on the instance I see:
```
root@7811096a526708:/app# ss -s
Total: 500
TCP:   396 (estab 97, closed 296, orphaned 0, timewait 296)

Transport  Total  IP   IPv6
RAW        0      0    0
UDP        95     28   67
TCP        100    80   20
INET       195    108  87
FRAG       0      0    0
```
Which is over both our soft and hard limits of 90 and 100, but it doesn't scale up.
The large discrepancy between the number of connections you mention (80-100) and what I see in the http load / concurrency metric (2 at most) makes me think your connections are not originating from external clients hitting your app via the proxy.
Can you tell us more about your TCP connections? What originates them?
Yeah, good point. So we have a Python FastAPI server running on the app which spins up Python subprocesses. These subprocesses initiate WebSocket/WebRTC connections with external services such as Cartesia (a text-to-speech provider) and others. Each process has about 5 active TCP connections, as you can see in the ss -s output above. I'm not sure if they would go via the Fly proxy; I'd think so?
> what I see in the http load / concurrency metric (2 at most)
No, they don’t. The Fly proxy is a load-balancing proxy that only handles incoming connections (in your case, http requests to your app). Outgoing connections originating in your machines only go through a few layers of nftables routing on the way out and don’t in any way influence machine auto-start/stop. Read this for more information:
tl;dr: if you hit your app (https://your-app.fly.dev, for example) with 100 concurrent connections, you should see your second machine start up.
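If it helps to reproduce that, here's a quick sketch that opens concurrent requests from your laptop, assuming Python with the httpx package installed (tools like hey or wrk would do the same job; the URL is the placeholder from above):

```python
import asyncio
import httpx

URL = "https://your-app.fly.dev/"   # placeholder from above
CONCURRENCY = 100                   # aim above your soft_limit

async def one_request(client: httpx.AsyncClient) -> int:
    # A slow-ish endpoint keeps connections open long enough for the proxy
    # to actually observe the concurrency.
    resp = await client.get(URL, timeout=30)
    return resp.status_code

async def main() -> None:
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(
            *(one_request(client) for _ in range(CONCURRENCY)),
            return_exceptions=True,
        )
    print(results)

asyncio.run(main())
```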
Go to Sign in to Your Account · Fly, hit "Metrics" on the left-side menu; this should take you to Grafana. Select the app you want to see and make sure you're in the "Fly App" dashboard. You want to look for the "App Concurrency" panel.
There’s a newer, heavier-weight mechanism that you can deploy to scale based on more general concepts of load, but you would need to define and report a “current number of subprocesses” metric yourself, etc.
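Reporting such a metric is the lighter half of that; here's a rough sketch assuming the prometheus_client package and hypothetical hooks in your dispatcher (whatever consumes the metric and issues the scale commands is the heavier piece):

```python
from prometheus_client import Gauge, start_http_server

# Hypothetical gauge for the scaler to poll; the metric name is made up.
RUNNING_SUBPROCESSES = Gauge(
    "app_running_subprocesses",
    "Number of worker subprocesses currently alive",
)

def on_subprocess_started() -> None:
    RUNNING_SUBPROCESSES.inc()

def on_subprocess_exited() -> None:
    RUNNING_SUBPROCESSES.dec()

# Expose /metrics on a separate port for the scaler to scrape.
start_http_server(9091)
```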
Since you already have a dispatcher in place, I think it would be easier to just modify FastAPI to keep track of the number of running subprocesses internally and then respond with Fly-Replay: elsewhere=true when that count breaches the desired threshold…
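Purely as a sketch, with the threshold and the counter standing in for whatever your dispatcher already tracks:

```python
from fastapi import FastAPI, Request
from fastapi.responses import Response

app = FastAPI()
MAX_SUBPROCESSES = 8          # illustrative threshold
running_subprocesses = 0      # stand-in: your dispatcher would maintain this

@app.middleware("http")
async def replay_when_busy(request: Request, call_next):
    # When this machine is already at capacity, ask the Fly proxy to replay
    # the request on a different machine instead of handling it here.
    if running_subprocesses >= MAX_SUBPROCESSES:
        return Response(headers={"fly-replay": "elsewhere=true"})
    return await call_next(request)
```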