Why doesn't the services.concurrency clause autoscale up when used with connections?

Just to complement @roadmr’s answer…

There’s a newer, heavier-weight mechanism that you can deploy to scale based on more general concepts of load, but you would need to define and report a “current number of subprocesses” metric yourself, etc.
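As a rough illustration of what reporting such a metric could look like, here is a minimal sketch assuming the metric is exposed in Prometheus format from the same FastAPI app via `prometheus_client`. The metric name `running_subprocesses` and the `/metrics` path are just placeholders, not anything the scaler requires:

```python
# Hypothetical sketch: exposing a "current number of subprocesses" gauge
# that a metrics-based autoscaler could poll. Names are illustrative.
from fastapi import FastAPI, Response
from prometheus_client import Gauge, generate_latest, CONTENT_TYPE_LATEST

app = FastAPI()

# Gauge your dispatcher updates whenever it starts or reaps a subprocess.
running_subprocesses = Gauge(
    "running_subprocesses",
    "Number of dispatcher subprocesses currently running",
)

@app.get("/metrics")
def metrics() -> Response:
    # Serve Prometheus-format metrics for the scaler to scrape.
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```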

Since you already have a dispatcher in place, I think it would be easier to just modify your FastAPI app to keep track of the number of running subprocesses internally and then respond with `Fly-Replay: elsewhere=true` when that count breaches the desired threshold…
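Something along these lines, as a sketch only: the threshold, the status code, and the bare counter are all assumptions, and you'd wire the counter to however your dispatcher actually tracks its children.

```python
# Hypothetical sketch: bounce requests to another Machine via Fly-Replay
# when this one is already running too many subprocesses.
from fastapi import FastAPI, Request, Response

app = FastAPI()

MAX_SUBPROCESSES = 4       # desired per-Machine threshold (assumption)
current_subprocesses = 0   # incremented/decremented by your dispatcher

@app.middleware("http")
async def replay_when_busy(request: Request, call_next):
    # At capacity: ask the Fly proxy to replay this request elsewhere
    # instead of handling it on this Machine.
    if current_subprocesses >= MAX_SUBPROCESSES:
        return Response(
            status_code=409,
            headers={"fly-replay": "elsewhere=true"},
        )
    return await call_next(request)
```

The proxy sees the `fly-replay` header on the response and re-routes the original request to a different Machine, so the client never notices the bounce.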