Just to complement @roadmr’s answer…
There’s a newer, heavier-weight mechanism that you can deploy to scale based on more general concepts of load, but you would need to define and report a “current number of subprocesses” metric yourself, etc.
Since you already have a dispatcher in place, I think it would be easier to just modify your FastAPI app to keep track of the number of running subprocesses internally and then respond with a Fly-Replay: elsewhere=true header whenever that count reaches the desired threshold, so the request gets replayed on a different Machine…
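
In case it helps, here is a rough sketch of the counting part in plain Python, with no FastAPI import so it runs on its own. Everything named here (`SubprocessGate`, `MAX_SUBPROCESSES`, `handle_request`, `start_job`) is made up for illustration, and the 503/202 status codes are just placeholders I picked, not something Fly requires.

```python
from __future__ import annotations

import threading

MAX_SUBPROCESSES = 4  # hypothetical threshold; tune it for your Machine size


class SubprocessGate:
    """Thread-safe count of running subprocesses with an upper bound."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._running = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot; return False if the limit is already reached."""
        with self._lock:
            if self._running >= self._limit:
                return False
            self._running += 1
            return True

    def release(self) -> None:
        """Free a slot once a subprocess has exited."""
        with self._lock:
            if self._running > 0:
                self._running -= 1

    @property
    def running(self) -> int:
        with self._lock:
            return self._running


gate = SubprocessGate(MAX_SUBPROCESSES)


def handle_request(start_job) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for one incoming request.

    start_job is a hypothetical callable that launches the subprocess;
    whatever it launches must call gate.release() when the work is done.
    """
    if not gate.try_acquire():
        # At capacity: ask the Fly proxy to replay the request elsewhere.
        return 503, {"Fly-Replay": "elsewhere=true"}
    try:
        start_job()
    except Exception:
        gate.release()  # launch failed, so give the slot back
        raise
    return 202, {}
```

In the real app, the FastAPI route would do the same check and return a Response carrying that status and header. The important part is calling `release()` wherever you detect that the subprocess has exited, otherwise the count only ever goes up.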