Not an answer, but perhaps some pointers:
- According to this reply, you might want to tweak the query in Grafana to get a more representative graph of the concurrent requests. I experimented with it in one of my test apps and the difference is considerable. Whether it’s also useful… hard to say.
- soft_limit is “just a hint” for Fly Proxy; the proxy considers other things as well, for example which machine is nearest to the request.
  Since you have machines in different regions, do you see anything on this front that might help?
  The docs here specifically mention that the autostop/autostart decisions are made per region.
- AFAIK the proxy also uses soft_limit as one of several inputs when deciding whether a machine has excess capacity and can be shut down… so it could be a double-edged sword (e.g. having soft_limit=100,000 might nudge the proxy to shut the machine down if it’s getting only 20,000 requests and the other machines are far from their hard_limit, if any).
Question: have you experimented with removing soft_limit completely to start from a clean baseline? It should default to 20, if I remember correctly.
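
In case it helps, here is a rough sketch of what that clean baseline could look like in fly.toml. It assumes your app uses an [http_service] section; the internal_port and hard_limit values are placeholders I made up, not taken from your config, so adjust them to whatever you already have.

```toml
# Sketch only: port and limit values are placeholders, not from the original app.
[http_service]
  internal_port = 8080            # placeholder
  auto_stop_machines = "stop"     # keep whatever autostop setting you use today
  auto_start_machines = true
  min_machines_running = 0

  [http_service.concurrency]
    type = "requests"
    hard_limit = 250              # placeholder value
    # soft_limit left out on purpose so the platform default applies
```

With soft_limit left out, you can compare the graph and the autostop behaviour against this baseline before adding your own value back.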