Autoscaling is constantly stopping and starting instances even with an absurdly high soft_limit of 100k

Not an answer, but perhaps some pointers:

  1. According to this reply, you might want to tweak the query in Grafana to get a more representative graph of concurrent requests. I experimented with it in one of my test apps and the difference is considerable. Whether it's also useful is hard to say.

  2. soft_limit is “just a hint” for Fly Proxy. The proxy considers other things as well, for example which machine is nearest to the request. Since you have machines in different regions, do you see anything on this front that might help?

The docs here specifically mention that autostop/autostart decisions are made per region.

  3. AFAIK the proxy also uses soft_limit as one of several inputs when deciding whether a machine has excess capacity and can be shut down, so a high value could be a double-edged sword. For example, soft_limit=100,000 might nudge the proxy to shut a machine down if it's getting only 20,000 requests and the other machines are far from their hard_limit, if any.
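To make the last point concrete, here is a rough sketch of how I understand the per-region excess-capacity check from the autostop/autostart docs. It is not the proxy's actual code: the function names are made up for illustration, the "strictly over soft_limit" comparison is my assumption, and the real proxy weighs other inputs too.

```python
def excess_capacity(loads, soft_limit):
    """Excess capacity for ONE region, as I read the docs:
    running machines - (machines over soft_limit + 1).

    `loads` is a hypothetical list with the current concurrent requests
    of each running machine in that region.
    """
    # Assumption: "over" means strictly greater than soft_limit.
    over_soft = sum(1 for load in loads if load > soft_limit)
    return len(loads) - (over_soft + 1)


def would_stop_one(loads, soft_limit):
    """True if this simplified model would stop one machine in the region."""
    if not loads:
        return False
    if len(loads) == 1:
        # A single machine in the region is only stopped when it is idle.
        return loads[0] == 0
    return excess_capacity(loads, soft_limit) >= 1


# Two machines at 20,000 requests each, soft_limit=100,000:
# neither is over the soft limit, so one of them looks like excess capacity.
print(would_stop_one([20_000, 20_000], soft_limit=100_000))

# Same load with soft_limit=10,000: both are over, so nothing looks like excess.
print(would_stop_one([20_000, 20_000], soft_limit=10_000))
```

In this simplified model, a very high soft_limit means no machine ever counts as "over", so with two or more machines in a region there is always one that looks stoppable.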

Question: have you experimented with removing soft_limit completely to start from a clean baseline? It should default to 20, if I remember correctly.