Your concurrency settings are 1000 (soft limit) and 4000 (hard limit). Our proxy only knows whether your app's load is under 1000, between 1000 and 4000, or at the hard limit of 4000. This means that when concurrency is under 1000 for all instances, we'll send traffic to the closest instance (or, if there is more than one instance in the same region, we'll balance randomly between them).
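To make the rule above concrete, here's a minimal sketch of that selection logic. This is purely illustrative (the field names and the `pick_instance` helper are made up, not our actual proxy code), but it captures the behavior: instances at the hard limit are excluded, instances under the soft limit are preferred even when farther away, and ties on distance are broken randomly.

```python
import random

SOFT_LIMIT = 1000  # your first setting
HARD_LIMIT = 4000  # your second setting

def pick_instance(instances):
    """Illustrative sketch of the routing rule described above.

    `instances` is a list of dicts with 'region_distance' (lower = closer)
    and 'load' (current concurrency). These names are hypothetical.
    """
    # Instances at the hard limit can't take any more connections.
    available = [i for i in instances if i["load"] < HARD_LIMIT]
    if not available:
        return None
    # Prefer instances still under the soft limit, even if they're
    # farther away; fall back to over-soft instances otherwise.
    under_soft = [i for i in available if i["load"] < SOFT_LIMIT]
    candidates = under_soft or available
    # Among the remaining candidates, take the closest, breaking ties
    # randomly (e.g. several instances in the same region).
    closest = min(c["region_distance"] for c in candidates)
    return random.choice(
        [c for c in candidates if c["region_distance"] == closest]
    )
```

So a nearby instance at 1384 concurrency would lose to a farther instance sitting at 100, which matches what your second screenshot shows.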
In your second screenshot, the load is better distributed because we know there are instances that should take some of the load, even if they’re further away.
In your first screenshot, 1384 is indeed over 1000, but that's a peak, and sometimes the load numbers aren't propagated fast enough for the proxy to react. But you can see that right after the peak there's a lower concurrency value and another instance has taken more of the load (the lighter green one).
You could tweak your soft and hard limits if you want us to balance between instances when traffic is lower. Keep in mind this might mean we'll balance to farther-away regions.
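If these limits live in a fly.toml-style `services.concurrency` block (an assumption on my part about your setup), lowering the soft limit would look something like:

```toml
[services.concurrency]
  type = "connections"   # or "requests", depending on your setup
  soft_limit = 500       # lowered from 1000, so we start spreading load sooner
  hard_limit = 4000
```

The trade-off is exactly the one above: a lower soft limit spreads traffic earlier, but more of it may land in farther regions.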
We did make some changes to how we measure the closeness of instances, and that may be why load is not well distributed between instances in the same datacenter. I can look into that. It's also possible the same client is hitting the same server over and over again; that would make an instance on that server more likely to get the load.