Help understanding services concurrency and load balancing

Hello everybody,
Here are some screenshots showing that one instance had a very high services concurrency level that suddenly dropped. Neither before nor after the drop did we notice any change in performance, and our usage did not change either, as you can see in the other metrics.

So we are wondering: why was it so high for one instance?
And what exactly is the concurrency?

Suddenly dropping concurrency:

Data and VM metrics:

Response time:

DNS level request metrics:

Longer post, providing some context:
We have been using it in production for a few months, after a staging phase that didn't have much load.
In production, we hit a huge bottleneck even though our Phoenix app currently serves only a bare webpage, so for now our application is essentially a web server.

Before understanding what was going on, we scaled without thinking too much and simply went up to around 20 instances (this is another topic, but scaling and autoscaling have been a really frustrating experience; maybe I should post a “help me understand scaling and autoscaling” thread?).

Anyway, we then noticed that the default concurrency limit of 25 was the problem, so we bumped it up a lot, to 100 and even 500. After that we ended up with a somewhat acceptable production server (considering it's only serving web pages; though they are LiveViews, does that change anything?).
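To check our own understanding of why LiveView might matter: if the concurrency is counted as open connections (we believe that is the default concurrency type, but it is an assumption on our side), then a LiveView page keeps its WebSocket open for as long as the tab stays open, while a plain page load releases its connection almost immediately. This rough Python sketch counts open connections over time for made-up traffic; the `Conn` class, the 0.05 s and 60 s durations and the arrival pattern are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    start: float  # seconds, relative to an arbitrary reference point
    end: float    # when the connection is closed


def concurrency_over_time(conns: list[Conn]) -> list[tuple[float, int]]:
    """Return (timestamp, open_connections) after every open/close event."""
    events = []
    for c in conns:
        events.append((c.start, 1))   # connection opened
        events.append((c.end, -1))    # connection closed
    # Sort by time; at equal timestamps, closes (-1) are processed before opens (+1)
    events.sort(key=lambda e: (e[0], e[1]))
    open_count = 0
    timeline = []
    for t, delta in events:
        open_count += delta
        timeline.append((t, open_count))
    return timeline


def peak(conns: list[Conn]) -> int:
    """Highest number of simultaneously open connections."""
    return max((n for _, n in concurrency_over_time(conns)), default=0)


# Hypothetical traffic: 10 visitors arriving one second apart.
# Plain page load: the connection is held for about 0.05 s.
http = [Conn(start=i, end=i + 0.05) for i in range(10)]
# LiveView (assumed): the WebSocket stays open while the tab is open, here 60 s.
liveview = [Conn(start=i, end=i + 60) for i in range(10)]

print(peak(http))      # 1
print(peak(liveview))  # 10
```

Same arrival rate and no extra work per visitor, yet ten times the measured concurrency. If this model is right, a high concurrency number would not necessarily show up in response times.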

After that we also noticed that 20 instances was way overkill :sweat_smile: so we scaled back down to 3, since we wanted one instance on the East Coast (ewr), one on the West Coast (lax) and one in Europe (lhr).

Since we switched to that new config, I have noticed that one instance constantly has a high concurrency level while the others stay low: 200+ for ewr versus 5-10 for lax and lhr.

We assumed it was just poor balancing, and that it was fine as long as the hard limit was not reached. The metric that mattered to us was response time, and it stayed stable.

Then today I noticed that, without any difference in our requests, the ewr instance's concurrency dropped significantly, again without any real difference in performance.

So I would like to understand what services.concurrency is and how it is handled.
How does it affect load balancing, which, by the way, we never really noticed kicking in?
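Here is our current guess at how the soft and hard limits feed into load balancing, written as a deliberately simplified Python model. It is not Fly's proxy code: the `Instance` class, `pick_instance`, the distances and the ordering rule (closest instance under the soft limit first, then closest instance under the hard limit) are all our assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    region: str
    distance_ms: float  # hypothetical latency from where connections arrive
    load: int = 0       # current concurrency (open connections)


def pick_instance(instances: list[Instance], soft_limit: int,
                  hard_limit: int) -> Optional[Instance]:
    """Assumed rule: closest instance under the soft limit, else the closest
    instance under the hard limit, else None (every instance is full)."""
    by_distance = sorted(instances, key=lambda i: i.distance_ms)
    for inst in by_distance:
        if inst.load < soft_limit:
            return inst
    for inst in by_distance:
        if inst.load < hard_limit:
            return inst
    return None


def simulate(n_conns: int, soft_limit: int, hard_limit: int) -> dict[str, int]:
    """Open n_conns long-lived connections that all arrive closest to ewr."""
    instances = [
        Instance("ewr", distance_ms=10),
        Instance("lax", distance_ms=70),
        Instance("lhr", distance_ms=80),
    ]
    for _ in range(n_conns):
        target = pick_instance(instances, soft_limit, hard_limit)
        if target is None:
            break  # all instances at their hard limit: connection refused
        target.load += 1  # connections stay open, so load only grows here
    return {i.region: i.load for i in instances}


print(simulate(250, soft_limit=400, hard_limit=500))
# {'ewr': 250, 'lax': 0, 'lhr': 0}
print(simulate(50, soft_limit=20, hard_limit=25))
# {'ewr': 20, 'lax': 20, 'lhr': 10}
```

In this model, with limits in the hundreds and most connections arriving closest to ewr, ewr absorbs everything and never spills over to lax or lhr, which would match the 200+ versus 5-10 split we saw. With lower limits, connections only spread out once ewr reaches its soft limit.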

Thank you very much for any details and docs that might help us better provision our instances without impacting our performance.

Edit: timestamps are in UTC+2, May 17th.