Autoscaling: is there a way to see how many instances are running in each region?

Most of the requests are currently hitting EWR and AMS as far as I can tell.

I don’t think the type = "requests" setting helped in your case. It seems like it’s not a connection issue. The instances appear to get overwhelmed.

I think your soft and hard limits may be the problem. They’re so high right now that our proxy will likely never balance between your instances. If your soft limit is never or rarely hit, the proxy just picks the closest instance. So it seems to swarm one instance until it reaches that point, and then sends everything to another. Just a theory; something else may be at play here. Balancing might also work better if you used fewer regions (we’ll randomly balance between the closest instances, but if you only have one in a region, it might get picked a lot more).
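To illustrate the theory above, here is a toy model in Python. It is not our proxy's actual algorithm: the instance names, the distances, and the fallback rule (least-loaded once every instance is at its soft limit) are all assumptions made for the sketch, and it ignores the random balancing between nearby instances.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str          # hypothetical instance name
    distance_ms: int   # hypothetical distance from the client
    load: int = 0      # in-flight requests


def route_burst(instances, n_requests, soft_limit):
    """Route a burst of concurrent requests and return the load per instance.

    Toy rule (an assumption): pick the closest instance still under its
    soft limit; if every instance is at the limit, pick the least loaded.
    """
    for _ in range(n_requests):
        under_limit = [i for i in instances if i.load < soft_limit]
        if under_limit:
            chosen = min(under_limit, key=lambda i: i.distance_ms)
        else:
            chosen = min(instances, key=lambda i: i.load)
        chosen.load += 1
    return {i.name: i.load for i in instances}


def fresh_instances():
    # Two instances near the client and one far away (made-up values).
    return [Instance("ewr-1", 5), Instance("ewr-2", 6), Instance("ams-1", 80)]


if __name__ == "__main__":
    # Soft limit far above the burst size: one instance takes everything.
    print(route_burst(fresh_instances(), 100, soft_limit=500))
    # Soft limit well below the burst size: the load spreads out.
    print(route_burst(fresh_instances(), 100, soft_limit=20))
```

In this model, a soft limit of 500 sends all 100 requests to the closest instance, while a soft limit of 20 spreads them almost evenly. That is the effect I would expect from lowering your limits, if the theory holds.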

Since your requests are very short-lived, it’s a bit hard for our “loads” state to replicate in time. We should probably adjust how we do that to better fit your kind of app.

We’re not running Docker; every app runs in a Firecracker microVM. That means you have full control over the limits inside your VM.

We currently set the rlimit (what ulimit sets) to 10240 at boot, within the VM. You can change that value with an ENTRYPOINT in your Docker image. We only picked the current value because it was better than the much lower default.
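As a rough sketch of what such an entrypoint could look like, here is a small Python wrapper that raises a limit and then execs the real command. It assumes the limit in question is the open-file limit (RLIMIT_NOFILE) and that your image has Python installed; the target of 65536 and the file name entrypoint.py are placeholders, not recommendations.

```python
import os
import resource
import sys


def raise_nofile_limit(target):
    """Raise the soft open-file limit toward `target`, capped at the hard limit.

    Never lowers the current soft limit. Returns the (soft, hard) pair
    in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft, hard  # already at or above the requested value
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        raise_nofile_limit(65536)  # placeholder target value
        # Replace this process with the real command so it inherits the limit.
        os.execvp(sys.argv[1], sys.argv[1:])
```

You would then point ENTRYPOINT at the wrapper, for example ENTRYPOINT ["python3", "/entrypoint.py"], and keep your usual command in CMD so it gets passed through as arguments.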