Looking for recommendations on autoscaling distribution with a soft limit

I just ported a small application today, and I’m trying to tune it so that users always hit a fast server. This may be an in-between edge case, but I was wondering if anyone has dealt with it.

Background

  • there are 70 to 130 active connections at any given time.
  • the minimum memory required is 512MB
  • the soft limit for a 512MB server is 45 connections; a server sized at 1GB could handle 100+.
  • autoscaling is currently set to balanced with min=3 max=8.
  • the active regions, with their current connection counts, are sea (20), mia (45), and ams (20).

The issue is that across the 3 servers we are seeing 45, 20, and 20 connections. When a single server hits the soft limit, further requests are routed to the other servers, which adds roughly 100ms to each rerouted request. Meanwhile, no autoscaling happens until all 3 servers hit 45 concurrent connections.
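To make the routing problem concrete, here is a minimal sketch of the behaviour described above. It assumes each region has one server, that demand beyond a server’s soft limit spills to another region, and that scale-up happens only when total demand would fill every server to its soft limit. The `Server`, `route`, and `aggregate_trigger` names and the 60-connection mia demand are hypothetical and illustrative, not taken from the post or any platform API.

```python
from __future__ import annotations

from dataclasses import dataclass

SOFT_LIMIT = 45  # soft limit for one 512MB server, from the post


@dataclass
class Server:
    region: str
    demand: int  # connections from users closest to this region (illustrative)


def route(servers: list[Server], soft_limit: int) -> tuple[dict[str, int], int]:
    """Serve each region's demand locally up to the soft limit.

    Returns the in-region load per server and the number of connections
    that spill over to another region (the ones paying the extra ~100ms).
    """
    local = {s.region: min(s.demand, soft_limit) for s in servers}
    spilled = sum(max(s.demand - soft_limit, 0) for s in servers)
    return local, spilled


def aggregate_trigger(servers: list[Server], soft_limit: int) -> bool:
    """Assumed trigger: scale up only when total demand fills every server."""
    total = sum(s.demand for s in servers)
    return total >= soft_limit * len(servers)


# Hypothetical demand: mia users want 60 connections in total.
servers = [Server("sea", 20), Server("mia", 60), Server("ams", 20)]
local, spilled = route(servers, SOFT_LIMIT)
print(local, spilled, aggregate_trigger(servers, SOFT_LIMIT))
# {'sea': 20, 'mia': 45, 'ams': 20} 15 False
```

Under this model, 15 mia connections pay the cross-region penalty while the aggregate trigger stays off. Even at the stated peak of 130 connections, 130 is below 3 × 45 = 135, so with min=3 an aggregate trigger of this kind would never fire.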

I’m not sure what to do in this case. The options I see are:

  1. Lower the soft limit. Connections over the soft limit would still be routed to other servers, but at least it would autoscale into the best region sooner.
  2. Set a higher minimum count in every region, so that when the soft limit is hit, traffic is rerouted to a closer server.
  3. Increase the memory to 1GB. That is overprovisioned, but at least there would be no rerouting. On the other hand, autoscaling would then never kick in.
  4. Is there a hidden option to autoscale when a single server hits the soft limit, rather than waiting for the entire set of servers to hit it?
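The trade-offs between options 1, 3, and 4 can be compared with a small sketch. It assumes one server per region, illustrative demand of 60 connections in mia, a lowered soft limit of 30 for option 1, and a soft limit of 100 for a 1GB server in option 3. `per_server_trigger` models the hypothetical per-server policy asked about in option 4; none of these functions is a real platform API. Option 2 changes the server count per region and is not modelled here.

```python
# Hypothetical comparison of scale-up triggers; none of these functions is a
# real platform API, and the demand figures are illustrative only.
from __future__ import annotations


def aggregate_trigger(demand: dict[str, int], soft_limit: int) -> bool:
    # Current behaviour as described: scale only when total demand would
    # fill every server up to its soft limit.
    return sum(demand.values()) >= soft_limit * len(demand)


def per_server_trigger(demand: dict[str, int], soft_limit: int) -> list[str]:
    # Option 4 (hypothetical): any region whose server reaches the soft
    # limit asks for one more server in that same region.
    return [region for region, load in demand.items() if load >= soft_limit]


def spilled(demand: dict[str, int], soft_limit: int) -> int:
    # Connections above their local server's soft limit that are routed to
    # another region until extra capacity appears.
    return sum(max(load - soft_limit, 0) for load in demand.values())


demand = {"sea": 20, "mia": 60, "ams": 20}  # one server per region
scenarios = {
    "current (512MB, soft limit 45)": 45,
    "option 1 (soft limit lowered to 30)": 30,
    "option 3 (1GB, soft limit ~100)": 100,
}
for name, limit in scenarios.items():
    print(
        f"{name}: spilled={spilled(demand, limit)}, "
        f"aggregate={aggregate_trigger(demand, limit)}, "
        f"per-server={per_server_trigger(demand, limit)}"
    )
```

In this model, option 1 makes the aggregate trigger fire (100 ≥ 3 × 30) but doubles the spillover until the new server is up, option 3 removes spillover while pushing the aggregate threshold to 300, and a per-server trigger reacts as soon as mia alone saturates.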

Anyone have any recommendations?