Hi there, everyone! We’re experiencing some weird behaviour around load-balancing/concurrency that I’m hoping somebody has some insight into.
We noticed that our Rails app was occasionally being throttled for exceeding its CPU balance, causing long queue and response times. But we run several machines, and when this happens, most of the traffic seems to hit a single machine, and only that machine struggles under the load.
To mitigate, we configured autoscaling via starting/stopping machines, with explicit soft and hard limits:
[http_service]
processes = ["web"] # this service only applies to the web process
http_checks = []
internal_port = 8080
protocol = "tcp"
script_checks = []
force_https = true
auto_stop_machines = "stop"
auto_start_machines = true
min_machines_running = 5
soft_limit = 10
hard_limit = 15
Yet this hasn’t mitigated the issue at all: we still often see a single machine experiencing high concurrency while other machines sit idle.
Furthermore, the busy machine never seems to be flagged as having hit the hard limit.
I’ve theorized for a while that these spikes may all be requests coming from a single client, as some load balancers will prioritize sending requests from a single client to a single machine. But I was under the impression that the hard limit should prevent even this case - that if concurrency hits the hard limit, the load balancer should prevent any more traffic from being routed to that machine.
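To make that expectation concrete, here is a toy Python sketch of the behaviour I assumed. The `pick_machine` function and the machine names are made up for illustration; this is only my mental model of soft and hard limits, not how the platform's load balancer is actually implemented.

```python
from typing import Optional

# Values taken from the config above.
SOFT_LIMIT = 10
HARD_LIMIT = 15


def pick_machine(
    in_flight: dict[str, int],
    soft: int = SOFT_LIMIT,
    hard: int = HARD_LIMIT,
) -> Optional[str]:
    """Pick the machine for the next request (hypothetical model).

    in_flight maps a machine name to its current concurrent requests.
    Returns None when every machine is at or above the hard limit.
    """
    # Assumption: a machine at the hard limit never receives more traffic.
    eligible = {m: n for m, n in in_flight.items() if n < hard}
    if not eligible:
        return None

    # Assumption: machines under the soft limit are preferred; machines
    # between soft and hard are used only when nothing else is available.
    under_soft = {m: n for m, n in eligible.items() if n < soft}
    pool = under_soft or eligible

    # Among the candidates, choose the least loaded one.
    return min(pool, key=pool.get)
```

Under this model, if one machine were at 15 in-flight requests and the others were near zero, the next request would always go to one of the idle machines, even if it came from the same client.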
It’s clear that I’m misunderstanding something. Can anybody explain why our hard limit doesn’t seem to be respected, and perhaps suggest how we might mitigate our problem?