Since the introduction of CPU quotas, I’m trying to keep the CPU usage of my instances below the baseline by tweaking the concurrency settings and making use of autoscaling as traffic goes up and down (e.g. there’s more traffic during the day than at night).
According to the docs and several other discussions on this forum, I should set `soft_limit` by observing the App Concurrency metric on Grafana. I have a hard time understanding what I should set as `soft_limit` when the concurrency never goes above 5 with normal traffic on my app. That seems very low, and I’m assuming my app handles requests very fast (which I guess is a good thing), so there are almost no concurrent requests happening.
So should I simply set `soft_limit` to 5 if I want to autoscale instances? E.g. I have created 4 instances in 2 regions, and I want to run only as many instances as needed while the concurrency stays below 5. Whenever it reaches 5 or higher, more instances should start automatically to spread the CPU usage across instances.
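To make the question concrete, here is how I understand the start decision, as a simplified Python model. It assumes the proxy starts a stopped Machine only when every running Machine is already at or above `soft_limit`; this is my reading of the docs, not Fly’s actual implementation, and `Machine` and `should_start_machine` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    running: bool
    concurrency: int  # in-flight requests on this machine right now


def should_start_machine(machines: list[Machine], soft_limit: int) -> bool:
    """Simplified model (assumption, not Fly's real logic): start a stopped
    machine only when every running machine is at or above soft_limit."""
    running = [m for m in machines if m.running]
    has_stopped = any(not m.running for m in machines)
    if not has_stopped:
        return False  # nothing left to start
    if not running:
        return True  # no machine is running, so a request needs one started
    return all(m.concurrency >= soft_limit for m in running)


machines = [Machine("a", True, 5), Machine("b", True, 6), Machine("c", False, 0)]
print(should_start_machine(machines, soft_limit=5))   # True: both running machines are at or above 5
print(should_start_machine(machines, soft_limit=10))  # False: the running machines still have headroom
```

Under this model, a `soft_limit` equal to my normal peak concurrency would start extra instances as soon as normal traffic peaks, which is why I’m unsure 5 is the right value.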
Hope I’ve explained it well enough and thanks for your help!
For reference, here is my configuration:

```toml
[[vm]]
  cpu_kind = "shared"
  cpus = 2
  memory_mb = 512

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

  [http_service.concurrency]
    type = "requests"
    hard_limit = 100
    soft_limit = 5

  [[http_service.checks]]
    grace_period = "10s"
    interval = "15s"
    method = "GET"
    timeout = "5s"
    path = "/ping"
```