Recommended concurrency settings with <=5 concurrent requests for autoscaling?

Since the introduction of CPU quotas, I’ve been trying to keep the CPU usage of my instances below the baseline by tweaking the concurrency settings, and to rely on autoscaling as traffic goes up and down (e.g. there’s more traffic during the day than at night).

According to the docs and several other discussions on this forum, I should set soft_limit by observing the App Concurrency metric on Grafana. I’m having a hard time working out what to set soft_limit to when concurrency never exceeds 5 under normal traffic on my app. That seems very low; I’m assuming my app handles requests very quickly (which is a good thing, I guess), so almost no requests overlap.
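For what it’s worth, a low number like this is what Little’s Law would predict for a fast app: average in-flight requests are roughly the request rate multiplied by the average response time. Here is a rough sketch of that arithmetic. The function name and the figures are made up for illustration, not measurements from my app, and I’m only assuming that the App Concurrency metric behaves approximately like this average.

```python
def estimated_concurrency(requests_per_second: float, avg_latency_seconds: float) -> float:
    """Little's Law: average in-flight requests = arrival rate * average time per request."""
    return requests_per_second * avg_latency_seconds


# Hypothetical figures (assumptions, not measured values from this app).
rate = 100.0     # requests per second hitting one instance
latency = 0.04   # 40 ms average response time

# With these inputs the estimate comes out at roughly 4 in-flight requests.
print(f"estimated concurrency: {estimated_concurrency(rate, latency):.1f}")
```

So even at a fairly high request rate, short response times keep the number of overlapping requests in the single digits.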

So should I simply set soft_limit to 5 if I want to autoscale instances? E.g. I have created 4 instances in 2 regions, and I want to run only as many instances as needed while concurrency per instance stays below 5. Whenever it reaches 5 or higher, additional instances should start automatically to balance out the CPU usage across instances.

Hope I’ve explained it well enough and thanks for your help!

My current config, for reference:

[[vm]]
  cpu_kind = "shared"
  cpus = 2
  memory_mb = 512

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

[http_service.concurrency]
  type = "requests"
  hard_limit = 100
  soft_limit = 5

[[http_service.checks]]
  grace_period = "10s"
  interval = "15s"
  method = "GET"
  timeout = "5s"
  path = "/ping"
