Hi @merlin,
- Your understanding of `services.concurrency` is mostly correct. Worth mentioning the `type` option (which it looks like you already discovered), which defaults to `connections` but can be set to `requests` for the `http` handler. The app’s concurrency (reported by the `fly_app_concurrency` metric and the ‘VM Service Concurrency’ graph on the Metrics tab of the Dashboard) will be based on either connections or requests, depending on this setting.
- WebSocket connections (including long-lived, idle ones) are all included in the concurrency calculation, as are long-polling requests. This does make autoscaling less convenient for applications with many long-lived, idle connections that consume few resources. (We’ve considered making the query used by autoscaling configurable in the future; let us know if this would be helpful for your use case.)
- Beyond not routing requests to instances at the `hard_limit`, the load balancer also prefers instances under the `soft_limit` if any are available. So even when autoscaling is disabled, the `soft_limit` still acts as a ‘hint’ to help load-balance traffic more evenly.
- As a general recommendation for tuning concurrency, I would start by setting a conservatively-high `hard_limit`, mostly as a failsafe to prevent OOM from spikes in traffic, and focus on the `soft_limit` as a tighter bound for optimal scaling and load-balancing decisions. You can then tweak the `soft_limit` over time to best fit your workload without worrying too much about bringing down the app with a too-low `hard_limit` blocking requests.
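
Putting the points above together, a hypothetical `fly.toml` excerpt might look like this (the port and limit values are placeholders to illustrate the shape of the config, not recommendations for your workload):

```toml
# Hypothetical fly.toml excerpt (values are illustrative only)
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    # Defaults to "connections"; "requests" is available for the http handler.
    # This choice also determines what fly_app_concurrency measures.
    type = "requests"
    # Conservatively-high failsafe to prevent OOM from traffic spikes.
    hard_limit = 1000
    # Tighter bound used for load-balancing (and autoscaling) decisions;
    # tune this value over time to fit your workload.
    soft_limit = 500
```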