Hi @merlin,
- Your understanding of `services.concurrency` is mostly correct. Worth mentioning the `type` option (which it looks like you already discovered), which defaults to `connections` but can be set to `requests` for the `http` handler. The app’s concurrency (reported by the `fly_app_concurrency` metric and the ‘VM Service Concurrency’ graph on the Metrics tab of the Dashboard) will count either connections or requests, depending on this setting.
- WebSocket connections (including long-lived, idle connections) are all included in the concurrency calculation, as are long-polling requests. This does make it less convenient to use autoscaling with applications that hold tons of long-lived, idle connections consuming few resources. (We’ve considered making the query used by autoscaling configurable in the future; let us know if this would be helpful for your use case.)
- Beyond not routing requests to instances at the `hard_limit`, the load balancer also prefers instances under the `soft_limit` if any are available. So even when autoscaling is disabled, the `soft_limit` still acts as a ‘hint’ to help load-balance traffic more evenly.
- As a general recommendation for tuning concurrency, I would start by setting a conservatively high `hard_limit`, mostly as a failsafe to prevent OOM from spikes in traffic, and then treat the `soft_limit` as the tighter bound that drives scaling and load-balancing decisions. You can then tweak the `soft_limit` over time to best fit your workload, without worrying too much about bringing down the app because a too-low `hard_limit` is blocking requests.
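
To make the recommendation above concrete, here is a rough sketch of what the relevant part of a `fly.toml` might look like. The port numbers and both limit values are placeholders I made up for illustration, not recommendations; the right numbers depend entirely on your workload and VM size.

```toml
# Illustrative fragment only: every number here is a placeholder.
[[services]]
  internal_port = 8080        # assumed app port
  protocol = "tcp"

  [[services.ports]]
    port = 80
    handlers = ["http"]       # type = "requests" applies to the http handler

  [services.concurrency]
    type = "requests"         # defaults to "connections" if omitted
    hard_limit = 500          # conservatively high failsafe against OOM
    soft_limit = 100          # tighter bound; the value to tune over time
```

With a setup like this, you would watch the `fly_app_concurrency` metric under real traffic and adjust `soft_limit` up or down, leaving `hard_limit` alone unless instances start running out of memory.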