Hi @merlin,
- Your understanding of `services.concurrency` is mostly correct. Worth mentioning the `type` option (which it looks like you already discovered), which defaults to `connections` but can be set to `requests` for the `http` handler. The app’s concurrency (reported by the `fly_app_concurrency` metric and the ‘VM Service Concurrency’ graph on the Metrics tab of the Dashboard) will be based on either connections or requests, depending on this setting.
- WebSocket connections (including long-lived, idle ones) are all included in the concurrency calculation, as are long-polling requests. This does make autoscaling less convenient for applications with many long-lived, idle connections that consume few resources. (We’ve considered making the query used by autoscaling configurable in the future; let us know if this would be helpful for your use case.)
- Beyond not routing requests to instances at the `hard_limit`, the load balancer also prefers instances under the `soft_limit` if any are available. So even when autoscaling is disabled, the `soft_limit` still acts as a ‘hint’ to help load-balance traffic more evenly.
- As a general recommendation for tuning concurrency, I would start by setting a conservatively-high `hard_limit`, mostly as a failsafe to prevent OOM from spikes in traffic, and focus on the `soft_limit` as a tighter bound for optimal scaling and load-balancing decisions. You can then tweak the `soft_limit` over time to best fit your workload without worrying too much about bringing down the app with a too-low `hard_limit` blocking requests.
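
Putting the points above together, a hypothetical `fly.toml` excerpt might look like this (the port and limit values are placeholders to illustrate the shape of the config, not recommendations for your workload):

```toml
# Hypothetical fly.toml excerpt (values are illustrative only)
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    # Defaults to "connections"; "requests" is available for the http handler.
    # This choice also determines what fly_app_concurrency measures.
    type = "requests"
    # Conservatively-high failsafe to prevent OOM from traffic spikes.
    hard_limit = 1000
    # Tighter bound used for load-balancing (and autoscaling) decisions;
    # tune this value over time to fit your workload.
    soft_limit = 500
```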