Finding good [services.concurrency] settings without bringing down prod ;-)


First off, I really appreciate you all diligently answering questions in the forum!

In fly.toml, is my understanding of the [services.concurrency] section correct?

  • A concurrent request means one in-flight TCP request, which usually takes a couple of seconds at most.
    25 concurrent requests, for example, mean 25 requests being handled at the same time (and probably sent at roughly the same time)
    • How is this handled with WebSockets (and Long-Polling fallbacks)?
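For reference, here's the shape of the section I'm asking about (the values are placeholders, not a recommendation):

```toml
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    type = "connections"   # or "requests"
    soft_limit = 20        # placeholder
    hard_limit = 25        # placeholder
```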

And in terms of concurrent requests:

  • When autoscaling is enabled, a new VM instance is brought up when the hard_limit is reached (and very likely already when the soft_limit is reached)

  • When autoscaling is disabled, requests are queued when the hard_limit is reached and served once the count drops back below it. The soft_limit is unused.

All that said, how do I determine good limits without bringing down the app in production – or having users camp out overnight, in line for their request to be served? :camping:

If my understanding is correct, a…

  • hard_limit too high could risk crashing our application under a large number of requests (going OOM), especially with sudden bursts of traffic

  • hard_limit too low could either spin up way too many VMs or stall requests for a long time
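To make the OOM worry concrete, here's the kind of back-of-envelope check I've been doing — every number below is a made-up placeholder, not a measurement:

```python
# Back-of-envelope memory check for picking a hard_limit.
# All numbers are assumed placeholders; measure your real app before trusting this.
per_request_mb = 50   # assumed peak memory held per in-flight request
base_app_mb = 150     # assumed baseline footprint of the app itself
hard_limit = 25       # max concurrent requests allowed on one VM

# Worst case: every allowed request is in flight at once.
worst_case_mb = base_app_mb + hard_limit * per_request_mb
print(worst_case_mb)  # 1400 -> a 1 GB VM could OOM before the limit kicks in
```

With these (made-up) numbers, a hard_limit of 25 only fits a VM with well over 1.4 GB of RAM, which is the sort of mismatch I'm trying to avoid.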

Thanks in advance!
