Hi,
first off, I really really appreciate you guys answering questions in the forum diligently!
In `fly.toml`, is my understanding of the `[services.concurrency]` section correct? (I've pasted an example below for context.)
- A concurrent request means 1 ongoing TCP request, which usually takes a couple of seconds max. 25 concurrent requests, for example, mean 25 requests being handled at the same time (and probably sent roughly at the same time).
- How is this handled with WebSockets (and long-polling fallbacks)?
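For context, here's roughly the kind of section I'm asking about. The values are placeholders I made up for this post, not our real config:

```toml
[[services]]
  internal_port = 8080
  protocol = "tcp"

  # The section my questions are about
  [services.concurrency]
    type       = "requests"  # guessing "requests" vs "connections" matters for WebSockets?
    soft_limit = 20          # placeholder value
    hard_limit = 25          # placeholder value
```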
And in terms of concurrent requests:

- When autoscaling is enabled, a new VM instance is brought up when the `hard_limit` is reached (and very likely already when the `soft_limit` is reached)
- When autoscaling is disabled, requests are queued when the `hard_limit` is reached and served once the count drops back below it. The `soft_limit` is unused.
All that said, how do I determine good limits without bringing down the app in production, or having users camp out overnight in line for their request to be served?
If my understanding is correct, a…

- `hard_limit` too high could risk crashing our application due to large amounts of requests (going OOM), especially with sudden bursts of traffic
- `hard_limit` too low could either spin up way too many VMs or stall requests for a long time
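To make the question concrete, here's the kind of back-of-the-envelope reasoning I've been attempting. All the numbers are hypothetical, which is exactly why I'd love a sanity check:

```toml
# Hypothetical sizing (made-up numbers):
#   VM memory:              256 MB
#   App baseline usage:     ~100 MB
#   Per in-flight request:  ~5 MB
#   => (256 - 100) / 5 ≈ 31 requests before we'd be flirting with OOM
[services.concurrency]
  type       = "requests"
  hard_limit = 25   # kept a bit below the ~31 estimate
  soft_limit = 20   # hopefully enough headroom for autoscaling to react
```

Does that kind of reasoning hold up in practice?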
Thanks in advance!