First off, I really appreciate you all diligently answering questions in the forum!
In `fly.toml`, is my understanding of the `[services.concurrency]` section correct?
- A concurrent request means one in-flight request over a TCP connection, which usually takes a couple of seconds at most. 25 concurrent requests, for example, means 25 requests being handled at the same time (and probably sent at roughly the same time).
- How is this handled with WebSockets (and Long-Polling fallbacks)?
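For reference, here is a minimal sketch of the section I am asking about. The values are just illustrative examples, not a recommendation:

```toml
[services.concurrency]
  type = "requests"   # count in-flight requests (alternative: "connections")
  soft_limit = 20     # above this, the proxy prefers other instances
  hard_limit = 25     # at this point, no more work is sent to this instance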
And in terms of concurrent requests:
- When autoscaling is enabled, a new VM instance is brought up when the `hard_limit` is reached (and very likely already when the `soft_limit` is reached)
- When autoscaling is disabled, requests are queued when the `hard_limit` is reached and served once concurrency drops below it again
All that said, how do I determine good limits without bringing down the app in production – or having users camp out overnight, waiting in line for their request to be served?
If my understanding is correct, a…

- `hard_limit` too high could risk crashing our application due to large amounts of requests (going OOM), especially with sudden bursts of traffic
- `hard_limit` too low could either spin up way too many VMs or stall requests for a long time
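One way I have been thinking about a starting point is Little's law (average concurrency = request rate × average time per request) and then leaving headroom for bursts. The numbers below are purely hypothetical, just to show the arithmetic, not measurements from our app:

```python
def expected_concurrency(req_per_sec: float, avg_seconds_per_req: float) -> float:
    """Average number of in-flight requests under steady load (Little's law)."""
    return req_per_sec * avg_seconds_per_req

# Hypothetical example: 100 req/s, each taking ~0.2 s on average
steady = expected_concurrency(100, 0.2)
print(steady)  # 20.0 concurrent requests at steady state

# A soft_limit near the steady-state value, and a hard_limit somewhat above
# it for burst headroom, seems like a reasonable starting point to tune from.
```

Does that kind of back-of-envelope sizing match how you would pick initial limits, before refining them with real traffic?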
Thanks in advance!