We’re running our app on Fly.io and have been seeing critical performance issues specifically for US-based users — page loads of 30–45 seconds and frequent timeouts. Performance is fine in other regions (1s average).
What We’ve Done So Far:
Deployed 5 machines in different US regions (Boston, LA, Dallas).
Enabled auto_start and auto_stop.
Confirmed that traffic is hitting all machines.
Current soft_limit is 200 and hard_limit is 250.
We’re now trying to:
Lower the soft_limit just for the Boston region, since that machine tends to max out CPU first and trigger problems. However, we can’t find a way to configure soft limits on a per-machine or per-region basis — only globally via fly.toml. Is this currently possible?
Here’s a snippet from our config:
[http_service.concurrency]
type = "requests"
hard_limit = 250
soft_limit = 200
Understand how cross-region load balancing works. From what we’re observing (and based on Fly.io’s docs), even if Boston hits its soft limit, traffic won’t shift to LA or Dallas — even though they’re underutilized. Is that expected behavior?
Key Questions:
Is there a way to configure soft_limit/hard_limit per region or per machine?
Is regional traffic isolation hardcoded into Fly.io’s routing? If so, is there a best practice for fallback or failover across US regions?
Should we be managing this ourselves at the app level (e.g. DNS or proxy layer) to avoid single-region saturation?
We’re happy to temporarily keep machines hot (always-on) in key regions, but want to be sure we’re setting things up for long-term stability — especially in the US market where most of our traffic comes from.
Any guidance or best practices would be really appreciated!
it’s not possible to configure a different soft/hard limit per machine at the moment.
if there is a machine in the closest region that’s under hard limit (even if it’s above soft limit), requests will not be routed to a different region. however, if all machines in a region are at hard limit or failing health checks, requests will be routed to other regions.
if your machines cannot handle 200 requests without maxing out on CPU, you should set a lower hard_limit. if your Boston users just make more expensive queries than users in other regions, I’m not sure we have a good solution for that; maybe someone else has one?
lowering the soft_limit and adding more machines in the Boston region might work for you. if there are multiple machines in the same region, requests will prefer machines under soft_limit.
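To make the routing rules in this reply concrete, here is a minimal Python sketch of the decision as described above. It is a simplified model for reasoning about the behaviour, not Fly's actual proxy code. The `Machine` class, the `pick_machine` function, and the choice to break ties by picking the least-loaded machine are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    region: str
    load: int            # current concurrent requests
    healthy: bool = True


def pick_machine(machines, regions_by_distance, soft_limit, hard_limit):
    """Pick a machine for a new request, or None if everything is saturated.

    regions_by_distance lists regions from closest to farthest for the user.
    """
    for region in regions_by_distance:
        # A machine is usable if it is healthy and under the hard limit.
        candidates = [
            m for m in machines
            if m.region == region and m.healthy and m.load < hard_limit
        ]
        if not candidates:
            # Whole region is at hard limit or failing checks: try the next one.
            continue
        # Within a region, prefer machines that are still under the soft limit.
        under_soft = [m for m in candidates if m.load < soft_limit]
        pool = under_soft or candidates
        # Assumption: break ties by taking the least-loaded machine.
        return min(pool, key=lambda m: m.load)
    return None


machines = [
    Machine("bos-1", "bos", 230),
    Machine("dfw-1", "dfw", 3),
    Machine("lax-1", "lax", 5),
]
order = ["bos", "dfw", "lax"]  # a Boston user's regions, closest first

print(pick_machine(machines, order, 200, 250).name)  # bos-1: over soft, under hard
machines[0].load = 250
print(pick_machine(machines, order, 200, 250).name)  # dfw-1: Boston is at hard limit
```

In this model a lone Boston machine sitting between soft_limit and hard_limit keeps receiving Boston traffic, which matches the symptom in the original question: idle machines in other regions only see that traffic once Boston reaches the hard limit.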
Soft and hard limits for Fly Machines don’t have to be a black art. Here’s a quick rundown:
hard_limit: max concurrent requests the machine will ever handle.
soft_limit: how many it can handle comfortably, before Fly begins starting more machines. Incoming requests are distributed in round-robin fashion to started machines.
The best way to dial this in is to actually test it. Based on the target app you’re working to scale, create a separate Fly app copy with the same machine config. Now set both limits absurdly high (say, 1000), and hammer it with load using something like wrk, hey, ab, Locust, or k6. Watch CPU, memory, and latency.
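If you would rather script a rough check than reach for one of those tools, a small harness along these lines can give a first impression. This is only a sketch using Python's standard library, not a replacement for a proper load tester: the `measure` function and its parameters are made up for illustration, and `request_fn` is whatever callable performs one request against your test app.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure(request_fn, concurrency, requests_per_worker=5):
    """Call request_fn from `concurrency` threads at once.

    Returns the median latency of all calls, in milliseconds.
    """
    def worker(_):
        samples = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            request_fn()
            samples.append((time.perf_counter() - start) * 1000)
        return samples

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces every worker to finish before the pool shuts down.
        results = list(pool.map(worker, range(concurrency)))

    latencies = [ms for samples in results for ms in samples]
    return statistics.median(latencies)
```

To use it, pass a function that fetches one page from the test copy of your app (for example with `urllib.request.urlopen`), then call `measure` at increasing concurrency levels and record the latency at each level.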
What you’re looking for is the concurrency level at which response times climb past your comfort threshold. Set your hard limit just below that level. Your comfort level is up to you; a real-time application will want to keep responses under 100ms, but an ordinary CRUD application might be just fine with request times around 250ms.
Now set your soft limit. The soft limit is there to give our proxy time to bring another machine online before your users notice lag. Example: if things start dragging at 20 concurrent requests, set hard_limit: 18 and soft_limit: 12.
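The arithmetic behind that example can be sketched in a few lines of Python. The function names are hypothetical, the latency figures are invented sample data rather than real measurements, and the 90% and 60% ratios are simply read off the 20 → 18 / 12 example above, not an official recommendation.

```python
def first_slow_level(latency_ms_by_concurrency, comfort_ms):
    """Lowest tested concurrency whose latency exceeds comfort_ms, else None."""
    for level in sorted(latency_ms_by_concurrency):
        if latency_ms_by_concurrency[level] > comfort_ms:
            return level
    return None


def suggest_limits(slow_level, hard_pct=90, soft_pct=60):
    """Return (hard_limit, soft_limit) as percentages of the slow level.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    hard = max(1, slow_level * hard_pct // 100)
    soft = max(1, min(hard, slow_level * soft_pct // 100))
    return hard, soft


# Invented load-test results: concurrency level -> observed latency in ms.
results = {5: 40, 10: 60, 15: 120, 20: 310, 25: 900}

slow = first_slow_level(results, comfort_ms=250)
print(slow)                  # 20
print(suggest_limits(slow))  # (18, 12)
```

Treat the output as a starting point and re-run the load test with the suggested limits in place to confirm that latency stays under your threshold.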
Note that, as Lillian said, soft-limit scaling the way I described it here only works within the same region, so if you want load to be distributed to machines in other regions, the value that matters is the hard limit. These tips on how to find and tune the hard limit still apply.
Our team has tried absolutely every setting when it comes to hard and soft limits, but STILL our USA-based machines are not being load balanced, so each page in our app takes 3–4 seconds to load.
We literally have empty machines set up that are not being used even when our primary machines are overloaded.
Any help would be appreciated, as the entire team cannot find the solution. They have also noted that the way that Fly.io load balances is essentially a black box that we have no control over.
ah, I see you have a Launch plan, I’d recommend writing to Premium Support (you can find the address in the Support page of the dashboard). they can help figure out the issue better