Region-Specific soft_limit and Cross-Region Load Balancing in the US

Papercut · May 7, 2025, 10:04am

We’re running our app on Fly.io and have been seeing critical performance issues specifically for US-based users — page loads of 30–45 seconds and frequent timeouts. Performance is fine in other regions (1s average).

What We’ve Done So Far:

Deployed 5 machines in different US regions (Boston, LA, Dallas).
Enabled auto_start and auto_stop.
Confirmed that traffic is hitting all machines.
Current soft_limit is 200 and hard_limit is 250.

We’re now trying to:

Lower the soft_limit just for the Boston region, since that machine tends to max out CPU first and trigger problems. However, we can’t find a way to configure soft limits on a per-machine or per-region basis — only globally via fly.toml. Is this currently possible?

Here’s a snippet from our config:

[http_service.concurrency]
type = "requests"
hard_limit = 250
soft_limit = 200

Understand how cross-region load balancing works. From what we’re observing (and based on Fly.io’s docs), even if Boston hits its soft limit, traffic won’t shift to LA or Dallas — even though they’re underutilized. Is that expected behavior?

Key Questions:

Is there a way to configure soft_limit/hard_limit per region or per machine?
Is regional traffic isolation hardcoded into Fly.io’s routing? If so, is there a best practice for fallback or failover across US regions?
Should we be managing this ourselves at the app level (e.g. DNS or proxy layer) to avoid single-region saturation?

We’re happy to temporarily keep machines hot (always-on) in key regions, but want to be sure we’re setting things up for long-term stability — especially in the US market where most of our traffic comes from.

Any guidance or best practices would be really appreciated!

Thanks in advance!

lillian · May 7, 2025, 11:09am

it’s not possible to configure a different soft/hard limit per machine at the moment.
if there is a machine in the closest region that’s under hard limit (even if it’s above soft limit), requests will not be routed to a different region. however, if all machines in a region are at hard limit or failing health checks, requests will be routed to other regions.

if your machines cannot handle 200 requests without maxing out on CPU, you should set a lower hard_limit. if your Boston users just make more expensive queries than users in other regions, I’m not sure we have a good solution for that. maybe someone else has one?

lowering the soft_limit and adding more machines in the Boston region might work for you. if there are multiple machines in the same region, requests will prefer machines under soft_limit.
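
As a rough sketch of that suggestion, the concurrency block from the original post could be tightened along these lines. The numbers are placeholders rather than measured values, and the block still applies to every region, since per-machine limits aren't available:

```toml
# fly.toml (fragment). Illustrative values only; derive real ones from your own load tests.
[http_service.concurrency]
  type = "requests"
  hard_limit = 150   # assumed value, below the original 250, so a saturated region reaches its hard limit (and spills to other regions) sooner
  soft_limit = 75    # assumed value, below the original 200, so machines still under the soft limit in the same region are preferred earlier
```

Adding the extra Boston machines is a separate step outside this file.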

roadmr · May 7, 2025, 1:18pm

Soft and hard limits for Fly Machines don’t have to be a black art. Here’s a quick rundown:

hard_limit: max concurrent requests the machine will ever handle.
soft_limit: how many it can handle comfortably, before Fly begins starting more machines. Incoming requests are distributed in a round-robin fashion to started machines.

The best way to dial this in is to actually test it. Based on the target app you’re working to scale, create a separate Fly app copy with the same machine config. Now set both limits absurdly high (say, 1000), and hammer it with load using something like wrk, hey, ab, Locust, or k6. Watch CPU, memory, and latency.

What you’re looking for is the point where response times climb past your comfort threshold. Set your hard limit just below that point. Your comfort level is up to you; a real-time application will want to keep responses under 100ms, but an ordinary CRUD application might be just fine with request times around 250ms.

Now set your soft limit. The soft limit is there to give our proxy time to bring another machine online before your users notice lag. Example: if things start dragging at 20 concurrent requests, set hard_limit: 18 and soft_limit: 12.
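
Written out as a fly.toml fragment, that example might look like the following. This is only a sketch: it assumes the same [http_service.concurrency] section shown in the original post, and that the figure of 20 concurrent requests came from your own load test:

```toml
# fly.toml (fragment). Numbers are taken from the example above, not from a real measurement.
[http_service.concurrency]
  type = "requests"
  hard_limit = 18   # just below the point (20) where responses started dragging
  soft_limit = 12   # leaves headroom for another machine to start before the hard limit is reached
```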

Note that, as Lillian said, soft-limit scaling the way I described it here only works within the same region, so if you want load to be distributed to machines in other regions, the value that matters is the hard limit. The tips above on how to find and tune the hard limit still apply.

Papercut · May 12, 2025, 12:58pm

Our team have tried absolutely every setting when it comes to hard and soft limits, but STILL our USA based machines are not being load balanced, causing our App to move from page to page at 3 - 4 seconds per load.

We literally have empty machines set up that are not being used even when our primary machines are overloaded.

Any help would be appreciated, as the entire team cannot find the solution. They have also noted that the way that Fly.io load balances is essentially a black box that we have no control over.

lillian · May 12, 2025, 1:01pm

could you share the app name so we can take a look?

Papercut · May 12, 2025, 1:38pm

I can’t even log in to Fly.io at the moment: Awesome Screenshot

lillian · May 12, 2025, 2:50pm

ah, I see you have a Launch plan, I’d recommend writing to Premium Support (you can find the address in the Support page of the dashboard). they can help figure out the issue better

Papercut · May 12, 2025, 3:01pm

Unfortunately they have not found the solution either.

system · May 19, 2025, 3:01pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
