⚙️ 3–8 s upstream header wait between Fly gateway and API (same FRA region, both warm)

Hello Fly.io team,

My name is Lars; I’m an independent developer currently building a modular analytics app (Trend Analyzer) hosted entirely on Fly.io. I really enjoy the platform and have been impressed by how flexible Machines and regional setups are. However, I’ve run into one recurring networking issue that seems to sit below the app level.

Summary:
I’m seeing consistent 3–8 second upstream header wait times between my Fly gateway and API apps, both deployed in region FRA.

Apps:

  • Gateway: shopify-trend-gateway
  • API: shopify-trend-api

Both apps use auto_start_machines = true and stop_delay = 1800 to avoid cold starts. Machines are confirmed warm and responsive when hit directly.

Observations:
API (direct):
Server-Timing: rows_load≈2–3 ms, serialize≈0 ms, render≈0.2 ms, dep≈0 ms
Total API response time: < 10 ms

Gateway (proxying to API):
/shopify/ui_api/snapshots/latest_detail → X-GW-Upstream-Header-Time: 8.145 s | X-GW-Upstream-Connect-Time: 0.003 s | X-GW-Request-Time: 8.145 s
/shopify/ui_api/snapshots/list → X-GW-Upstream-Header-Time: 3.523 s | X-GW-Upstream-Connect-Time: 0.004 s | X-GW-Request-Time: 3.522 s
/shopify/ui_api/snapshots/detail → X-GW-Upstream-Header-Time: 4.720 s | X-GW-Upstream-Connect-Time: 0.004 s | X-GW-Request-Time: 4.721 s

Health checks:
Gateway /health/live: 0.118 s total
API /health/live: 0.062 s total

Result:
Connect times are instant (~3 ms)
Delay sits entirely in upstream header wait (3–8 s)
App processing negligible (< 5 ms)
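
For reference, here’s roughly how I’ve been comparing the two paths from my side. It’s only a minimal sketch (standard library only); the gateway hostname is assumed to be the default .fly.dev one, the route is one of the affected endpoints, and the X-GW-* header is the one my NGINX gateway adds from its upstream timing.

import time
import urllib.request

ROUTE = "/shopify/ui_api/snapshots/list"
TARGETS = {
    "via gateway": "https://shopify-trend-gateway.fly.dev" + ROUTE,  # assumed default hostname
    "direct API": "https://shopify-trend-api.fly.dev" + ROUTE,
}

for label, url in TARGETS.items():
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        # urlopen returns once the response headers have arrived,
        # so this is roughly the time to first byte.
        ttfb = time.perf_counter() - start
        gw_wait = resp.headers.get("X-GW-Upstream-Header-Time")
        resp.read()
    total = time.perf_counter() - start
    print(f"{label}: ttfb={ttfb:.3f}s total={total:.3f}s upstream_header_time={gw_wait}")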

Config context:
Gateway (shopify-trend-gateway/fly.toml)
[http_service]
internal_port = 8080
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
stop_delay = 1800

API (shopify-trend-api/fly.toml)
[http_service]
internal_port = 8080
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
stop_delay = 1800

Analysis:

  • No cold-starts — Machines are warm and running.
  • TCP/TLS connect < 10 ms → no handshake issue.
  • API Server-Timing shows all work completes in < 5 ms.
  • Gateway spends 3–8 s waiting for the first byte from the API.
  • Looks like infrastructure latency within Fly’s internal network (gateway→API path in FRA).

Request:
Could you please check whether there are known latency issues or routing delays between Machines in the same FRA region? It looks as if requests are being held somewhere before they reach the API, even though the container is warm.

Happy to provide full raw timing headers or more context if needed. Thanks for taking a look at this.

Best,
Lars

Hi @lars1234 👋

How exactly is shopify-trend-gateway accessing shopify-trend-api? Is it through an .internal address, or is it through a public IP / domain assigned to the shopify-trend-api app?

(By the way, if it’s the former, then auto_start_machines wouldn’t work properly; it’s probably unrelated to the issue you’re seeing, so just a side note.)

Hi Peter, thanks for the quick reply!

Currently the gateway accesses the API via its public domain:
https://shopify-trend-api.fly.dev

Both apps are deployed in the same FRA region, but the gateway uses the public HTTPS address in its NGINX proxy_pass configuration.

Would switching to the internal address (http://shopify-trend-api.internal:8080) avoid the upstream header wait we’re seeing between the two apps? If so, is there anything special I need to configure on the gateway side (e.g. IPv6-only DNS resolution or region constraints), or does Fly handle this automatically?

Thanks again — really appreciate the help!

If this is an intra-regional network issue, probably not; and in fact, as mentioned above, auto_start_machines will not work with .internal domains.

I’ll look into the latency issue.

Just to clarify: are you currently seeing 3–8 s upstream header latency? I was able to make a request to your shopify-trend-api.fly.dev domain from the physical host running the gateway app, and got a response almost immediately. Of course I’m just sending a GET to the bare domain, so it probably isn’t hitting any real application logic here.

Thanks a lot, Peter!

Yes, that’s correct — the upstream header latency (3–8 s) occurs on requests like /shopify/ui_api/snapshots/list or /shopify/ui_api/snapshots/detail when called through the gateway.

Direct requests to shopify-trend-api.fly.dev respond in a few milliseconds, so the delay only happens between the gateway and the API inside the same FRA region.

Good to know about .internal and auto_start_machines — that makes sense. Appreciate you taking a deeper look into the latency path; let me know if I can help by providing more headers or timings.

Thanks again!

Thanks, Peter!

Yes — the 3–8 s upstream header delay only appears on actual application routes (for example the snapshot endpoints), not on root or /health.

Here’s what we consistently observe:

• Request path (through gateway): /shopify/ui_api/snapshots/list
X-GW-Upstream-Header-Time: 3.4 – 5.1 s
X-GW-Request-Time: same range
Server-Timing (API): rows_load ≈ 2 ms, render ≈ 0.2 ms, dep ≈ 0 ms

• Request path (through gateway): /shopify/ui_api/snapshots/detail
X-GW-Upstream-Header-Time: 4.7 – 5.2 s
X-GW-Request-Time: same range
Server-Timing (API): rows_load ≈ 3 ms, render ≈ 0.1 ms, dep ≈ 0 ms

• Same requests sent directly to https://shopify-trend-api.fly.dev
→ total time < 10 ms

So the delay is only visible when the gateway forwards the request to the API within FRA.
Both apps are in the same region and warm (stop_delay = 1800, min_machines_running = 1).
It looks like the gateway spends several seconds waiting for the first byte from the upstream.

Happy to share the full response headers or run a controlled test if that helps.
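
In case it’s useful: the Server-Timing values above are produced inside the API process itself, so they only cover in-app work and can’t see any wait that happens before a request reaches the app. The real instrumentation records the individual rows_load / serialize / render / dep segments; the sketch below only shows the mechanism, with a single assumed "app" metric.

import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_server_timing(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Total in-app time, to compare against the gateway's
    # X-GW-Upstream-Header-Time for the same request.
    response.headers["Server-Timing"] = f"app;dur={elapsed_ms:.1f}"
    return response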

Do you have a test request I could use against your endpoint that will actually hit application logic? Hitting those paths with a simple GET doesn’t seem to yield significant latency either.

(Also, if this contains information you can’t share on a public forum – feel free to share it over email at peter at fly dot io as well)

Thanks, Peter — I’ve just sent you the test details and endpoint URL via email.
Appreciate your help looking into this!

It seems like I’m only able to reproduce the latency very occasionally – is that the case for you as well? In the last 30 minutes or so I’ve only been able to observe high latency once or twice when hitting the test endpoint you sent, but with the domain replaced by the gateway’s.

This smells like some kind of DNS problem, but since you also indicated that TCP connect doesn’t seem to be slow, I’m really not sure what’s going on here. Requesting the test endpoint directly from the physical host running the gateway machine is also fast, and accessing the root URL is always quick, so I don’t really think this is a networking issue.

Hi Peter,

thanks a lot for your quick and detailed investigation — that really helped us narrow things down.

After reviewing the data again, it seems you’re right: the latency doesn’t appear to be consistent, and under some conditions it disappears entirely. That makes us think we may need to take a closer look at our own application path — especially around database connection pooling or potential blocking I/O inside the FastAPI app.
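
To make that concrete, this is the kind of pattern we’re auditing for now (purely illustrative, not our actual code): a synchronous database call inside an async def endpoint blocks the event loop, or stalls while waiting on an exhausted connection pool, so the gateway can see seconds of upstream header wait even though the query itself finishes in milliseconds.

from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()

def load_snapshots_sync():
    # Placeholder for a blocking call (sync DB driver, pool checkout, etc.)
    # that can stall when all pool connections are in use.
    return []

@app.get("/shopify/ui_api/snapshots/list")
async def list_snapshots():
    # Calling load_snapshots_sync() directly here would block the event loop
    # for every concurrent request; offloading it to a worker thread keeps
    # the loop responsive while the query runs.
    return await run_in_threadpool(load_snapshots_sync)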

Your observations (that the gateway-to-API path itself looks fine most of the time, and that direct host requests respond instantly) were extremely helpful in steering us in the right direction. We’ll continue testing on our side and circle back if we find something reproducible that still points to the Fly infrastructure layer.

Thanks again for the quick turnaround and the friendly, uncomplicated help — really appreciate it!

Best regards,
Lars
