Fly.io Proxy adds ~11 seconds Content Download delay to small JSON responses, app responds in 11 milliseconds

Small JSON responses (~24KB uncompressed, ~1KB after proxy zstd compression) from our SvelteKit app intermittently take 10-12 seconds to reach the browser, despite the app completing the request in under 11ms. The delay is entirely in the Content Download phase. TTFB is normal. Direct curl to localhost confirms the app is fast. Reproducible on every new browser session.

This is not the “upstream header wait” issue reported in other threads: TTFB is fine. The proxy receives the complete response promptly but holds the body for ~11 seconds before delivering it.

Setup

  • Single machine, Frankfurt region (fra)

  • SvelteKit with adapter-node on port 3000

  • Node.js 22, KEEP_ALIVE_TIMEOUT=65, HEADERS_TIMEOUT=70

  • swap_size_mb = 512, auto_stop_machines = "off"

  • Persistent volume mounted at /data

  • Litestream running for SQLite replication

Reproduction

  1. Open a new incognito window

  2. Navigate to the app (initial page load is fast)

  3. Click any internal link (client-side navigation triggers __data.json fetch)

  4. The __data.json fetch takes 10-12 seconds

  5. Subsequent navigations are fast

  6. Repeatable on every new session

When testing with 4 separate incognito windows simultaneously, all 4 navigations unblock at the exact same instant, regardless of when each was initiated. This suggests a single blocking event in the proxy layer, not per-connection latency.

Evidence

1. Direct localhost test (via fly ssh console):

fly ssh console -a XXXXXXX -C "curl -s -o /dev/null -w \
  'TTFB: %{time_starttransfer}s\nTotal: %{time_total}s\nSize: %{size_download} bytes\n' \
  'http://localhost:3000/book/__data.json?x-sveltekit-invalidated=01'"

TTFB: 0.010920s
Total: 0.011045s
Size: 24484 bytes

The app responds in 11ms with a 24KB response body when bypassing the proxy. Consistently 8-11ms across repeated tests.

2. Same request through the Fly.io proxy (Chrome DevTools → Timing):

Phase                                 Duration
Queueing                              1.35ms
Stalled                               0.60ms
Request sent                          0.17ms
Waiting for server response (TTFB)    154ms
Content Download                      10.94s
Total                                 11.09s

3. Server-Timing header confirms fast server-side processing:

server-timing: total;dur=7.9, hooks;dur=0.1, resolve;dur=7.7

The app processed the entire request in 7.9ms. The proxy received the complete response almost immediately (TTFB = 154ms including round-trip), then took 10.94 seconds to deliver the ~1KB compressed body to the browser.
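To put those numbers side by side, here is a minimal Python sketch that parses the Server-Timing value and compares it with the DevTools timings. The helper name `parse_server_timing` is made up for illustration, and the parser only handles the simple `name;dur=ms` form seen in this thread.

```python
def parse_server_timing(value: str) -> dict[str, float]:
    """Parse 'name;dur=ms, name;dur=ms' into {name: ms}."""
    timings = {}
    for entry in value.split(","):
        name, *params = [part.strip() for part in entry.split(";")]
        for param in params:
            key, _, raw = param.partition("=")
            if key.strip().lower() == "dur":
                timings[name] = float(raw)
    return timings


# Values copied from the report above.
server_timing = parse_server_timing("total;dur=7.9, hooks;dur=0.1, resolve;dur=7.7")
content_download_ms = 10.94 * 1000  # DevTools "Content Download"
total_ms = 11.09 * 1000             # DevTools "Total"

app_share = server_timing["total"] / total_ms
download_share = content_download_ms / total_ms

print(f"app processing:   {server_timing['total']} ms ({app_share:.2%} of total)")
print(f"content download: {content_download_ms:.0f} ms ({download_share:.1%} of total)")
```

On these figures the app's own processing is well under 0.1% of the wall-clock time, and almost all of the rest is body delivery.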

4. Response headers from the slow request:

cache-control: private, no-store
content-encoding: zstd
content-type: application/json
date: Thu, 09 Apr 2026 22:51:08 GMT
fly-request-id: 01KNT7BCQZ192R58K9NQECAFD8-fra
server: Fly/9f41eaedb (2026-04-09)
server-timing: total;dur=7.9, hooks;dur=0.1, resolve;dur=7.7
via: 2 fly.io

Note: content-encoding: zstd is applied by the proxy — the app does not set this header. The proxy compresses the 24KB response to ~1KB, then takes 11 seconds to deliver that 1KB.

5. Server-side logs confirm no blocking:

We added a 500ms-interval event loop lag detector — it never fires during the slow requests. We also log every request with timing. All complete in 2-55ms. During the delay, there are 8-12 second gaps where no requests arrive at the app, despite the user actively clicking. After each gap, a burst of requests appears simultaneously.
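For anyone who wants to look for the same pattern in their own request log, a rough sketch of the gap check in Python follows. The arrival times are invented for illustration (they are not taken from the real logs), and the 5-second threshold is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(seconds=5)):
    """Return (start, end, gap) for consecutive arrivals further apart than threshold."""
    ordered = sorted(timestamps)
    return [
        (earlier, later, later - earlier)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold
    ]


# Hypothetical arrival times: two requests, a quiet period, then a burst.
arrivals = [
    datetime(2026, 4, 9, 22, 50, 57, 100000),
    datetime(2026, 4, 9, 22, 50, 57, 400000),
    datetime(2026, 4, 9, 22, 51, 8, 200000),
    datetime(2026, 4, 9, 22, 51, 8, 210000),
    datetime(2026, 4, 9, 22, 51, 8, 230000),
]

gaps = find_gaps(arrivals)
for start, end, gap in gaps:
    print(f"no requests between {start.time()} and {end.time()} ({gap.total_seconds():.1f}s)")
```

A long gap followed by several arrivals within a few milliseconds is the "burst after silence" signature described above.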

What we ruled out on the application side:

Potential cause                  Status       Evidence
Event loop blocking              Ruled out    500ms lag detector never fires
SQLite WAL checkpoint            Ruled out    WAL stays at 1-32 pages
Memory pressure                  Ruled out    RSS 130-270MB, heap 28-39MB, stable
Keep-alive timeout race          Ruled out    Set to 65s/70s, issue persists
Cron jobs competing for proxy    Ruled out    Moved to localhost, issue persists
Slow server processing           Ruled out    Every request completes in 2-55ms

Observations pointing to proxy compression/buffering:

  1. The proxy compresses 24KB → ~1KB with zstd, then takes 11 seconds to flush that 1KB to the browser

  2. All 4 concurrent test windows unblock at the same instant — a single event releases all pending responses

  3. The delay is specifically in Content Download (body delivery), not TTFB (header delivery) — the proxy forwards headers promptly but holds the body

  4. fly proxy 13000:3000 connects via TCP but immediately resets on HTTP request — could not use this to bypass the edge proxy for comparison

Questions

  1. The proxy receives the complete response in ~154ms but takes 10.94s to deliver the 1KB compressed body. What is happening during those 11 seconds?

  2. Could the zstd compression pipeline have a flushing or flow control issue? Is there a way to disable proxy-level compression to test?

  3. The “all 4 windows unblock simultaneously” pattern suggests a single periodic event in the proxy — could this be connection pooling, HTTP/2 flow control, or response buffering?

  4. Can you check the proxy logs for fly-request-id: 01KNT7BCQZ192R58K9NQECAFD8-fra?

  5. fly proxy 13000:3000 connects via TCP but immediately resets — is this expected, or does it indicate a networking issue on the machine?

Disclosure:

Above issue was rewritten for clarity from my debugging notes by AI.

Could you run a request with the header flyio-debug: doit and post the result? Also, could you run a traceroute or mtr report to your app domain?

As a small side note, this one is usually just due to the app not listening on IPv6 (which it’s not generally required to do).

https://fly.io/docs/networking/app-services/#get-the-service-listening-on-the-right-address-within-the-vm-2

[The tunnel leads into the .internal network, which is IPv6-only.]

Thanks for the quick response. Here are the requested diagnostics.

flyio-debug: doit — 87 requests over 60 seconds

curl -s -H "flyio-debug: doit" -o /dev/null -w "%{time_total}" \
  "https://[redacted]/book/__data.json?x-sveltekit-invalidated=01"

All 87 completed in 86ms–230ms. Zero showed the 10s delay. Sample request IDs:

  • 01KNTB1F393F99GSN838CQ0HG4-fra (218ms)
  • 01KNTB2FC50JTD5HM6XCXVM150-fra (230ms)
  • 01KNTB38A5S481XZ9NBQYR65W5-fra (173ms)

Full list of all 87 IDs available if needed.

Traceroute

 1  fritz.box                        3.2 ms
 2  p3e9bf505.dip0.t-ipconnect.de    8.8 ms
 3  m-ef2-i.m.de.net.dtag.de        14.1 ms
 4  80.150.168.185                   13.3 ms
 5  ae9.cr6-fra2.ip4.gtt.net        16.3 ms
 6  ip4.gtt.net (154.14.73.82)      17.4 ms
 7+ * * * (no response beyond hop 6)

Why curl probably doesn’t reproduce it (?)

The issue only occurs on HTTP/2 persistent connections in browsers. Each curl opens a fresh HTTP/1.1 connection and completes instantly. The browser reuses the HTTP/2 connection established during the initial page load, and the delay hits on the first fetch over that existing connection.

I think the main idea was to post the corresponding flyio-debug header from the response, since it contains a lot of useful debugging information.

(It may not be possible to trace a request with only the Fly-Request-ID.)

(On a similar note, I don’t think they can determine your app or organization solely from your name in the forum.)

$ curl --head -H 'flyio-debug: doit' 'https://your-app-name.fly.dev/'

That will also verify that curl really is using HTTP/1.1. (Newer versions of curl default to HTTP/2, I believe.)

Sorry about that. Here are the flyio-debug response headers:

HTTP/2 200

flyio-debug: {"n":"edge-cf-fra2-ad2e","nr":"fra","ra":"...","rf":"Verbatim","sr":"fra","sdc":"fra2","sid":"0801e42a54e228","st":0,"nrtt":0,"bn":"worker-cf-fra2-910c","mhn":null,"mrtt":null}

fly-request-id: 01KNV2ESKYGHV1R3MEXNB12HD2-fra

server-timing: total;dur=7.1, hooks;dur=0.1, resolve;dur=7.0

Yes, correct: curl is using HTTP/2, so my earlier theory about HTTP/1.1 vs HTTP/2 was wrong.

I feel a bit uncomfortable sharing the app name here publicly in the forum. If the fly support team can’t look up my app by the fly-request-id, I’m happy to share it via DM.

Quick update:

I destroyed the FRA machine entirely and ran on a CDG-only machine. The delay persists: requests still route through the FRA edge proxy (via: 2 fly.io, 2 fly.io, request ID suffix -fra). This rules out anything machine-specific. The issue seems to be in the FRA edge proxy.

@johannes

Could you add flyio-debug: doit header to requests to your app from your browser and then share the request ID of the slow one, please? There are browser extensions that can do this.

I’ve been trying to reproduce this problem, but I’m yet to see a slow request to __data.json. I’m getting routed to ams, but I’ve also tried to reproduce it directly from the fra edges and all the requests are fast.

Here’s a slow request ID with flyio-debug: doit enabled:

01KNVF4AFAQ1D1RD09NE8JDPZN-fra

This was a __data.json request that took 8.4s in the browser.

I’ve reproduced this across Brave, Firefox, and Safari with all browser extensions disabled — same behavior in all three. I’ve also tested with multiple incognito windows simultaneously: when I trigger navigation in 4 separate windows at different times, they all finish loading at the exact same moment.

And another one:

date

Fri, 10 Apr 2026 10:36:23 GMT

fly-request-id

01KNVFDQ744HBY0NTK7YGYN2KV-fra

flyio-debug

{"n":"edge-cf-fra2-ad2e","nr":"fra","ra":"2003:c2:b74f:5700:60:95df:7197:315b","rf":"Verbatim","sr":"fra","sdc":"fra2","sid":"0801e42a29dd78","st":0,"nrtt":0,"bn":"worker-cf-fra2-910c","mhn":null,"mrtt":null}

server

Fly/9f41eaedb (2026-04-09)

server-timing

total;dur=9.4, hooks;dur=0.1, resolve;dur=9.3

via

2 fly.io
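When collecting several of these, it can help to diff the flyio-debug values rather than eyeball them. A small Python sketch, using the two values posted so far in this thread (with "ra" elided in both); the thread does not document what the individual fields mean, so this only reports which ones differ. The helper name `diff_fields` is made up.

```python
import json

# flyio-debug values copied from this thread; "ra" is replaced with "..." in both.
first = json.loads(
    '{"n":"edge-cf-fra2-ad2e","nr":"fra","ra":"...","rf":"Verbatim","sr":"fra",'
    '"sdc":"fra2","sid":"0801e42a54e228","st":0,"nrtt":0,'
    '"bn":"worker-cf-fra2-910c","mhn":null,"mrtt":null}'
)
second = json.loads(
    '{"n":"edge-cf-fra2-ad2e","nr":"fra","ra":"...","rf":"Verbatim","sr":"fra",'
    '"sdc":"fra2","sid":"0801e42a29dd78","st":0,"nrtt":0,'
    '"bn":"worker-cf-fra2-910c","mhn":null,"mrtt":null}'
)


def diff_fields(a: dict, b: dict) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


changed = diff_fields(first, second)
print(changed)
```

For these two samples only "sid" differs; "n" and "bn" are identical.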

@johannes Thanks!

I can see in our logs that it took the proxy ~8s to see EOF from the app while reading response body.

I’m not yet sure what’s going on.

Could you check without compression, please? You can either disable it in fly.toml:

[http_service.http_options.response]
compress = false

Or do one of those:

  • strip Accept-Encoding header from browser’s request
  • set Cache-Control: no-transform in the response
  • encode the response yourself in the app

In all of those cases the proxy shouldn’t attempt to encode the body and will return it as is.

I just tried with the below and redeployed:
[http_service.http_options.response]
compress = false

Still same ~8-14s delay. Some more requests:

date Fri, 10 Apr 2026 11:34:54 GMT

fly-request-id 01KNVJRWA75S6GCMGW63C50154-fra


date Fri, 10 Apr 2026 11:34:58 GMT

fly-request-id 01KNVJS00P4W8GH12R3DV1XAN2-fra


date Fri, 10 Apr 2026 11:34:56 GMT

fly-request-id 01KNVJRYPG9A7QHWFZKN84T1KZ-fra


date Fri, 10 Apr 2026 11:40:18 GMT

fly-request-id 01KNVK2RH7XS2YWJ47FNH5JRDV-fra

Sorry, I made a typo. It should be:

[http_service.http_options]
   compress = false

The responses you’ve shared last were still getting compressed.

Thanks. I updated the .toml, deployed and just tried again:

date Fri, 10 Apr 2026 13:53:37 GMT ——— 13.66s
fly-request-id 01KNVTPWERR9PEZXBC4QMKN55H-fra


date Fri, 10 Apr 2026 13:54:48 GMT ——— 14.26s

fly-request-id 01KNVTS1F0J3TC5WBZGSS7RX2E-fra

flyio-debug {"n":"edge-cf-fra2-ad2e","nr":"fra","ra":"2003:c2:b74f:5700:60:95df:7197:315b","rf":"Verbatim","sr":"fra","sdc":"fra2","sid":"0801e42a29dd78","st":0,"nrtt":0,"bn":"worker-cf-fra2-910c","mhn":null,"mrtt":null}

server Fly/9f41eaedb (2026-04-09)


Interesting. It no longer compresses and according to our logs we process the request rather quickly. For example, for 01KNVTS1F0J3TC5WBZGSS7RX2E-fra:

2026-04-10 13:54:48.416643000 edge <- user-agent: Request { ... }
2026-04-10 13:54:48.429801000 edge -> user-agent: Response { ... }

So it took us roughly 13ms to process it but then ~14 seconds to stream the body.
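The arithmetic behind that, as a quick Python sketch. The two timestamps are the ones from the log lines above, and 14.26s is the duration reported from the browser for this request ID. The helper `parse_ns_timestamp` is made up for illustration; it truncates the nanosecond field because Python's `%f` only accepts up to six digits.

```python
from datetime import datetime


def parse_ns_timestamp(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS.nnnnnnnnn', truncating nanoseconds to microseconds."""
    head, _, fraction = text.partition(".")
    return datetime.strptime(f"{head}.{fraction[:6] or '0'}", "%Y-%m-%d %H:%M:%S.%f")


request_seen = parse_ns_timestamp("2026-04-10 13:54:48.416643000")
response_logged = parse_ns_timestamp("2026-04-10 13:54:48.429801000")

proxy_ms = (response_logged - request_seen).total_seconds() * 1000
browser_total_s = 14.26  # as measured in the browser for this request

print(f"edge request -> response: {proxy_ms:.3f} ms")
print(f"remaining time spent streaming the body: {browser_total_s - proxy_ms / 1000:.2f} s")
```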

I also managed to reproduce it myself by just refreshing the page. Occasionally, some of the .css files take ~15 seconds to load.

I’m gonna add a bit more logging, as it’s not really clear what’s going on.

@johannes

I’m relatively sure the issue you observe has to do with the 27 MB video on the front page.

It gets downloaded over the same HTTP/2 connection as the rest of the assets.

I’m not 100% sure yet what exactly is the problem - it could be as simple as TCP head of line blocking, or it could be a bug in our HTTP/2 implementation. We are gonna investigate this.

For now the quickest way to unblock you is to move the video files to a different domain so that the browsers open another TCP connection to download them. It can be some kind of S3 storage, or it can be the same Fly app with the same machine, just on a different domain with a different TLS cert.

Thanks a lot for tracking this down. I really appreciate the support!

Removing the video from the front page confirms it: the 10s delay is gone.

For context, we use the same system on other sites that also serve videos from the same origin, but those are smaller (~10MB) and we haven’t seen this issue there (video is served from an SQLite database). So there might be a threshold where it starts to break down.

One thing that’s unclear to me: the video appears to finish loading during the initial page load. If I wait well after the page is fully loaded and then navigate, the delay still occurs. So it doesn’t seem like the video is actively competing for bandwidth at that point. More like the large transfer leaves the HTTP/2 connection in a bad state (?)

Looking forward to hearing what your investigation into the HTTP/2 side turns up.

@johannes

I believe this should be fixed now.

It was a misconfiguration in our HTTP/2 stack.

Each HTTP/2 stream (i.e. a request within a particular HTTP/2 connection) has a window size. This window size is the maximum amount of inbound data that can be in flight (meaning data that has been received, but not yet processed by the peer). There is also a connection-level window in addition to the stream-level window, and received data counts against both.

When your browser makes a request to your app, there are at least two HTTP/2 connections: between the browser and an edge and between the edge and the worker host where the machine is running. We can’t control window size on the browser side, but we do control HTTP/2 parameters between our edges and workers.

Normally, window size is dynamic and can grow or shrink depending on how fast the peers can process the data. To make memory usage predictable, we disable the dynamic window between our hosts and instead configure it to a fixed value.

We used to configure stream and connection window sizes to the same value, and that’s what caused the issue. It allowed a single stream to consume the full connection window and block other streams from receiving any data, provided the data from this stream wasn’t getting processed by the client. And that’s exactly what happened.

Your browser fetched the first few KBs of the video (to show the first frame and to process the metadata), but stopped processing the rest of the data until the video was played. Because of that, the data was “stuck” and was still counted towards connection window. As soon as the video was played, the data got released and the rest of the streams could proceed.
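To make the mechanism concrete, here is a toy Python model of the two windows. It is only a sketch, not a real HTTP/2 stack: the class, the `run` helper, and the 1 MiB / 4 MiB window sizes are all assumptions for illustration (the actual values are not stated in this thread). The 24,484-byte body size comes from the curl output in the original post, and the video size only needs to exceed the window.

```python
class Http2Receiver:
    """Toy model of a receiver's HTTP/2 flow-control windows."""

    def __init__(self, connection_window: int, stream_window: int):
        self.connection_window = connection_window
        self.initial_stream_window = stream_window
        self.stream_windows: dict[str, int] = {}
        self.unread: dict[str, int] = {}

    def receive(self, stream: str, wanted: int) -> int:
        """Accept as many bytes as BOTH windows allow; they stay buffered until consumed."""
        stream_window = self.stream_windows.setdefault(stream, self.initial_stream_window)
        sent = max(0, min(wanted, stream_window, self.connection_window))
        self.stream_windows[stream] -= sent
        self.connection_window -= sent
        self.unread[stream] = self.unread.get(stream, 0) + sent
        return sent

    def consume(self, stream: str) -> None:
        """The reader processes buffered data, which replenishes both windows."""
        freed = self.unread.pop(stream, 0)
        current = self.stream_windows.get(stream, self.initial_stream_window)
        self.stream_windows[stream] = current + freed
        self.connection_window += freed


MIB = 1024 * 1024
VIDEO = 27 * MIB    # any size larger than the window behaves the same way
JSON_BODY = 24_484  # bytes


def run(connection_window: int, stream_window: int) -> tuple[int, int]:
    rx = Http2Receiver(connection_window, stream_window)
    rx.receive("video", VIDEO)               # paused video: data is never consumed
    while_paused = rx.receive("json", JSON_BODY)
    rx.consume("video")                      # playback starts, buffered data is read
    after_play = rx.receive("json", JSON_BODY - while_paused)
    return while_paused, after_play


# Hypothetical sizes: equal windows vs. a connection window larger than the stream window.
equal = run(connection_window=1 * MIB, stream_window=1 * MIB)
split = run(connection_window=4 * MIB, stream_window=1 * MIB)
print("equal windows: ", equal)
print("larger conn window:", split)
```

With equal windows the JSON stream gets zero bytes until the video data is consumed, then completes at once. With a larger connection window the stalled video can only use up its own stream window, so the JSON response goes through while the video is still paused.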

@pavel Thank you for digging deeper into this issue.
However, after a redeploy I’m still seeing the same issue.
To test, I opened four private windows of the landing page side by side.
I then waited ca. 20s and clicked the same link in each window, one after the other.
There’s still a delay of 4-10s in each round of tests, and all 4 tabs finish at the exact same second after the loading delay.

fly-request-ids:
01KP5QPKSBMQ8413J3GFX9S8P0-fra
01KP5QR5XN6W55EPV1EWVXERAN-fra
01KP5QR50XYTP4MFEEAE9NC8FH-fra
01KP5QR826MJYD274VH0FD7W7Y-fra
01KP5QR7F3815B4Z6SRP3NDFMD-fra

01KP5RJ5YPJCERRZEKX4WRTP7V-fra
01KP5RJ7D5SQX1TTQS5NEQJ9CS-fra
01KP5RJ7ZTDN4Z7NDVTS1DPFYA-fra
01KP5RJ9ZXRG7R9YD3M914RJGE-fra

server: Fly/b35aee356 (2026-04-14)

If I remove the mp4 video from the landing page, there’s no issue.


I have made a screen recording to illustrate this behaviour:

Video uploaded to CleanShot Cloud

Interesting. I can’t reproduce anymore.

Could you check if the issue reproduces on the test app I used while debugging it - https://pbor-http2-test.fly.dev/

Play the video and then pause it after 2-3 seconds. Wait a few seconds and hover over the link. Does it load the text correctly or is it getting stuck?

For me it used to reproduce the issue 100% of the time before the fix and now it works all the time.

It works when I try in one window. But when I try in 4 parallel windows, it will not load / display the hovered text.

here’s a screen recording: Video uploaded to CleanShot Cloud

Update: I also tested in Firefox, and there it works without problems.
But in Chrome (new install, no extensions) and Brave it gets stuck.

After around 50s it finishes loading and displays the text (so the correlation between larger files and longer delays is obvious here).

here’s another screen recording: