Fly 6PN mesh drops TCP packets on long-lived WebSocket connections

Hi everyone,

I’ve been debugging an intermittent WebSocket disconnection issue for the past few days and I’ve narrowed it down to the 6PN internal mesh.

My app has an admin panel that should not be publicly accessible, so I access it via WireGuard VPN using the .internal address. The problem is that WebSocket connections over this path are extremely unstable — they drop every 4–36 seconds, making any real-time feature (Phoenix LiveView in my case) essentially unusable through the VPN. The same app works perfectly when accessed via Fly Proxy (HTTPS) or fly proxy.

I’m sharing detailed findings here in case this is a known issue or someone from the Fly team can take a look.

Summary

Long-lived TCP connections (WebSocket) to a Fly app via WireGuard + 6PN internal addresses (.internal:8080) experience random disconnections every 4–36 seconds. The same app works perfectly when accessed via Fly Proxy (HTTPS) or fly proxy (TCP tunnel). The issue is reproducible and isolated to the 6PN network path.

Environment

  • App: single machine, shared-cpu-1x, 1GB RAM, London (lhr)
  • Runtime: Elixir/Phoenix LiveView on Bandit HTTP server
  • Client: macOS 15, WireGuard app (App Store), utun8 interface
  • WireGuard peer: Fly gateway (created via fly wireguard create)
  • Access: http://&lt;app&gt;.internal:8080 (plain HTTP, no TLS)

Symptoms

Phoenix LiveView uses WebSocket for real-time UI updates. When accessed via 6PN:

  1. WebSocket connects successfully
  2. Server sends 1-second timer updates (tiny JSON diffs, ~50–100 bytes each)
  3. After 4–36 seconds, the client stops receiving updates (page freezes)
  4. Server-side, the process keeps sending — diffs accumulate in the TCP write buffer
  5. After ~15s heartbeat timeout, the server detects the dead connection and closes it
  6. Client attempts WebSocket reconnection, fails for ~10s, falls back to HTTP long-polling
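Steps 4–5 can be confusing if you haven't hit them before, so here is a toy Python model of what the server side experiences (this is not Phoenix's actual implementation, just an illustration of why the app only notices after the ~15s heartbeat window):

```python
from collections import deque

HEARTBEAT_TIMEOUT = 15.0  # assumed heartbeat window, matching the ~15s observed


class StalledConnection:
    """Toy model of steps 4-5: the peer is gone, but writes still 'succeed'
    because they only land in the local TCP send buffer."""

    def __init__(self, last_ack: float = 0.0):
        self.last_ack = last_ack      # last time the client acknowledged anything
        self.write_buffer = deque()   # stands in for the kernel's send queue

    def send_diff(self, diff: bytes) -> None:
        # The write call returns immediately; nothing is actually delivered.
        self.write_buffer.append(diff)

    def is_dead(self, now: float) -> bool:
        # The application only notices once the heartbeat goes unanswered.
        return now - self.last_ack > HEARTBEAT_TIMEOUT


conn = StalledConnection(last_ack=0.0)
for _ in range(20):                   # one ~50-byte diff per second, as in step 2
    conn.send_diff(b"x" * 50)

print(len(conn.write_buffer))  # 20 diffs queued, none delivered
print(conn.is_dead(10.0))      # False: still inside the heartbeat window
print(conn.is_dead(16.0))      # True: detected only after the timeout
```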

Controlled experiment

I created an identical test page (/ht) that increments a counter once per second. I tested three network paths to the same app, same machine, same page:

| Path | Transport | Result |
| --- | --- | --- |
| `https://myapp.example.com/ht` | Fly Proxy (HTTPS) | Stable — 70+ ticks, zero drops |
| `http://localhost:8081/ht` via `fly proxy 8081:8080` | TCP tunnel | Stable — 100+ ticks, zero drops |
| `http://<app>.internal:8080/ht` via WireGuard + 6PN | 6PN mesh | Broken — freezes after 4–36 ticks, every time |

The test was run with both paths open simultaneously. The HTTPS/fly-proxy connections remained stable while the 6PN connection dropped multiple times in the same time window.

MTU investigation

ICMP tests over the 6PN path reveal packet loss at larger sizes:

$ ping6 -c 3 -s 1400 fdaa:0:xxxx:xxx:xxx:0:a:302
3 packets transmitted, 0 packets received, 100.0% packet loss

$ ping6 -c 3 -s 1350 fdaa:0:xxxx:xxx:xxx:0:a:302
3 packets transmitted, 2 packets received, 33.3% packet loss

$ ping6 -c 3 -s 1200 fdaa:0:xxxx:xxx:xxx:0:a:302
3 packets transmitted, 3 packets received, 0.0% packet loss
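For anyone who wants to reproduce this, the manual probes above can be automated with a binary search over the payload size. A sketch (the probe just shells out to the same ping6 invocation; the host and the 1200/1400 bounds are simply what I tested with):

```python
import subprocess


def probe(size: int, host: str) -> bool:
    """One round of ICMPv6 echoes with a `size`-byte payload.
    True if at least one reply made it back (ping6 exits 0)."""
    result = subprocess.run(
        ["ping6", "-c", "3", "-s", str(size), host],
        capture_output=True,
    )
    return result.returncode == 0


def max_payload(probe_fn, lo: int = 1200, hi: int = 1400) -> int:
    """Largest payload that still gets through, assuming probe_fn(lo)
    succeeds and sizes above some threshold consistently fail."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe_fn(mid):
            lo = mid      # mid fits: search the upper half
        else:
            hi = mid - 1  # mid dropped: search the lower half
    return lo


# Usage (network-dependent, so results will vary):
# limit = max_payload(lambda s: probe(s, "fdaa:0:xxxx:xxx:xxx:0:a:302"))
# print(limit + 48)  # payload + 48 bytes of IPv6/ICMPv6 headers ≈ path MTU
```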

The WireGuard interface MTU is 1420 (default), but the effective path MTU through 6PN appears to be ~1340 bytes.
Reducing the client interface MTU to 1280 (sudo ifconfig utun8 mtu 1280) did not fix the WebSocket drops — the connection still died after ~16 seconds. So the MTU issue is real, but there is also a separate connection-dropping problem.
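If it helps triangulate: the observed numbers are consistent with exactly one extra WireGuard-sized encapsulation layer somewhere in the path. The header sizes below come from the WireGuard protocol itself; the "second hop" part is pure speculation on my part about how 6PN might be built internally:

```python
# Standard WireGuard-over-IPv6 per-packet overhead (protocol-defined sizes,
# nothing Fly-specific):
IPV6_HEADER = 40   # outer IPv6 header
UDP_HEADER = 8     # outer UDP header
WG_HEADER = 16     # type/reserved + receiver index + nonce counter
WG_AUTH_TAG = 16   # Poly1305 authentication tag
WG_OVERHEAD = IPV6_HEADER + UDP_HEADER + WG_HEADER + WG_AUTH_TAG  # 80 bytes

# Why the WireGuard interface defaults to MTU 1420:
print(1500 - WG_OVERHEAD)  # 1420

# Hypothetical: one more WireGuard-sized hop inside the 6PN mesh would leave
# an effective path MTU of about:
print(1420 - WG_OVERHEAD)  # 1340, matching the ~1340 observed above
```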

What I’ve ruled out

  • Application bug: server-side process stays alive and keeps processing; the freeze is at transport level only
  • Bandit/Phoenix: same code works perfectly via Fly Proxy and fly proxy
  • WireGuard tunnel itself: fly proxy also uses WireGuard but works fine (it bypasses 6PN routing)
  • MTU alone: reducing MTU to 1280 (IPv6 minimum) didn’t fix it
  • Traffic volume: a completely static page (no server push) also disconnects via 6PN

Expected behavior

WebSocket connections over 6PN should be as reliable as those via Fly Proxy or fly proxy, especially for lightweight traffic (< 100 bytes/second).

Workaround

Using fly proxy 8081:8080 -a &lt;app&gt; and accessing the panel via http://localhost:8081 instead of http://&lt;app&gt;.internal:8080.
This works perfectly but adds an extra step (and it is not feasible on smartphones).

Any insights would be appreciated. Happy to provide more logs or run additional tests if helpful.

Just to clarify: are you comparing connecting via fly proxy from your local computer with connecting via a WireGuard tunnel using the .internal address, also from your local computer? If so, I wouldn't honestly expect any difference, because fly proxy just creates a userspace WireGuard tunnel under the hood. One possibility is that the WireGuard peer you created manually landed in a faraway region, and that adds a lot of latency and instability for that peer.

Thanks for the quick response!

Yes, both tests are from the same local machine, same WireGuard tunnel active throughout.

On the region concern: my WireGuard peer is configured with endpoint lhr1.gateway.6pn.dev:51820, and the app runs in lhr — so same region, not a faraway one.

On fly proxy being “just a WireGuard tunnel”: that may be true at the transport level, but the routing through the 6PN mesh is clearly different, because the results are dramatically different from the same machine, at the same time:

| Path | WebSocket stability | Notes |
| --- | --- | --- |
| `fly proxy 8081:8080` | 100+ ticks, zero drops | Rock solid |
| Direct 6PN (`myapp.internal:8080`) | Dies every 4–36 seconds | `{:shutdown, :closed}` |
| Public HTTPS (Fly Proxy edge) | 70+ ticks, zero drops | Also rock solid |

The WireGuard tunnel itself is healthy — ping6 to the VM is stable at ~37ms. The problem manifests only on long-lived TCP connections (WebSocket) routed through the 6PN mesh to the .internal address.

There’s also an MTU issue on this path: ping6 -s 1400 to the VM via 6PN shows 100% packet loss, while -s 1200 works fine. The interface MTU is 1420, so something in the 6PN path has a lower effective MTU (~1340). Reducing client MTU to 1280 didn’t fix the WebSocket drops though — they seem to be a separate issue.

So to summarize: fly proxy and direct 6PN clearly take different paths through your infrastructure, and only the direct 6PN path drops long-lived connections.

One difference I just realized: fly proxy connects through WireGuard wrapped inside a WebSocket connection, while a plain WireGuard interface is… just WireGuard, which runs over UDP. It's possible that your local ISP heavily throttles UDP traffic, as ISPs sometimes tend to do, and that would also explain why you only start to run into issues after a while. Other than that, there's really no difference between how traffic from fly proxy and a direct WireGuard interface is routed.

The lower effective MTU shouldn’t really be an issue since path MTU discovery should rectify that.

Thanks for the insight! That makes sense: if fly proxy wraps WireGuard traffic inside a WebSocket (TCP) while the direct WireGuard tunnel runs over plain UDP, that would explain the difference we're seeing.
You're right that some ISPs throttle or deprioritize long-lived UDP flows, which would explain why the direct tunnel drops packets after a few seconds while fly proxy stays rock solid.

We’ll try testing from a different network to confirm the UDP throttling hypothesis. In the meantime, fly proxy works great as a workaround.

Thanks again for digging into this!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.