fly.dev domains intermittent network connection issues

Hello,

I have two different fly apps (in two different organizations) that are both experiencing connection issues. Both apps are using their fly.dev domains (*.fly.dev). I cannot access the website from San Francisco, and I have received user reports from San Diego about connection issues.

Some debugging notes:

  • nc to the assigned IPv4 on ports 80/443 succeeds (TCP open).

  • curl to the bare IP (HTTP/HTTPS) connects, but either resets or fails the TLS handshake.

  • Forcing the correct Host/SNI with --resolve <domain>:443:<ip> works and returns the expected HTML.

  • I also tried moving the machines between regions (LAX, SJC → ORD) with the same results.
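The bare-IP failures and the `--resolve` success in the notes above suggest that the hostname sent as Host/SNI matters, not just the IP. As a rough sketch, this Python helper builds that curl invocation so it can be repeated for several IPs. The helper name, the domain `example.fly.dev`, and the IP `203.0.113.10` are placeholders for illustration, not values from this thread.

```python
import shlex
from typing import List


def curl_resolve_command(domain: str, ip: str, port: int = 443) -> List[str]:
    """Build a curl command that connects to `ip` but sends `domain` as Host/SNI.

    Hypothetical helper: it only assembles the argument list, it does not run curl.
    """
    scheme = "https" if port == 443 else "http"
    return [
        "curl", "-sS",              # quiet, but still show errors
        "-o", "/dev/null",          # discard the response body
        "-w", "%{http_code}\n",     # print only the HTTP status code
        "--resolve", f"{domain}:{port}:{ip}",  # pin domain:port to this IP
        f"{scheme}://{domain}:{port}/",
    ]


if __name__ == "__main__":
    # Placeholder domain and documentation-range IP; substitute your own.
    print(shlex.join(curl_resolve_command("example.fly.dev", "203.0.113.10")))
```

Because `--resolve` bypasses DNS entirely, a success here alongside a failure on the plain domain points at name resolution rather than at the app or its IP.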

The apps are working when accessed from different regions (e.g. other users are not experiencing any problems, and I can access the app if I VPN from a different location).

Is anyone else experiencing this issue?

Update: the issue has returned. The previous success seems unrelated to the IP reconfigurations.

Edit: VPN’ing to a different location still resolves the issues.

Some more debugging notes, in case they are helpful:

  1. Default ISP resolver → dig interest-trace.fly.dev returns SERVFAIL.
  2. Cloudflare (1.1.1.1) and Google (8.8.8.8) → resolve fine to 37.16.x.x.
  3. Because of this, curl and traceroute on my ISP path can’t even look up the host.
  4. With VPN (egress SFO), DNS works and the site loads normally.
  5. Moving the app between Fly regions (LAX, SJC → ORD) didn’t change behavior.
  6. Switching to a dedicated IP did not resolve the issue.
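The resolver comparison in items 1 and 2 can be repeated with a small script. This is only a sketch, assuming `dig` is installed and on the PATH; the function names are made up for illustration. It runs the same query against the system default resolver and against the two public resolvers, then reports the status field from each response header.

```python
import re
import subprocess
from typing import Dict, Optional, Sequence

# Matches the "status: SERVFAIL" field in dig's ->>HEADER<<- line.
STATUS_RE = re.compile(r"status: ([A-Z]+)")


def dig_status(dig_output: str) -> Optional[str]:
    """Return the response status (e.g. NOERROR, SERVFAIL), or None if absent."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def query_status(domain: str, resolver: Optional[str] = None) -> Optional[str]:
    """Run dig once against `resolver` (None means the system default)."""
    cmd = ["dig", "+time=3", "+tries=1"]
    if resolver:
        cmd.append(f"@{resolver}")
    cmd.append(domain)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return dig_status(result.stdout)


def compare_resolvers(
    domain: str,
    resolvers: Sequence[Optional[str]] = (None, "1.1.1.1", "8.8.8.8"),
) -> Dict[str, Optional[str]]:
    """Map each resolver label to the status it returned for `domain`."""
    return {r or "system default": query_status(domain, r) for r in resolvers}
```

Calling `compare_resolvers("interest-trace.fly.dev")` while the problem is occurring should show whether only the default resolver fails. A value of `None` means dig printed no status at all, for example because the query timed out.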

Hi @ozziek — thanks for the debugging information. Can you share what ISP you’re with?

Hi @bglw, thanks for the response!

My ISP is AT&T.

In case it is helpful, I first noticed the issue as early as Friday night PST (Sept. 12th), and I am not able to reproduce it today.

Thanks, we’ll keep looking into the issue.

This isn’t the first report of this, and so far the reports are somewhat contained to people on AT&T, so it seems there’s something afoot between their DNS server and ours.

Yup, also reporting in from AT&T. Resolution of a fly.dev domain will randomly fail. It has happened twice in the past 3 days (while I am at my computer, at least), and it just happened again about 10 minutes ago. AT&T (Los Angeles). Log of additional times:

  • Sept 15th 5:24pm PST
  • Sept 15th 10:14pm PST

This happened to my app as well, based in Singapore. I have temporarily switched to accessing it by IP address.

@OnlyC do you happen to have an approximate timestamp of when this happened? And if possible what is your local ISP and the DNS server(s) you’re using?

@bglw rolled out an attempted fix a couple of hours ago, but if this is still happening after that, we’ll have to dig deeper.

Hi @PeterCxy, I was just able to reproduce the issue (11:16am PST).

Attempting to access my .fly.dev domain failed to connect. Turning on my VPN resolved the issue again.

Are you located on the US west coast? I happen to be checking our DNS resolver and am not seeing any errors on our side anywhere on the US west coast :thinking:

Yes, located in San Francisco.

Let me know if there is any data I can collect the next time this happens.

:waving_hand: @ozziek @uncvrd Since these kinds of issues are really hard to debug without a source IP, I set up some tracking on a test app domain, fake-public-dns-debug.fly.dev. Could you try to dig or just access that domain without a VPN, through AT&T’s DNS resolver? It’s expected to fail since nothing is served on that domain, but these queries would tell us which IP(s) they’re resolving from, and then we can set up some monitoring specifically for them.

here ya go! is this what you need?

dig fake-public-dns-debug.fly.dev

; <<>> DiG 9.10.6 <<>> fake-public-dns-debug.fly.dev
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10913
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;fake-public-dns-debug.fly.dev.	IN	A

;; Query time: 8 msec
;; SERVER: 2600:1700:4641:21e0::1#53(2600:1700:4641:21e0::1)
;; WHEN: Tue Sep 16 12:35:44 PDT 2025
;; MSG SIZE  rcvd: 58

Hi @PeterCxy, I just accessed it via the browser, and here is my dig output:

dig fake-public-dns-debug.fly.dev

; <<>> DiG 9.10.6 <<>> fake-public-dns-debug.fly.dev
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16368
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;fake-public-dns-debug.fly.dev.	IN	A

;; AUTHORITY SECTION:
fly.dev.		1751	IN	SOA	ns1.flydns.net. ops.fly.io. 1758051205 86400 7200 604800 300

;; Query time: 11 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Sep 16 12:34:23 PDT 2025
;; MSG SIZE  rcvd: 114




Ok! I think I caught their IP. If you don’t mind, you could try it a couple more times, spaced some time apart, to see whether they have different outgoing IPs, but this is good! I’ll go see if I can set up some monitoring for them.

In this case you’re resolving from 8.8.8.8, Google’s public DNS server. Is this the same DNS you used when you ran into the problem?

awesome, thanks! any preference on how long to wait between checks?

No, but in general I’d assume they have a cache on the order of minutes at least, so maybe 10-20 minutes apart would be ideal.

I have set up some monitoring for all DNS queries from AT&T’s recursive resolver. If the resolution errors happen again, could you please post an approximate timestamp and which fly.dev domain you’re resolving here in this thread? That’ll help us narrow down exactly what happened during the query.

(Or if you are not comfortable posting the domain here – feel free to email peter at fly dot io as well)
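For anyone who wants to collect the timestamps requested above without watching the clock, here is a minimal Python sketch. It resolves a domain through the system resolver (so it should be run without a VPN) and prints one UTC-timestamped line per check. The function names, the 15-minute default interval, and the number of checks are assumptions chosen to match the 10-20 minute suggestion, not anything Fly provides.

```python
import socket
import time
from datetime import datetime, timezone


def resolve_once(domain: str) -> str:
    """Resolve via the system resolver; return 'ok <addrs>' or 'error <reason>'."""
    try:
        infos = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return f"error {exc}"
    addrs = sorted({info[4][0] for info in infos})
    return "ok " + ",".join(addrs)


def log_line(when: datetime, domain: str, result: str) -> str:
    """Format one log entry with a UTC timestamp, second precision."""
    stamp = when.astimezone(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} {domain} {result}"


def watch(domain: str, interval_s: int = 900, checks: int = 4) -> None:
    """Resolve `domain` `checks` times, sleeping `interval_s` seconds between."""
    for i in range(checks):
        now = datetime.now(timezone.utc)
        print(log_line(now, domain, resolve_once(domain)), flush=True)
        if i < checks - 1:
            time.sleep(interval_s)
```

Running `watch("fake-public-dns-debug.fly.dev")` would log four checks over 45 minutes. Note that `getaddrinfo` does not distinguish SERVFAIL from other lookup failures, and local caching may hide short outages, so a `dig` run at the moment of failure is still the more precise record.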

My app is experiencing this as well.