I have two different fly apps (in two different organizations) that are both experiencing connection issues. Both apps are using their fly.dev domains (*.fly.dev). I cannot access the website from San Francisco, and I have received user reports from San Diego about connection issues.
Some debugging notes:
nc to the assigned IPv4 on ports 80/443 succeeds (TCP open).
curl to the bare IP (HTTP/HTTPS) connects, but the connection is then either reset or fails the TLS handshake.
Forcing the correct Host/SNI with --resolve <domain>:443:<ip> works and returns the expected HTML.
I also tried moving the machines between regions (LAX, SJC → ORD) with the same results.
The apps are working when accessed from different regions (e.g. other users are not experiencing any problems, and I can access the app if I VPN from a different location).
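For anyone who wants to repeat the checks above without nc and curl, here is a rough Python sketch. It is not what was actually run for these notes. The function names `tcp_open` and `tls_handshake` are made up for illustration, and the IP and domain have to be filled in by hand. Passing no `server_name` sends no SNI, which is roughly what curl does against a bare IP, while passing the domain mimics the `--resolve` case.

```python
import socket
import ssl


def tcp_open(ip, port, timeout=5.0):
    """Return True if a TCP connection to ip:port can be established."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def tls_handshake(ip, port=443, server_name=None, timeout=5.0):
    """Attempt a TLS handshake with ip:port and return (ok, detail).

    server_name is sent as SNI. None sends no SNI at all.
    """
    context = ssl.create_default_context()
    if server_name is None:
        # Without a hostname there is nothing to verify the certificate
        # against, so verification is disabled for this probe only.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=server_name) as tls:
                return True, tls.version()
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        return False, f"{type(exc).__name__}: {exc}"


# Example usage (placeholders, replace with your own values):
# ip, domain = "<ip>", "<domain>"
# print(tcp_open(ip, 443))
# print(tls_handshake(ip))                      # no SNI, like curl to the bare IP
# print(tls_handshake(ip, server_name=domain))  # correct SNI, like --resolve
```

If the last call succeeds while the bare-IP call fails, that matches the behavior described above: the edge is reachable and only serves the app when it is given the right hostname.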
This isn’t the first report of this, and so far the reports are mostly limited to people on AT&T, so it seems there’s something afoot between their DNS server and ours.
Yup - also reporting in using AT&T. Resolution to a fly.dev domain will randomly not work. Has happened twice in the past 3 days (while I am at my computer at least). Just happened again about 10 mins ago. AT&T (Los Angeles). Log of additional times….
@OnlyC do you happen to have an approximate timestamp of when this happened? And if possible what is your local ISP and the DNS server(s) you’re using?
@bglw rolled out an attempted fix a couple of hours ago, but if it’s still happening after that, we’ll have to dig deeper.
@ozziek @uncvrd Since this kind of issue is really hard to debug without a source IP, I set up some tracking on a test app domain, fake-public-dns-debug.fly.dev. Could you try to dig or just access that domain without a VPN, through AT&T’s DNS resolver? It’s expected to fail since nothing is served on that domain, but these queries would tell us which IP(s) they’re resolving from, and then we can set up some monitoring specifically for them.
Ok! I think I caught their IP. If you don’t mind, you could try it a couple more times, spaced some time apart, to see whether they use different outgoing IPs, but this is good! I’ll go see if I can set up some monitoring for them.
I have set up some monitoring for all DNS queries from AT&T’s recursive resolver. If the resolution errors happen again, could you please post an approximate timestamp and which fly.dev domain you’re resolving here in this thread? That’ll help us narrow down exactly what happened during the query.
(Or if you are not comfortable posting the domain here – feel free to email peter at fly dot io as well)
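If it helps with capturing those timestamps, here is a minimal Python sketch that resolves a domain through the system resolver at a fixed interval and prints a UTC timestamp for each attempt. It assumes the machine is using the ISP's resolver (no VPN, no custom DNS), and the names `resolve_once` and `watch` are hypothetical helpers, not part of any Fly tooling. Local DNS caching may hide some failures, so treat the output as a rough log only.

```python
import socket
import time
from datetime import datetime, timezone


def resolve_once(domain, resolver=socket.getaddrinfo):
    """Resolve domain once and return (utc_timestamp, ok, detail)."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    try:
        infos = resolver(domain, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return stamp, False, f"gaierror: {exc}"
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved address.
    addresses = sorted({info[4][0] for info in infos})
    return stamp, True, ", ".join(addresses)


def watch(domain, interval=60.0, rounds=10):
    """Print one line per lookup so failures can be reported with a time."""
    for _ in range(rounds):
        stamp, ok, detail = resolve_once(domain)
        print(f"{stamp} {domain} {'OK' if ok else 'FAIL'} {detail}")
        time.sleep(interval)


# Example usage (replace the placeholder with your own fly.dev domain):
# watch("<your-app>.fly.dev", interval=60.0, rounds=60)
```

Any `FAIL` line gives the approximate timestamp and the domain in one place, which is the information asked for above.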