Super lame I have to expose personal info here, since Fly won’t respond to technical support-related issues over email despite having spent $$$$, but here we are…
I’m having the weirdest problem accessing Fly-hosted sites from a small number of computers. (I can reproduce the issue on one machine, and have had the same report from a handful of customers.) This issue has gone on for months, and after a ton of debugging, I’ve narrowed the root cause down to Fly for some reason.
You and I will see a 404 page on FolioHD saying there’s no site found. This is the expected behavior, but what the users in question get is the browser ERR_NAME_NOT_RESOLVED screen.
I thought it was specific to FolioHD, but then I tested from the problematic machine on a Posthaven URL (which we also run on Fly): https://posthaven-prod.fly.dev/
Again, you and I will get a 404, but the problematic machine doesn’t even load that.
I have already tried every troubleshooting step imaginable: checked the hosts file, disabled the firewall, tried incognito, enabled a VPN, tried multiple internet connections. All the same result. I’m even on the same network as other machines that can access the sites; it’s just this specific computer (macOS, latest).
Is there some sort of blacklist that Fly maintains, where this particular machine is getting blocked for some reason?
definitely puzzling! happy to help troubleshoot, and thank you for frontloading all that context
As an aside, if you’re looking for email support, you might take a look at our paid plans – could be a good option if your anticipated spend aligns with the correspondingly expanded usage quota.
That said, we definitely don’t want anyone to have a Fly-related problem for months! So if you do feel uncomfortable sharing certain info in the forum, you can redact your problem description. If it’s essential info that you absolutely cannot share publicly, we’d be happy to receive that over email.
This does have an important drawback: other users will be less able to help out. Of course, for issues that turn out to be on our end, that’s somewhat less of a disadvantage.
Anyway, on to the actual problem! Being able to reproduce this is a huge help, thank you! You mentioned that you have unaffected machines on the same network as an affected one, which is even better. A few things I’m curious about that might help us narrow it down:
Can you curl those sites with the flyio-debug: doit header? This will, among other things, give us an idea where traffic is coming in from that network.
You’ve probably already done this, but what does dig say about those subdomains from the affected machines? Is it returning the same answers that you get on unaffected ones? How about if you use a large public resolver like 8.8.8.8?
I’m guessing the answer is “no, so far” but are you able to resolve any fly.dev domains from the problem clients? Are they able to hit debug.fly.dev?
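For anyone following along, the checks above could be run roughly like this. This is only a sketch: `fly_dns_check` and `same_answers` are made-up helper names, it assumes `curl` and `dig` are installed, and debug.fly.dev is simply the hostname mentioned above. What the flyio-debug response actually contains is not shown here.

```shell
# Sketch only: fly_dns_check and same_answers are hypothetical helper names.

# Succeeds when two newline-separated answer lists hold the same records,
# regardless of order.
same_answers() {
  [ "$(printf '%s\n' "$1" | sort)" = "$(printf '%s\n' "$2" | sort)" ]
}

fly_dns_check() {
  host="$1"

  # 1. Send the debug header and print the response headers and body.
  curl -sS -i -H "flyio-debug: doit" "https://$host/"

  # 2. Ask the locally configured resolver, then a large public one.
  local_answer="$(dig +short "$host")"
  public_answer="$(dig +short "$host" @8.8.8.8)"
  echo "local:  $local_answer"
  echo "public: $public_answer"

  # 3. Flag any difference between the two sets of answers.
  if same_answers "$local_answer" "$public_answer"; then
    echo "resolvers agree"
  else
    echo "resolvers disagree"
  fi
}
```

Run it as `fly_dns_check debug.fly.dev` on both an affected and an unaffected machine and compare the output.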
Agree with all of the above from @eli. In addition to those suggestions, personally I’d also try temporarily turning off any/all browser plugins you may have on that particular machine (adblock, uMatrix, etc.). Those can block requests.
Also clear any DNS cache, since that error message points to a DNS issue: visit chrome://net-internals/#dns, clear the host cache, then restart the browser.
Re: DNS cache, I tried that, but it also doesn’t seem to be browser-specific. (Same result in Chrome, Safari, Firefox, and also in incognito with no extensions enabled.)
$ curl -v --dns-servers 8.8.8.8 https://debug.fly.dev
* Could not resolve host: debug.fly.dev
* Closing connection 0
curl: (6) Could not resolve host: debug.fly.dev
$ cat /etc/resolv.conf
#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
# scutil --dns
#
# SEE ALSO
# dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
nameserver 192.168.4.1
Working machine (different network now)
% cat /etc/resolv.conf
#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
# scutil --dns
#
# SEE ALSO
# dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
nameserver fe80::f1:4fff:feab:4fe4%en0
nameserver 192.168.1.1
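As the notice in that output says, /etc/resolv.conf is not what most macOS processes consult, so the nameserver lines alone do not settle much. A rough sketch of looking at the configuration that is actually in effect follows. It assumes per-domain overrides live as files in /etc/resolver (one file per domain, which is how a local development tool can capture a whole TLD such as .dev), and `list_resolver_overrides` is a made-up helper name.

```shell
# Sketch only: list_resolver_overrides is a hypothetical helper name.

# Print each per-domain resolver override found in a directory.
# Defaults to /etc/resolver, assumed here to be where macOS keeps them.
list_resolver_overrides() {
  dir="${1:-/etc/resolver}"
  if [ ! -d "$dir" ]; then
    echo "no resolver directory at $dir"
    return 0
  fi
  for f in "$dir"/*; do
    [ -f "$f" ] || continue          # skip the unexpanded glob and non-files
    echo "domain: $(basename "$f")"  # the file name is the domain it captures
    sed 's/^/  /' "$f"               # indent the file's contents
  done
}

# Show the full resolver configuration where scutil exists (macOS only).
if command -v scutil >/dev/null 2>&1; then
  scutil --dns
fi

list_resolver_overrides
```

A file named `dev` in that directory would mean lookups for anything ending in .dev, including fly.dev hostnames, are sent to whatever nameserver that file lists instead of the network's resolver. Note that `dig` queries nameservers directly rather than going through the system resolver, so it can succeed on a machine where browsers still fail.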
Geez wow you’re right, it was Pow. Incredible fact-finding there.
I used a 2013 MacBook Pro, then handed it down to my wife. Later on, I bought her a new computer and I transferred her profile from the 2013 computer. Even though I had Pow in my own profile, it was probably installed system-wide and transferred over with it.
Thanks for bearing with me here. Please send a sizable bill to the ~maintainers~ of Pow.