fly.dev intermittent DNS issues?

Are there issues with the fly.dev DNS servers?

For example, I got this just now on 3 successive calls:

$ host fly.dev
Host fly.dev not found: 3(NXDOMAIN)

$ host fly.dev
Host fly.dev not found: 3(NXDOMAIN)

$ host fly.dev
fly.dev has address 77.83.140.34

We noticed this because our application has had intermittent outages recently. The main domain is a CNAME to a fly.dev hostname:

$ host fleetweb.standardfleet.com
Host fleetweb.standardfleet.com not found: 3(NXDOMAIN)

$ dig fleetweb.standardfleet.com
fleetweb.standardfleet.com. 60	IN	CNAME	nikola-fleetweb.fly.dev.

$ host nikola-fleetweb.fly.dev
Host nikola-fleetweb.fly.dev not found: 3(NXDOMAIN)

It does work sometimes, though, and returns the correct answer:

$ host fleetweb.standardfleet.com
fleetweb.standardfleet.com is an alias for nikola-fleetweb.fly.dev.
nikola-fleetweb.fly.dev has address 109.105.222.202
nikola-fleetweb.fly.dev has IPv6 address 2a09:8280:1::1:1eaf

What’s going on?

We’re experimenting with using our own DNS authoritative server for the fly.dev domain (and all its subdomains).

Do you know which DNS servers you’re using? Are they the ones provided by your ISP, public ones or private ones?

A third of the DNS queries are presently handled by this new server.

It somehow looks like the DNS server you’re using is not recursing / flattening CNAMEs.

What’s odd is that <app-name>.fly.dev names are not CNAMEs, and fly.dev itself is even hard-coded!

We’re looking into this. I can’t reproduce right now but I’ll keep at it.

It looks like ns1.flydns.net and ns2.flydns.net aren’t configured properly.

They seem to be missing an AUTHORITY section?

$ dig @ns1.flydns.net +norecurse +short fly.dev SOA

$ dig @ns1.dnsimple.com +norecurse +short fly.dev SOA
ns1.dnsimple.com. admin.dnsimple.com. 1552045463 86400 7200 604800 300
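To find out which authoritative servers misbehave, you can send the same `+norecurse` SOA query to each nameserver in turn. This is a minimal sketch that assumes `dig`. `compare_soa` is a hypothetical name, and you pass the nameserver list by hand. `dig +short NS fly.dev` lists the current delegation.

```shell
#!/bin/sh
# Hypothetical helper: ask each listed nameserver directly (no recursion)
# for ZONE's SOA. An empty answer from an authoritative server is a red flag.
# Usage: compare_soa ZONE NS1 [NS2 ...]
compare_soa() {
  local zone="$1" ns soa
  shift
  for ns in "$@"; do
    soa="$(dig +norecurse +short "@$ns" "$zone" SOA)"
    echo "$ns: ${soa:-<empty>}"
  done
}
```

`compare_soa fly.dev ns1.flydns.net ns2.flydns.net ns1.dnsimple.com` prints one line per server. `<empty>` marks a server that returned no SOA, as ns1.flydns.net did above.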

I note that Google’s DNS works intermittently:

$ host nikola-fleetweb.fly.dev 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases: 

Host nikola-fleetweb.fly.dev not found: 3(NXDOMAIN)

$ host nikola-fleetweb.fly.dev 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases: 

nikola-fleetweb.fly.dev has address 109.105.222.202
nikola-fleetweb.fly.dev has IPv6 address 2a09:8280:1::1:1eaf

And so does Cloudflare…
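Because the failures are intermittent, a single lookup proves little. Repeating the query and counting response codes gives a clearer picture. This is a sketch that assumes `dig` is available. `tally_rcodes` is a hypothetical helper that reads the `status:` field from dig's `;; ->>HEADER<<-` line.

```shell
#!/bin/sh
# Hypothetical helper: repeat one DNS query COUNT times and tally the
# response codes (NOERROR, NXDOMAIN, SERVFAIL, ...) that come back.
# Usage: tally_rcodes NAME [TYPE] [COUNT] [SERVER]
tally_rcodes() {
  local qname="$1" qtype="${2:-A}" count="${3:-20}" server="${4:-}" i=0
  while [ "$i" -lt "$count" ]; do
    # "+noall +comments" keeps the header line; sed prints only the rcode.
    if [ -n "$server" ]; then
      dig +noall +comments "$qname" "$qtype" "@$server"
    else
      dig +noall +comments "$qname" "$qtype"
    fi | sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
    i=$((i + 1))
  done | sort | uniq -c
}
```

For example, `tally_rcodes nikola-fleetweb.fly.dev A 20 8.8.8.8` queries Google Public DNS 20 times. A mix of NOERROR and NXDOMAIN counts matches the behavior described above. A recursive resolver may answer from its cache, so the mix also depends on what it has cached.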

I’m also having problems accessing an app I just deployed. I’m using Cloudflare DNS (1.1.1.1).

@mrjbq7, the nameservers should now correctly return a SOA record and NS records. Does this fix your issues?

@beeb what problems are you seeing?

Is there a reason you aren’t replying with SOA records for subdomains?

$ host -v nikola-fleetweb.fly.dev ns1.dnsimple.com
...
;; AUTHORITY SECTION:
fly.dev.		300	IN	SOA	ns1.dnsimple.com. admin.dnsimple.com. 1552045717 86400 7200 604800 300


$ host -v nikola-fleetweb.fly.dev ns1.flydns.net
...
Trying "nikola-fleetweb.fly.dev"
Host nikola-fleetweb.fly.dev not found: 3(NXDOMAIN)
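Resolvers use the SOA in the AUTHORITY section to cache a negative answer. RFC 2308 sets the negative-cache TTL to the lesser of the SOA record's own TTL and its MINIMUM field (the last number). It also says negative responses without an SOA should not be cached. The sketch below computes that TTL from dig output. `negative_ttl` is a hypothetical name, and it assumes dig's usual presentation format: name, TTL, class, type, then the seven SOA fields.

```shell
#!/bin/sh
# Hypothetical helper: how long a negative answer may be cached, per
# RFC 2308: min(TTL of the SOA record, SOA MINIMUM field).
# Reads dig "+noall +authority" output on stdin.
# Usage: dig +noall +authority NAME TYPE @SERVER | negative_ttl
negative_ttl() {
  awk '
    $4 == "SOA" {
      ttl = $2 + 0   # TTL of the SOA record itself
      min = $11 + 0  # MINIMUM, the last SOA field
      neg = (ttl < min) ? ttl : min
      print neg
      found = 1
      exit
    }
    END { if (!found) { print "no SOA in authority section"; exit 1 } }'
}
```

For the SOA shown in the thread (TTL 300, MINIMUM 300), piping `dig +noall +authority nikola-fleetweb.fly.dev MX @ns1.dnsimple.com` into it would print 300.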

Here’s the output I get:

❯ host -v nikola-fleetweb.fly.dev ns1.dnsimple.com
Trying "nikola-fleetweb.fly.dev"
Using domain server:
Name: ns1.dnsimple.com
Address: 162.159.24.4#53
Aliases:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19729
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;nikola-fleetweb.fly.dev.       IN      A

;; ANSWER SECTION:
nikola-fleetweb.fly.dev. 3600   IN      A       109.105.222.202

Received 57 bytes from 162.159.24.4#53 in 23 ms
Trying "nikola-fleetweb.fly.dev"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27011
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;nikola-fleetweb.fly.dev.       IN      AAAA

;; ANSWER SECTION:
nikola-fleetweb.fly.dev. 3600   IN      AAAA    2a09:8280:1::1:1eaf

Received 69 bytes from 162.159.24.4#53 in 10 ms
Trying "nikola-fleetweb.fly.dev"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39520
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;nikola-fleetweb.fly.dev.       IN      MX

;; AUTHORITY SECTION:
fly.dev.                300     IN      SOA     ns1.dnsimple.com. admin.dnsimple.com. 1552045723 86400 7200 604800 300

Received 99 bytes from 162.159.24.4#53 in 8 ms

DNSimple only returns the SOA on the MX response. I don’t think it’s necessary to return that record on any response other than for a SOA request.

As an aside: please don’t crop the output if you want the best troubleshooting you can get.

We’re further adjusting the server to return NOERROR instead of NXDOMAIN when we know about a subdomain, but don’t have anything to return for the query.
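The difference matters for caching. NXDOMAIN says the name has no records of any type. NOERROR with an empty answer, often called NODATA, only says there is nothing of the queried type. Under RFC 2308 a cached NXDOMAIN applies to the whole name, not just one record type. An NXDOMAIN returned for an MX query could therefore also make later A lookups for that name fail until the negative TTL expires. The sketch below tells the cases apart from dig's header output. `classify_response` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: label a dig response as ANSWER, NODATA or an error rcode.
# NODATA = NOERROR with an empty answer section (name exists, no such type).
# NXDOMAIN = the name does not exist at all.
# Reads dig output containing the ";; ->>HEADER<<-" and ";; flags:" lines.
classify_response() {
  awk '
    /status:/ { s = $0; sub(/.*status: /, "", s); sub(/,.*/, "", s); status = s }
    /ANSWER:/ { a = $0; sub(/.*ANSWER: /, "", a); sub(/,.*/, "", a); answers = a + 0 }
    END {
      if (status == "") label = "NO RESPONSE"
      else if (status != "NOERROR") label = status
      else label = (answers > 0) ? "ANSWER" : "NODATA"
      print label
    }'
}
```

For example, `dig +noall +comments nikola-fleetweb.fly.dev MX @ns1.dnsimple.com | classify_response` would print NODATA for a response like the MX one shown earlier.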

Given that this issue seems to be affecting multiple people, could someone update the status page to reflect that there are DNS issues?

It’s completely green right now, and we’re getting alerted across the board from our synthetic monitoring.

I apologize; I thought there was still an issue, but it appears that Google DNS, Cloudflare, and several others work now. I assume the old records expired and the resolvers were able to re-query the flydns.net nameservers.

@mikeglazer we aren’t having any issues with DNS right now. What are your alerts reporting?

Odd. This is what we’re seeing from our Grafana synthetic monitoring in the SF region:

level=error target=https://api.OURHOST.com/health probe=SanFrancisco region=AMER instance=https://api.OURHOST.com/health job="API Health" check_name=http source=synthetic-monitoring-agent msg="Error resolving address" err="lookup api.OURHOST.com on 8.8.8.8:53: no such host"

And we’ve had spurious user reports throughout the day.

Assuming api.OURHOST.com is a CNAME for OURHOST.fly.dev, then that’s what we are seeing too. I ran it a bunch of times just now querying Google DNS and there are a handful of failures currently:

# failure in AAAA records
$ host nikola-fleetweb.fly.dev 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases: 

nikola-fleetweb.fly.dev has address 109.105.222.202
Host nikola-fleetweb.fly.dev not found: 3(NXDOMAIN)
Host nikola-fleetweb.fly.dev not found: 3(NXDOMAIN)

# failure in A/AAAA records
$ host nikola-fleetweb.fly.dev 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases: 

Host nikola-fleetweb.fly.dev not found: 3(NXDOMAIN)

# success
$ host nikola-fleetweb.fly.dev 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases: 

nikola-fleetweb.fly.dev has address 109.105.222.202
nikola-fleetweb.fly.dev has IPv6 address 2a09:8280:1::1:1eaf

Can you get this to fail for you? Query: nikola-fleetweb.fly.dev - Google Public DNS

That’s supposedly the debugging interface for Google’s resolvers.

Do you know approximately what time this started today? Also, can you get dig a nikola-fleetweb.fly.dev @8.8.8.8 +trace to fail? The host command is surprisingly opaque about what it’s doing.

One workaround is to not use a CNAME at all. Run fly ips list, then set an A record for the IPv4 address and an AAAA record for the IPv6 address, and it’ll work fine. It’s technically a little faster, too, since only one DNS lookup is required.
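If you go that route, the records just mirror the addresses the fly.dev name resolves to. The sketch below prints them as zone-file lines. It assumes `dig`, `flatten_cname` is a hypothetical helper, and the 60-second TTL is an arbitrary choice. `fly ips list` remains the authoritative source for the addresses, and records written this way won't follow the app if its IPs change.

```shell
#!/bin/sh
# Hypothetical helper: print A/AAAA zone-file lines for HOST using the
# addresses TARGET currently resolves to, as a replacement for
# "HOST CNAME TARGET". TTL defaults to 60 seconds (an arbitrary choice).
# Usage: flatten_cname HOST TARGET [TTL]
flatten_cname() {
  local host="$1" target="$2" ttl="${3:-60}" rtype addr
  for rtype in A AAAA; do
    # +short prints one value per line; names in a CNAME chain end with a dot.
    dig +short "$target" "$rtype" | while read -r addr; do
      case "$addr" in
        *.) ;;  # a hostname, not an address: skip it
        *) printf '%s. %s IN %s %s\n' "$host" "$ttl" "$rtype" "$addr" ;;
      esac
    done
  done
}
```

For example, `flatten_cname fleetweb.standardfleet.com nikola-fleetweb.fly.dev` prints one A line and one AAAA line per address it finds. Compare them with `fly ips list` before pasting them into your DNS provider.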

Yes, it failed (no answer section) within 10 reloads:

{
  "Status": 3 /* NXDOMAIN */,
  "TC": false,
  "RD": true,
  "RA": true,
  "AD": false,
  "CD": false,
  "Question": [
    {
      "name": "nikola-fleetweb.fly.dev.",
      "type": 1 /* A */
    }
  ],
  "Comment": "Response from 137.66.40.8."
}
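The `Comment` field is the useful part here, because it names the authoritative server that gave this answer. Querying Google's JSON API (`https://dns.google/resolve`) repeatedly and grouping results by status and comment shows whether failures cluster on one nameserver. This is a sketch that assumes `curl`. `summarize_google_json` and `google_dns_probe` are hypothetical. The JSON is scraped with `sed` rather than a real parser to avoid depending on `jq`.

```shell
#!/bin/sh
# Hypothetical helper: reduce one JSON answer to "STATUS<TAB>COMMENT".
# Status is the numeric rcode (0 = NOERROR, 3 = NXDOMAIN).
summarize_google_json() {
  local json status comment
  json="$(tr -d '\n')"  # join lines so pretty-printed and compact JSON both work
  status="$(printf '%s\n' "$json" | sed -n 's/.*"Status": *\([0-9]*\).*/\1/p')"
  comment="$(printf '%s\n' "$json" | sed -n 's/.*"Comment": *"\([^"]*\)".*/\1/p')"
  printf '%s\t%s\n' "${status:-?}" "${comment:-no comment}"
}

# Hypothetical helper: query the JSON API COUNT times and count each
# distinct (status, responding server) pair.
# Usage: google_dns_probe NAME [COUNT]
google_dns_probe() {
  local name="$1" count="${2:-10}" i=0
  while [ "$i" -lt "$count" ]; do
    curl -s "https://dns.google/resolve?name=$name&type=A" | summarize_google_json
    i=$((i + 1))
  done | sort | uniq -c
}
```

A line such as `3  Response from 137.66.40.8.` in the output of `google_dns_probe nikola-fleetweb.fly.dev 20` would mean that server returned NXDOMAIN (status 3).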

I noticed the failures beginning last night at 6pm Pacific Time.

They have a handy “Flush Cache” tool that might help: Flush Cache | Public DNS | Google Developers

None of the nameservers on our network are returning nxdomain for that hostname. I’m kind of stumped.

EDIT: my (first) issue was fixed by assigning an IP, which somehow didn’t happen automatically.