Thank you, this was an important insight! I was able to reproduce this behaviour from inside of a fly instance, so I am seeing the same thing as you.
I suspect that your DNS servers are also in fly instances (or with networking which is similar enough to a fly instance), which is why your DNS servers are having the same issue as a fly instance.
I’m trying to diagnose the problem, but I don’t have much insight into your infrastructure, so I’m going to share what I’ve found and hope that we can help each other diagnose this issue.
From what I can tell, the SERVFAIL is coming from your internal DNS server at fdaa::3. A tcpdump on port 53 (inside of a fly instance) shows:
15:50:31.548240 IP6 fdaa:0:2aad:a7b:a992:6c7f:ecd3:2.57148 > fdaa::3.53: 43605+ A? ns2.<redacted>.ch. (37)
15:50:31.548260 IP6 fdaa:0:2aad:a7b:a992:6c7f:ecd3:2.57148 > fdaa::3.53: 42580+ AAAA? ns2.<redacted>.ch. (37)
15:50:31.548380 IP6 fdaa::3.53 > fdaa:0:2aad:a7b:a992:6c7f:ecd3:2.57148: 42580 ServFail 0/0/0 (37)
15:50:31.548380 IP6 fdaa::3.53 > fdaa:0:2aad:a7b:a992:6c7f:ecd3:2.57148: 43605 ServFail 0/0/0 (37)
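For anyone who wants to check what those reply lines mean, here is a minimal sketch of how the fixed 12-byte DNS header decodes. It is only an illustration: `parse_header` and `sample` are names I made up, and the sample bytes are hand-built to resemble the reply to query 43605 (the RD and RA flag bits in it are my assumption, since the capture above does not show them).

```python
import struct

# Common RCODE values from RFC 1035.
RCODES = {0: "NoError", 1: "FormErr", 2: "ServFail",
          3: "NXDomain", 4: "NotImp", 5: "Refused"}


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    # Six big-endian 16-bit fields: id, flags, and four section counts.
    ident, flags, _qd, an, ns, ar = struct.unpack("!6H", message[:12])
    code = flags & 0x000F
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),  # QR bit
        "truncated": bool(flags & 0x0200),    # TC bit
        "rcode": RCODES.get(code, str(code)),
        "counts": (an, ns, ar),               # answer/authority/additional
    }


# Hypothetical header: QR=1, RD=1, RA=1, RCODE=2, one question, no records.
sample = struct.pack("!6H", 43605, 0x8182, 1, 0, 0, 0)
print(parse_header(sample))
```

The `rcode` of 2 and the three zero counts correspond to the `ServFail 0/0/0` that tcpdump prints, i.e. the resolver answered, but with a failure and no records.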
I wonder why your DNS server is not able to serve records for my domain, but it can for others (e.g. google.com). I suspect that there is some connectivity issue between your nameservers and mine.
As I mentioned before, my DNS server is hosted on fly. According to this, UDP won’t work over IPv6 on fly, so that could be a partial explanation.
I’ve also found that I don’t have IPv4 UDP connectivity between two fly instances either:
Pinging my DNS IP from my local machine:
> ping -c 2 213.188.208.5
PING 213.188.208.5 (213.188.208.5): 56 data bytes
64 bytes from 213.188.208.5: icmp_seq=0 ttl=52 time=24.915 ms
64 bytes from 213.188.208.5: icmp_seq=1 ttl=52 time=21.725 ms
--- 213.188.208.5 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 21.725/23.320/24.915/1.595 ms
Pinging my DNS IP from inside a fly instance:
# in a fly instance
> ping -c 2 213.188.208.5
PING 213.188.208.5 (213.188.208.5) 56(84) bytes of data.
--- 213.188.208.5 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1017ms
Similarly, an IPv4 DNS query from my machine works, but doesn’t work on fly:
From my machine:
> dig @213.188.208.5 <redacted>.ch NS
; <<>> DiG 9.10.6 <<>> @213.188.208.5 <redacted>.ch NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42512
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 5
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;<redacted>.ch. IN NS
;; ANSWER SECTION:
<redacted>.ch. 86400 IN NS ns1.<redacted>.ch.
<redacted>.ch. 86400 IN NS ns2.<redacted>.ch.
;; ADDITIONAL SECTION:
ns1.<redacted>.ch. 86400 IN A 213.188.208.5
ns1.<redacted>.ch. 86400 IN AAAA 2a09:8280:1:9f8:b4a:f460:bfdb:21a4
ns2.<redacted>.ch. 86400 IN A 213.188.209.57
ns2.<redacted>.ch. 86400 IN AAAA 2a09:8280:1:a1f3:a957:bd27:d3a2:d9b5
;; Query time: 100 msec
;; SERVER: 213.188.208.5#53(213.188.208.5)
;; WHEN: Sat Nov 05 17:03:47 CET 2022
;; MSG SIZE rcvd: 304
From a fly instance:
dig @213.188.208.5 <redacted>.ch NS
; <<>> DiG 9.16.1-Ubuntu <<>> @213.188.208.5 <redacted>.ch NS
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
However, the query from the fly instance does work over TCP:
dig +tcp @213.188.208.5 <redacted>.ch NS
; <<>> DiG 9.16.1-Ubuntu <<>> +tcp @213.188.208.5 <redacted>.ch NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23850
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 5
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 8af4501be56f92b3 (echoed)
;; QUESTION SECTION:
;<redacted>.ch. IN NS
;; ANSWER SECTION:
<redacted>.ch. 86400 IN NS ns1.<redacted>.ch.
<redacted>.ch. 86400 IN NS ns2.<redacted>.ch.
;; ADDITIONAL SECTION:
ns1.<redacted>.ch. 86400 IN A 213.188.208.5
ns1.<redacted>.ch. 86400 IN AAAA 2a09:8280:1:9f8:b4a:f460:bfdb:21a4
ns2.<redacted>.ch. 86400 IN A 213.188.209.57
ns2.<redacted>.ch. 86400 IN AAAA 2a09:8280:1:a1f3:a957:bd27:d3a2:d9b5
;; Query time: 171 msec
;; SERVER: 213.188.208.5#53(213.188.208.5)
;; WHEN: Sat Nov 05 16:01:19 UTC 2022
;; MSG SIZE rcvd: 316
Based on that analysis, I have the following questions:
- Why is there no IPv4 UDP connectivity between fly instances?
- Do you have an idea how long it will be until you have IPv6 UDP?
- Is it possible for your DNS server to fall back to DNS over TCP on IPv4? Maybe it already does this, but there’s some other misconfiguration on my side causing problems…
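To make the last question concrete, here is a rough client-side sketch of the fallback I have in mind. I don’t know how your resolver is implemented, so this is only an illustration using the Python standard library; all function names are mine. Retrying over TCP on a truncated reply is standard resolver behaviour; retrying on a UDP timeout is an extra assumption of this sketch.

```python
import socket
import struct


def build_query(name: str, qtype: int = 2, ident: int = 0x1234) -> bytes:
    """Build a minimal DNS query (qtype 2 = NS, class IN, no recursion)."""
    header = struct.pack("!6H", ident, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!2H", qtype, 1)


def query_udp(server: str, payload: bytes, timeout: float = 3.0) -> bytes:
    """Send the query as a single IPv4 UDP datagram and wait for one reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (server, 53))
        data, _addr = sock.recvfrom(4096)
        return data


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_tcp(server: str, payload: bytes, timeout: float = 3.0) -> bytes:
    """Send the query over TCP, where messages carry a 2-byte length prefix."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(payload)) + payload)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)


def resolve_with_fallback(server, name, udp=query_udp, tcp=query_tcp):
    """Try UDP first; use TCP if UDP fails or the reply has the TC bit set."""
    payload = build_query(name)
    try:
        reply = udp(server, payload)
        if len(reply) >= 12:
            (flags,) = struct.unpack("!H", reply[2:4])
            if not flags & 0x0200:  # TC bit clear: the UDP reply is complete
                return "udp", reply
    except OSError:  # includes socket timeouts
        pass
    return "tcp", tcp(server, payload)
```

Calling `resolve_with_fallback("213.188.208.5", "<redacted>.ch")` from a fly instance should then return `"tcp"` as the transport if the UDP query times out the way the dig output above suggests.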