I’m testing a DNS UDP server in several regions and so far it’s been pretty smooth, but I’m getting frequent connection timeouts to the London-based instance, both from Pingdom and from a personal server in Linode’s London data center.
The London instance appears to be healthy otherwise as it’s calling home to the controlling web server on schedule.
Hey! I just took a closer look at LHR and UDP there seems to be working fine (I did manage to confuse myself while debugging, and while I was confused I flushed a route cache, so if your app is suddenly working again, that’d be good to know).
Can you tell me more about your app? What’s it named? I can take a closer look at it for you.
Still seeing the same issue after your route cache flush.
The app name is slickdns. It’s an authoritative DNS server. You can test it by running dig @213.188.193.79 wombatsoftware.com.
The same app is also running in LAX and IAD and Pingdom (checking every minute) has always been able to get a response from the instances in those regions.
Looking at the LHR logs, it looks like the DNS request is always getting through to the instance but the response is frequently getting dropped.
For added context, here are the results I get when I test my Fly nameserver from my London Linode VM. Ping:
$ ping 213.188.193.79
PING 213.188.193.79 (213.188.193.79) 56(84) bytes of data.
64 bytes from 213.188.193.79: icmp_seq=1 ttl=57 time=1.47 ms
64 bytes from 213.188.193.79: icmp_seq=2 ttl=57 time=1.46 ms
64 bytes from 213.188.193.79: icmp_seq=3 ttl=57 time=1.54 ms
^C
--- 213.188.193.79 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 5ms
rtt min/avg/max/mdev = 1.459/1.487/1.537/0.047 ms
DNS query:
$ dig @213.188.193.79 wombatsoftware.com
; <<>> DiG 9.11.5-P1-1ubuntu2-Ubuntu <<>> @213.188.193.79 wombatsoftware.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2496
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;wombatsoftware.com. IN A
;; ANSWER SECTION:
wombatsoftware.com. 3600 IN A 151.101.65.195
wombatsoftware.com. 3600 IN A 151.101.1.195
;; Query time: 3 msec
;; SERVER: 213.188.193.79#53(213.188.193.79)
;; WHEN: Mon Mar 01 08:38:00 UTC 2021
;; MSG SIZE rcvd: 104
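For anyone reading the dig output above: the `flags: qr aa rd` line and the recursion warning both come from the fixed 12-byte DNS header (RFC 1035, section 4.1.1). Here is a minimal Python sketch of how those bits decode. The function name `decode_dns_header` and the sample packet are illustrative only; the sample is rebuilt from the values dig printed, not captured from the wire.

```python
import struct


def decode_dns_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(packet) < 12:
        raise ValueError("DNS header needs at least 12 bytes")
    # Six big-endian 16-bit fields: ID, flags, and the four section counts.
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", packet[:12])
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),      # this is a response
        "opcode": (flags >> 11) & 0xF,   # 0 = QUERY
        "aa": bool(flags & 0x0400),      # authoritative answer
        "tc": bool(flags & 0x0200),      # truncated
        "rd": bool(flags & 0x0100),      # recursion desired (copied from the query)
        "ra": bool(flags & 0x0080),      # recursion available
        "rcode": flags & 0x000F,         # 0 = NOERROR
        "counts": (qd, an, ns, ar),
    }


# Hypothetical header rebuilt from the dig output: id 2496, flags qr aa rd,
# QUERY 1, ANSWER 2, AUTHORITY 0, ADDITIONAL 0.
sample = struct.pack("!6H", 2496, 0x8500, 1, 2, 0, 0)
header = decode_dns_header(sample)

# dig warns "recursion requested but not available" when RD is set and RA is clear,
# which is normal for an authoritative-only server.
recursion_warning = header["rd"] and not header["ra"]
```

So a response with `aa` set and `ra` clear is what you would expect from an authoritative server like this one, and the warning is not related to the dropped replies.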
The very low ping and response times (when I do get a response from my server) suggest that the Linode, Fly, and Pingdom servers are all in the same data center, or very close to each other. Could that proximity affect how Fly routes UDP traffic?
FWIW I am seeing the same for YYZ running Pihole. I see my client request pop up in the Pihole dashboard but more often than not (subjectively) do not get a response back (I just ran a dig 10 times in a row and it worked half the time).
Just to make sure, I’ve tried different upstream DNS servers as well as running Unbound on it for recursion, but no luck. I’m happy to help debug during US Eastern time!
EDIT: I am accessing it via public IP right now if that helps, not via Wireguard yet.
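If it helps to put a number on "worked half the time", here is a rough Python sketch that sends a batch of UDP DNS queries and counts how many get a reply, roughly what running dig in a loop does. The helper names `build_query` and `probe`, the default of 10 attempts, and the 2-second timeout are my own assumptions, not anything Fly or dig provides. It is a bare-bones probe under those assumptions: no retries, no TCP fallback, no EDNS.

```python
import random
import socket
import struct
from typing import Tuple


def build_query(name: str, qtype: int = 1) -> Tuple[int, bytes]:
    """Build a minimal DNS query (default type A, class IN); return (id, packet)."""
    ident = random.randint(0, 0xFFFF)
    # Header: random ID, RD flag set, one question, no other records.
    header = struct.pack("!6H", ident, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return ident, header + qname + struct.pack("!2H", qtype, 1)


def probe(server: str, name: str, attempts: int = 10,
          timeout: float = 2.0, port: int = 53) -> int:
    """Send `attempts` UDP queries and return how many got a matching reply."""
    answered = 0
    for _ in range(attempts):
        ident, packet = build_query(name)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(packet, (server, port))
                reply, _addr = sock.recvfrom(4096)
            except OSError:
                # Covers timeouts (no reply arrived) and network errors.
                continue
        # Only count replies whose ID matches the query we just sent.
        if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == ident:
            answered += 1
    return answered
```

Calling `probe("213.188.193.79", "wombatsoftware.com")` would mirror the dig test from earlier in the thread. Running it from a few different vantage points should show whether the drops depend on where the query originates.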
Poking at it. It’s not exactly the same case we saw with the LHR host, but it might be related.
(For the record: the case I’m talking about is super dumb; we just had a stray iptables rule that took forever for me to notice, because it only triggered from LHR to LHR.)