London (lhr) region network issues?

jbarham · February 27, 2021, 8:55am

I’m testing a DNS UDP server in several regions and so far it’s pretty smooth, but I’m getting frequent connection timeouts to the London-based instance, both from Pingdom and from a personal server in Linode’s London data center.

The London instance appears to be healthy otherwise as it’s calling home to the controlling web server on schedule.

Any ideas?

P.S. Eagerly awaiting the opening of tcp/53!

jerome · February 27, 2021, 11:51am

Taking a look.

It seems like one of our London servers is having some issues.

jerome · February 27, 2021, 2:57pm

@jbarham The situation should’ve been resolved about an hour ago. Is it looking better from your end?

This might be a separate issue.

jbarham · February 27, 2021, 9:51pm

@jerome Unfortunately I’m still seeing connectivity issues to my London instance, even after a restart.

Outbound HTTP connections from my London instance work fine. It just seems that incoming DNS UDP requests/responses are frequently being dropped.

FWIW I also have instances running in iad and lax and Pingdom has not reported any connection problems with those instances.

thomas · February 28, 2021, 12:06am

Hey! I just took a closer look at LHR and UDP there seems to be working fine (I did manage to confuse myself while debugging, and while I was confused I flushed a route cache, so if your app is suddenly working again, that’d be good to know).

Can you tell me more about your app? What’s it named? I can take a closer look at it for you.

(You can hit us up privately if you’d like).

jbarham · February 28, 2021, 2:07am

Still seeing the same issue after your route cache flush.

The app name is slickdns. It’s an authoritative DNS server. You can test it by running dig @213.188.193.79 wombatsoftware.com.

The same app is also running in LAX and IAD and Pingdom (checking every minute) has always been able to get a response from the instances in those regions.

Looking at the LHR logs, it looks like the DNS request is always getting through to the instance but the response is frequently getting dropped.
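To put a number on how often responses go missing, a probe along these lines could be run from the client side. This is only a minimal sketch, not the tooling anyone in this thread actually used: the function names `build_query` and `probe`, the 2 second timeout, and the 10 attempts are all assumptions chosen for illustration. It sends plain A-record queries over UDP and counts how many come back with a matching query ID.

```python
import socket
import struct


def build_query(name: str, query_id: int) -> bytes:
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD bit set, as dig does by default), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QNAME ends with the zero-length root label; QTYPE=1 (A), QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, attempts: int = 10,
          timeout: float = 2.0, port: int = 53) -> int:
    """Send `attempts` queries over UDP and return how many were answered."""
    answered = 0
    for query_id in range(1, attempts + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(build_query(name, query_id), (server, port))
                data, _ = sock.recvfrom(4096)
            except OSError:
                # Timeout or other socket error: count this query as unanswered.
                continue
            # Only count replies whose ID matches the query we just sent.
            if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == query_id:
                answered += 1
    return answered
```

Calling `probe("213.188.193.79", "wombatsoftware.com")` from the affected network and again from an unaffected one would give two answered counts out of 10 to compare. A fresh socket is used for each query so that a late reply to one query cannot be mistaken for the answer to the next.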

jbarham · March 1, 2021, 8:45am

For added context here are the results I get when I test my Fly nameserver from my London Linode VM:

$ ping 213.188.193.79 
PING 213.188.193.79 (213.188.193.79) 56(84) bytes of data.
64 bytes from 213.188.193.79: icmp_seq=1 ttl=57 time=1.47 ms
64 bytes from 213.188.193.79: icmp_seq=2 ttl=57 time=1.46 ms
64 bytes from 213.188.193.79: icmp_seq=3 ttl=57 time=1.54 ms
^C
--- 213.188.193.79 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 5ms
rtt min/avg/max/mdev = 1.459/1.487/1.537/0.047 ms

DNS query:

$ dig @213.188.193.79 wombatsoftware.com
; <<>> DiG 9.11.5-P1-1ubuntu2-Ubuntu <<>> @213.188.193.79 wombatsoftware.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2496
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;wombatsoftware.com.		IN	A

;; ANSWER SECTION:
wombatsoftware.com.	3600	IN	A	151.101.65.195
wombatsoftware.com.	3600	IN	A	151.101.1.195

;; Query time: 3 msec
;; SERVER: 213.188.193.79#53(213.188.193.79)
;; WHEN: Mon Mar 01 08:38:00 UTC 2021
;; MSG SIZE  rcvd: 104

The very low ping and response times, when I do get a response from my server, suggest that the Linode/Fly/Pingdom servers are all in the same data center, or very close to each other. So maybe this has some impact on the routing of Fly UDP traffic?

kurt · March 1, 2021, 7:19pm

I just wanted to pop in here and say that it’s very cool you’re shipping an authoritative DNS service on Fly.

We’ll get the weird timeouts in LHR figured out. What region is that Linode instance in?

kurt · March 1, 2021, 7:40pm

Are you still getting Pingdom failures? It seems like it’s working now, possibly because your app got a new VM.

Assuming it is working, if it happens again, will you check flyctl status and see if your instance is showing restarts > 0?

jbarham · March 1, 2021, 9:21pm

My Linode instance is in Linode’s London data center.

Note that currently I’m just testing my authoritative DNS service; I can’t actually ship it until tcp/53 is open.

jbarham · March 1, 2021, 9:25pm

Unfortunately I’m still seeing failures even after restart.

Again, it looks like the DNS request is always being received by my server; it’s just the response that is frequently being dropped.
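The reasoning here (request seen on the server, no response seen on the client) can be made explicit by comparing DNS query IDs from both ends. The sketch below is hypothetical: `classify_loss` is an illustrative name, and it assumes the query IDs have already been pulled out of the client's records and the server's logs by some other means, since the log format is not shown in this thread.

```python
def classify_loss(sent_ids, server_seen_ids, client_answered_ids):
    """Split sent query IDs by where the exchange appears to have failed."""
    sent = set(sent_ids)
    # Ignore IDs that were never part of this test run.
    seen = set(server_seen_ids) & sent
    answered = set(client_answered_ids) & sent
    return {
        # Never logged by the server: the request was lost on the way in.
        "request_lost": sorted(sent - seen),
        # Logged by the server but never answered at the client:
        # the response was lost on the way back.
        "response_lost": sorted(seen - answered),
        "answered": sorted(answered),
    }
```

If `request_lost` stays empty while `response_lost` does not, that matches the pattern described above, where only the return path is dropping packets.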

matthewrees · March 2, 2021, 3:13am

FWIW I am seeing the same for YYZ running Pihole. I see my client request pop up in the Pihole dashboard but more often than not (subjectively) do not get a response back (I just ran a dig 10 times in a row and it worked half the time).

Just to make sure, I’ve tried different upstream DNS servers as well as running Unbound on it for recursion, but no luck. I’m happy to help debug during US Eastern time!

EDIT: I am accessing it via public IP right now if that helps, not via Wireguard yet.

kurt · March 2, 2021, 11:57pm

Alrighty, we’ve been able to replicate the problem. UDP responses are, indeed, going missing sometimes. We’re working on it!

thomas · March 3, 2021, 1:15am

I think we may have fixed it. Before I say more: did we fix it?

(I can bounce DNS requests off that wombat DNS thingy, from a host in London).

jbarham · March 3, 2021, 1:34am

Confirmed! I now get a UDP response from my London DNS server, from my London Linode VM, every time.

Thanks for fixing this.

matthewrees · March 3, 2021, 1:42am

Just tested and I am still observing the issue in YYZ.

thomas · March 3, 2021, 1:47am

What’s the public IP of your app?

matthewrees · March 3, 2021, 1:50am

77.83.140.167

thomas · March 3, 2021, 1:52am

Poking at it. It’s not exactly the same case we saw with the LHR host, but it might be related.

(For the record: the case I’m talking about is super dumb; we just had a stray iptables rule that took forever for me to notice, because it only triggered from LHR to LHR.)

matthewrees · March 3, 2021, 4:51pm

Just tested again and it seems to be resolved?

EDIT: Nope, but it was doing well for a while there!
