DNS over TCP works, but UDP doesn't

Before trying CoreDNS, I got the simple Trivial TCP/UDP Echo Service example working in FRA with the following modification in main.go

//      port int = 5000
        port int = 53

and with this fly.toml:

# fly.toml file generated for os1 on 2022-05-01T11:15:54+02:00

app = "os1"

kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]
  ECHO_PORT = 53

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[services]]
  internal_port = 53
  protocol = "udp"

  [[services.ports]]
    port = "53"

[[services]]
  internal_port = 53
  protocol = "tcp"

  [[services.ports]]
    port = "53"

Tested with netcat from an OpenBSD host:
IPv4 TCP & UDP and IPv6 TCP work, but IPv6 UDP does not (the latter is documented elsewhere as still pending):

[rs@gate:~]$ nc -t -4 os1.fly.dev 53 
qwe
qwe
123
123
^C
[rs@gate:~]$ nc -u -4 os1.fly.dev 53 
sdf
sdf
xvcb
xvcb
^C
[rs@gate:~]$ nc -t -6 os1.fly.dev 53 
yxc
yxc
^C
[rs@gate:~]$ nc -u -6 os1.fly.dev 53 
qwert
^C
[rs@gate:~]$

My services stopped responding today. A restart fixed it.

App = channelsdvrnet-dns

All of them, or just the UDP part?

I’m not sure; I should have tried dig +tcp before restarting.

I’m wondering if there’s a way to set up a health check via fly.toml that would verify a UDP DNS response.

Edit: issue started around 10:25am PST and lasted until I restarted at 2:15pm

(1) There might not be right now (I’ll go check).

(2) I’m kind of kicking myself for not thinking of a DNS health check, and thanks for bringing that up.

Most of our health checks run through Consul, which performs them locally, so the simplest DNS health checks we could do might have limited value; there’s probably an “off-net” check we could do here instead. I can’t promise a timeline (we already do off-net UDP monitoring for the platform, but it isn’t as particular as specific DNS queries against specific apps), but I think this is worth investigating.


Locally (in vm)? Ref: Healthchecks and private networks - #2 by kurt

Hi, we did a deploy today and now this is happening again.

I can do TCP lookups, but UDP is not working.

TCP works:

$ dig +tcp 1-1-1-1.deadbeef.u.channelsdvr.net @ipdns2.channelsdvr.net

;; ANSWER SECTION:
1-1-1-1.deadbeef.u.channelsdvr.net. 604800 IN A	1.1.1.1

;; Query time: 122 msec
;; SERVER: 213.188.216.24#53(213.188.216.24)

UDP no response:

$ dig 1-1-1-1.deadbeef.u.channelsdvr.net @ipdns2.channelsdvr.net

; <<>> DiG 9.10.6 <<>> 1-1-1-1.deadbeef.u.channelsdvr.net @ipdns2.channelsdvr.net
;; global options: +cmd
;; connection timed out; no servers could be reached

After last time we set up a script check, and it is currently still passing when invoking dig against 127.0.0.1. So something about UDP routing into our instance is broken.

UDP is tricky on Fly. In particular, pay attention to the four quirks mentioned in the docs (if you weren’t already):

But before we get started, there are four gotchas you need to know about.

  • The UDP side of your application needs to bind to the special fly-global-services address. But the TCP side of your application can’t; it should bind to 0.0.0.0.
  • The UDP side of your application needs to bind to the same port that is used externally. Fly will not rewrite the port; Fly only rewrites the IP address for UDP packets.
  • We support IPv6 for TCP, but not for UDP.
  • We swipe a couple dozen bytes from your MTU for UDP, which usually doesn’t matter, but in rare cases might.

Thanks. I did a rollback and it’s working again. Will try to figure out what changed.


Well, things are working again. I isolated each change and deployed it separately, then eventually deployed all the same changes together, and it’s still working. I’m pretty confident it’s nothing on my side.

This matches my experience in the past, where a new container deployed after a while breaks, and a couple more deploys/restarts then magically fixes things. I guess the bug referenced above is still present.
