I’m trying to upgrade my Headscale on Fly.io application to run Headscale’s embedded DERP server (PR). This requires a UDP connection on port 3478.
Since I couldn’t get it to work, I went back to basics and started with a minimal example that should work. I created a new Fly app that just runs Alpine, listens on UDP port 5000, and uses a dedicated IPv4. I run nc from fly ssh console, and I still can’t get it to work.
❯ echo foo | nc -vvu <FLY_APP_IPv4> 5000
Connection to <FLY_APP_IPv4> 5000 port [udp/*] succeeded!
But nothing is actually received on the Fly side. If I change both fly-global-services and <FLY_APP_IPv4> to localhost and run the client nc command in another fly ssh console session, it works.
What am I doing wrong, or how can I debug what’s wrong? Thanks!
Yes, I can see logs, though my app doesn’t currently emit any logs of its own because sleep infinity is the entrypoint.
2024-10-17T20:54:46Z runner[9080014f550938] fra [info]Pulling container image docker-hub-mirror.fly.io/library/alpine:latest
2024-10-17T20:54:47Z runner[9080014f550938] fra [info]Successfully prepared image docker-hub-mirror.fly.io/library/alpine:latest (708.319277ms)
2024-10-17T20:54:48Z runner[9080014f550938] fra [info]Configuring firecracker
2024-10-17T20:54:48Z app[9080014f550938] fra [info]2024-10-17T20:54:48.801806339 [01JAE45X0WXCD7ANW7T96QW9B1:main] Running Firecracker v1.7.0
2024-10-17T20:54:49Z app[9080014f550938] fra [info][ 0.268591] PCI: Fatal: No config space access function found
2024-10-17T20:54:49Z app[9080014f550938] fra [info] INFO Starting init (commit: 04656915)...
2024-10-17T20:54:49Z app[9080014f550938] fra [info] INFO Preparing to run: `sleep infinity` as root
2024-10-17T20:54:49Z app[9080014f550938] fra [info] INFO [fly api proxy] listening at /.fly/api
2024-10-17T20:54:49Z runner[9080014f550938] fra [info]Machine created and started in 3.021s
2024-10-17T20:54:49Z app[9080014f550938] fra [info]2024/10/17 20:54:49 INFO SSH listening listen_address=[fdaa:a:6ac1:a7b:8a:ce2e:6551:2]:22 dns_server=[fdaa::3]:53
2024-10-17T20:55:00Z app[9080014f550938] fra [info]2024/10/17 20:55:00 INFO New SSH session email=my@email verified=true
I’ve retried the same steps today and it still won’t work.
Hey there. I wrote our UDP support. It doesn’t depend on fly-proxy (the glitch earlier today was with Flycast, which is a proxy feature) and it doesn’t use our new dedicated IP addresses for outbound connections. UDP is routed entirely in-kernel in eBPF; that code hasn’t changed meaningfully in years.
I’ll look into this, but I just wanted real quick to make clear that nothing that has happened today (or in a long time, really) should be impacting UDP forwarding. Not saying something couldn’t be wrong!
Are you sure this is the right nc command line? Your app needs to be listening on the fly-global-services address, not on any other address. Unless you’re super explicit about it, the Linux kernel will respond from the wrong IPv4 address, and your client won’t see the response.
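For anyone who wants to take nc out of the picture on the server side, here is a minimal Python sketch of a UDP echo listener bound explicitly to fly-global-services. The function names (open_udp_socket, echo_once) are made up for illustration, port 5000 is just the port used in this thread, and the sketch assumes IPv4 only. It is not Fly's reference code.

```python
import socket

def open_udp_socket(host="fly-global-services", port=5000):
    """Bind a UDP socket to the address that `host` resolves to.

    Hypothetical helper for this thread: the name is resolved first and the
    socket is bound to the resulting IPv4 address, never to 0.0.0.0.
    """
    addr = socket.gethostbyname(host)  # IPv4 lookup only (assumption)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((addr, port))
    return sock

def echo_once(sock):
    """Receive one datagram, log it, and send it back to the sender."""
    data, peer = sock.recvfrom(65535)
    print(f"{len(data)} bytes from {peer[0]}:{peer[1]}: {data!r}")
    # Replying on the same bound socket keeps the source address of the
    # reply equal to the address the packet arrived on.
    sock.sendto(data, peer)
    return data

# On a Fly Machine you would loop forever, for example:
#   sock = open_udp_socket()
#   while True:
#       echo_once(sock)
```

Because the listener prints every datagram it receives, it also tells you whether packets reach the Machine at all, independent of whether the reply makes it back to the client.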
You can install tcpdump and do a tcpdump -nX -i eth0 udp to see if you’re getting incoming UDP packets.
I spent 30 minutes looking into this and freaking out because my own copy of this app wasn’t working with UDP either; I verified our BPF maps on the worker server my Fly Machine was running on, did a 5 minute detour into how fly-proxy handles the flyio-debug:doit header when no TCP servers are lit up for an app (spoiler: it doesn’t work), found an AT&T Fiber Chicago routing issue (my packets are all going to Newark because AT&T and Comcast have colluded to make my life miserable), which Peter is looking into now (sorry Peter), isolated the weird EWR edge my UDP packets are hitting, did packet-level diagnostics there, couldn’t see my UDP packets, freaked out some more, then realized I was missing the -u in my client nc connection.
Once I added it, everything worked fine.
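A client that cannot forget the -u is easy to sketch in Python. The function below (udp_probe is a hypothetical name, not part of any Fly tooling) sends one datagram and waits for a reply, so it only reports success when something actually comes back. It assumes the server echoes datagrams; a plain nc listener will not reply on its own.

```python
import socket

def udp_probe(host, port, payload=b"foo\n", timeout=2.0):
    """Send one UDP datagram; return the reply, or None if none arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            reply, _ = sock.recvfrom(65535)
        except (socket.timeout, ConnectionError):
            # No reply within the timeout, or the OS reported the port
            # as unreachable. Either way, nothing came back.
            return None
        return reply

# Example (placeholder address, as in the original post):
#   print(udp_probe("<FLY_APP_IPv4>", 5000))
```

A None result does not say where the packet was lost, so it is best combined with tcpdump on the Machine to see whether the datagram arrived.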
Again: not ruling out anything broken in your app. All I ran on the server side was nc -l -u -p 5000 -s 172.19.5.195. Note that I looked up fly-global-services and used the IP directly, because I don’t trust nc, which was written in the phlogiston era of network programming.