UDP ingress not working at all in ams, lax, nrt, sea, sin, sjc

After a lot of time spent debugging and trying to figure out why my DNS server running in sin stopped working a few days ago, I built a small script that moves an app to each region, tests UDP ingress, and reports the results:

| region | UDP | PTR | AS |
|--------|-----|-----|----|
| ams | :x: | worker-pkt-am6-4e11. | AS54825 Packet Host, Inc. |
| cdg | :white_check_mark: | | AS36236 NetActuate, Inc |
| dfw | :white_check_mark: | | AS30081 CacheNetworks, Inc. |
| ewr | :white_check_mark: | worker-pkt-ny5-429d. | AS54825 Packet Host, Inc. |
| fra | :white_check_mark: | | AS36236 NetActuate, Inc |
| gru | :white_check_mark: | worker-pkt-sp4-eb5d. | AS54825 Packet Host, Inc. |
| hkg | :white_check_mark: | worker-pkt-hkg1-21b2. | AS54825 Packet Host, Inc. |
| iad | :white_check_mark: | | AS30081 CacheNetworks, Inc. |
| lax | :x: | worker-pkt-la4-b798. | AS54825 Packet Host, Inc. |
| lhr | :white_check_mark: | | AS30081 CacheNetworks, Inc. |
| maa | :white_check_mark: | | AS36236 NetActuate, Inc |
| mad | :white_check_mark: | worker-pkt-md2-c42c. | AS54825 Packet Host, Inc. |
| mia | :white_check_mark: | | AS30081 CacheNetworks, Inc. |
| nrt | :x: | worker-pkt-ty11-dde7. | AS54825 Packet Host, Inc. |
| ord | :white_check_mark: | | AS30081 CacheNetworks, Inc. |
| scl | :white_check_mark: | | AS36236 NetActuate, Inc |
| sea | :x: | worker-pkt-se4-adcf. | AS54825 Packet Host, Inc. |
| sin | :x: | worker-pkt-sg4-a0f0. | AS54825 Packet Host, Inc. |
| sjc | :x: | worker-pkt-sv15-807e. | AS54825 Packet Host, Inc. |
| syd | :white_check_mark: | | AS30081 CacheNetworks, Inc. |
| yyz | :white_check_mark: | unknown.ord.scnet.net. | AS30081 CacheNetworks, Inc. |

UDP ingress seems to be broken in 6 regions at the time of writing :frowning: (all of which are Packet / Equinix Metal hosts).
Pretty disappointing. I hope this can be fixed, and perhaps added to the status page (https://status.flyio.net/) or to internal monitoring.

Anyway, either this is happening only on my account, or you should avoid these regions for UDP apps for now.

Note: yes, I am using IPv4 and binding the UDP socket to fly-global-services.
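For anyone reproducing this: on Fly.io, inbound UDP is only delivered to sockets bound to the special `fly-global-services` address, so binding to `0.0.0.0` alone silently drops ingress traffic. A minimal sketch of that binding, with a fallback to all interfaces for local development (the fallback is my own assumption, not part of the platform's requirements):

```python
import socket

def bind_udp(port: int) -> socket.socket:
    """Bind a UDP socket the way Fly.io expects for UDP ingress.

    On Fly.io, the fly-global-services name resolves inside the VM and
    inbound UDP is routed to sockets bound to that address. Outside
    Fly.io (e.g. local testing) the name does not resolve, so we fall
    back to binding all interfaces.
    """
    try:
        addr = socket.gethostbyname("fly-global-services")
    except socket.gaierror:
        addr = "0.0.0.0"  # assumption: local/dev environment
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((addr, port))
    return s
```

Passing port `0` lets the OS pick a free port, which is handy for local smoke tests; on Fly.io you would bind the port declared in your `fly.toml` UDP service.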


Hey, thanks for catching this. It’s not just your account! We’re rolling out a lot of new servers, and a batch of them have an XDP build for an older kernel (I just fixed that LAX host). We’re on it, and we’ll add a check for this condition for all our servers.

(I’ll update the status page as well)

The fix should be rolled out everywhere, but I’m testing individually and knocking them out of the status page update as I go.

Later edit

We’re rolling out a fleetwide version check for this. We already have such a check, but it lives in the dependency that got deployed stale, so we’re adding an out-of-band check as well. This shouldn’t ever happen again.


Thanks for fixing this so quickly :smiley: