UDP connectivity failing

We’ve got two different deployments on two different anycast IPs, and UDP started failing on both about 20 minutes ago. Is anyone else seeing this? Any info from Fly?

Thanks

We’re seeing the same thing: as of about 20 minutes ago, UDP is no longer working. We’re keeping an eye on the status page, but nothing has been posted yet.

We just posted an issue related to DNS resolution, which may be tangentially related to this. We’ll post more info as it comes in.

Are you seeing issues with port 53, or other UDP ports?

For me, it’s port 53.

Port 53 for us as well.
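For anyone who wants to confirm that it’s UDP port 53 on the anycast IP that’s failing (rather than DNS resolution in general), here’s a minimal Python sketch of a one-shot UDP DNS probe, assuming your app answers DNS queries on that port. The function names (`build_dns_query`, `is_response_to`, `probe_udp_dns`) and the default query name are made up for illustration; this is not Fly’s monitoring code.

```python
import secrets
import socket
import struct


def build_dns_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1)
    return header + qname + struct.pack("!HH", 1, 1)


def is_response_to(reply: bytes, query_id: int) -> bool:
    """Return True if reply is a DNS response (QR bit set) carrying our query ID."""
    if len(reply) < 12:  # shorter than a DNS header
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == query_id and bool(flags & 0x8000)


def probe_udp_dns(server: str, name: str = "fly.io", port: int = 53,
                  timeout: float = 2.0) -> bool:
    """Send one DNS query over UDP; True if a matching reply arrives before timeout."""
    query_id = secrets.randbits(16)  # random ID so stale replies don't match
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(name, query_id), (server, port))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return False  # no UDP reply at all: the symptom reported in this thread
    return is_response_to(reply, query_id)
```

For example, `probe_udp_dns("203.0.113.10")` returns `True` if a DNS reply comes back over UDP within two seconds and `False` on timeout; 203.0.113.10 is a documentation address, so substitute your app’s anycast IP. Running the same check against a known-good resolver helps separate “UDP to my anycast IP is broken” from “my local network is broken”.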

I see you’ve updated the status page with “Monitoring - Rollback has restored access to DNS services.”

But we still can’t connect to our instances on UDP port 53.

We’re aware that something is going on: a network rule change was pushed that broke internal DNS. Internal DNS is now restored, but we’re still investigating UDP Anycast. We’re on it.

Same here, UDP port 53.

Hi team, any ETA? We’ve got loads of clients asking for an update. Thanks so much.

Sorry about this. We’re still investigating the problem.

We’ve added a new status page incident for the UDP problems. We don’t have any updates yet, but we’re working on it: Fly.io Status - Anycast UDP outage

Just a progress note: there are about 6 of us working on this right now. We’ve got a workaround that we’re rolling out, which involves rebooting our edge hosts (they’re stateless, grouped, and anycasted, so we can do rolling restarts without impacting traffic). This is taking time.

We’ve got the problem isolated on a single edge host, paired with a working edge host, and are still trying to figure out what byzantine Linux IP stack thing we managed to break. As soon as I find anything, I’ll let you all know.

New update:

Every edge host we have EXCEPT the one I was jealously guarding to try to investigate has now been rebooted.

I’m going to relinquish that edge host now. UDP should already be restored for pretty much everyone, and it’ll be restored on every host in just a minute.

All our off-net DNS monitoring hosts (syd, ord, fra, cdg, lhr, and a bunch more) are showing UDP Anycast DNS connectivity now.

All y’all that had UDP services impacted by this, hit me up at thomas@fly.io. We’re obviously not happy that this happened (me least of all, because I still don’t know precisely what happened; the investigation is ongoing), and we’ll do something to make it up to you for dealing with this.

I’ll write SOMETHING about what we were trying to do with this change, and what we learned in the process, by the end of the week (I hope). I’m hopeful I’ll actually know exactly what broke, and I’m certain we’ll be able to talk about the steps we’ve taken to reduce the likelihood of this happening again.

Thanks @thomas, no worries. Just commenting here to say thank you for the open and quick communication; it makes a big difference. Appreciate it, team.

I emailed you about this but haven’t received a reply, and I’ve also sent a chaser. Are you able to check for our email?

That’s weird, I didn’t see it. You sent it to thomas@fly.io? I’ll try to track it down from your email address.

I am definitely available to read my @fly.io email; I don’t get much email there. 🙂

Found it, will reply this morning.

Curious: Any update on this writeup? Thanks.