I’m running a DNS name server (port 53) on IPv4. All of a sudden it stopped working, even though it was working perfectly fine before. It still seems to be listening on fly-global-services, but the Anycast IPs don’t let any UDP through any more. fly.toml hasn’t been touched; it just stopped working suddenly.
We’re happy to look into this – would you mind sharing the output of fly ips list and fly status --all?
For reference, how did you detect a lack of UDP connectivity to your nameserver app, and approximately when did it start? Do you know if the issue is specific to certain regions?
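For anyone wanting to answer the "how did you detect it" question reproducibly, here is a minimal sketch of a UDP-only DNS probe in Python. It is an illustration, not an official Fly tool: the function names are made up for this example, and 203.0.113.10 is a documentation placeholder to be replaced with the app's Anycast IPv4 address from fly ips list.

```python
import socket
import struct


def build_query(name: str, txid: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: one question, recursion desired, class IN."""
    # Header fields: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # qtype 1 = A record


def probe_udp_dns(server: str, name: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if any DNS reply with a matching ID arrives over UDP."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and ICMP-driven errors
            return False
    # Only checks that a reply came back; it does not validate the answer.
    return len(reply) >= 12 and reply[:2] == query[:2]


if __name__ == "__main__":
    # Placeholder address (assumption): substitute your app's Anycast IPv4.
    print(probe_udp_dns("203.0.113.10", "example.com"))
```

Running this from a few different networks or regions, and noting the time of the first failure, gives roughly the information asked for above. A False result only means no reply arrived within the timeout; it does not say where the packet was dropped.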
We just fixed an issue with a host in Frankfurt which could have caused problems with UDP connectivity to your app.
Hopefully this means that your app is once again working normally, but if not, please let us know (along with the information we asked for in my previous post, if possible).
Hi, is Fly prioritising improving MTTR (mean time to recovery) of such host-specific failures? Either through automated detection and remedy / notification or fixing root causes?
I ask because when something’s not working, my first instinct is to look at our own setup and code, which can be time-consuming if the issue turns out to be a single host causing trouble (hard to debug or reason about from within the VM sandbox). If the forums are the only way to get such issues reported and fixed, that’s alright too, but the turnaround times can get out of hand in such cases. Thanks.
Most of the time, when we’re told about an infrastructure problem we aren’t catching automatically, we add a check to alert us the next time it happens.
We don’t always have that reflex, but we eventually will. This is an ongoing effort to get better at support and operations!