A couple notes about the ongoing network instability over the past day or so

:wave: Hi folks. I just wanted to drop a note here to tell everyone a bit more about what’s causing the network instability you might be seeing across several Fly.io regions right now. Of course, we’d love it if you’d also tune into the statuspage and follow the LAX/ORD/ATL Network Issues incident for real-time updates, but it’s tough to tell a complete story in one-line updates as things happen…

Yesterday we were alerted to high packet loss in the LAX region, and shortly afterwards some of our BGP peering sessions in that region began flapping. We worked with our upstream providers to mitigate what clearly looked like a DDoS attack, and this helped! But as DDoS attacks go, the attack shifted a bit, and we had to shift as well. Every time we made a change, there were a few minutes where our network was less than stable. Generally, we’re really good at blocking these sorts of things. They happen all the time, and we hope you never even notice that some annoying botnet is trying to poke our global anycast network with a pointy stick.

We’re still adapting our mitigation so that it stops this attack without disrupting legitimate traffic. Here’s specifically what we’re doing, so you have a better idea of how these things go in practice. Our upstreams can filter traffic for us, but it’s a pretty blunt instrument. Basically, they can block IP ranges or ASNs, drag traffic across weird paths to slow it down, or redirect it to different regional edges. This is usually good enough, but it has the side effect of also breaking legitimate traffic that happens to get caught up in those broad filters. So sometimes we need to take a more surgical approach. Right now we’re building out rulesets at our edges that let us precisely fingerprint the aggravating DDoS traffic (with some secret sauce that lets us update fingerprints quickly), so that we can shed all of the bad traffic or otherwise make the DDoS increasingly expensive for the attacker.
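To make the "blunt vs. surgical" difference a bit more concrete, here's a toy sketch. It is not our actual ruleset (the secret sauce stays secret): the packet fields, the blocked prefix (203.0.113.0/24, a documentation range), and the fingerprint values are all hypothetical. The blunt filter drops everything from a source range, so a legitimate client in that range gets caught too, and a shifted attack from a new range slips through. The fingerprint filter matches on a combination of packet traits instead of where the packet came from.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified packet summary; real edges see far more detail.
    src_ip: str
    dst_port: int
    ttl: int
    payload_len: int
    tcp_flags: str  # e.g. "S" for a SYN


# Blunt, upstream-style filter: drop everything from the listed source ranges.
BLOCKED_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]


def blunt_filter_drops(pkt: Packet) -> bool:
    addr = ipaddress.ip_address(pkt.src_ip)
    return any(addr in net for net in BLOCKED_PREFIXES)


@dataclass(frozen=True)
class Fingerprint:
    # Hypothetical traits an attack tool might stamp on every packet it sends.
    dst_port: int
    ttl_range: tuple[int, int]
    payload_len: int
    tcp_flags: str

    def matches(self, pkt: Packet) -> bool:
        lo, hi = self.ttl_range
        return (
            pkt.dst_port == self.dst_port
            and lo <= pkt.ttl <= hi
            and pkt.payload_len == self.payload_len
            and pkt.tcp_flags == self.tcp_flags
        )


# Fingerprints are plain data, so they can be swapped quickly as the attack shifts.
# Example: SYNs carrying a 1200-byte payload, which normal clients rarely send.
FINGERPRINTS = [Fingerprint(dst_port=443, ttl_range=(50, 54), payload_len=1200, tcp_flags="S")]


def surgical_filter_drops(pkt: Packet, fingerprints=FINGERPRINTS) -> bool:
    return any(fp.matches(pkt) for fp in fingerprints)


attack = Packet("203.0.113.7", 443, 52, 1200, "S")
legit = Packet("203.0.113.99", 443, 57, 0, "S")       # same /24, normal SYN
shifted = Packet("198.51.100.20", 443, 51, 1200, "S")  # attack moved ranges

print(blunt_filter_drops(attack), blunt_filter_drops(legit), blunt_filter_drops(shifted))
# True True False
print(surgical_filter_drops(attack), surgical_filter_drops(legit), surgical_filter_drops(shifted))
# True False True
```

The trade-off is that fingerprints are only as good as the traits you pick: too loose and you're back to collateral damage, too tight and the attacker tweaks one field and walks past you. That's why being able to update them quickly matters so much.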

This is probably not a super satisfying update, but I hope it at least offers some clarity.
