Just a progress note: there are about six of us working on this right now. We’ve got a workaround that we’re rolling out; it involves rebooting our edge hosts (they’re stateless, grouped, and anycasted, so we can do rolling restarts without impacting traffic). The rollout is taking us some time.
We’ve got the problem isolated on a single edge host, paired with a working edge host, and are still trying to figure out what byzantine Linux IP stack thing we managed to break. As soon as I find anything, I’ll let you all know.