Long-time readers will remember the infra log, the not-quite-a-blog largely dedicated to incident debriefs. It’s where we would cover the technical details behind what went wrong, and often the wider context behind any given incident.
Long-time readers will also know that the infra log has been dormant for a while now. Mostly, everyone just got so darn busy.
We did miss having a place to talk about incidents, though, so we’re bringing it back! Just with a different format. To spread the writing load more evenly, it’s one post per incident, rather than the weekly editorial of yesteryear. And our goal is to keep it marching at a roughly steady T+7 days from the incident itself, barring pesky things like weekends and holidays.
I just want to say that the infra-log mostly stopped happening because it was never clear anybody was reading it (all those entries were written by hand), so if this is valuable to y’all, say so.
I’m very happy to see this retrospective log come back. The short-form info in the incident logs didn’t feel the same.
In a video call where we were convincing the powers-that-be we should move away from AWS, I used the existence of Infra Log as a bonus. We have been burned in the past by a lack of transparency, so it was effective as a selling point. Fortunately, no one noticed at the time that May 2025 was the last entry!