This time, however, the bug was in userland, not in the kernel.
Aside: The status-page incident in Singapore on that day had the same underlying cause. (See @PeterCxy’s comment below for more details.)
Small side note: this was actually the same incident as the one in infra-log. The increased latency was caused by… duplicate wg addresses trashing one of our edges in sin, rendering it mostly useless for a while.
A small graphical overview of the previous month, now that it’s complete in the Log…
April 2026
SYD×2, GraphQL, dashboard, metrics, ORD, NRT
ORD, SYD, WireGuard, certs
WireGuard, SIN, dashboard, GraphQL, IAD
deploys, MPG
See the earlier March grid for a description of the annotations.
The first four days of April (corresponding to the top row) were clear of incidents, which was certainly a nice way to start things off…
The wide red mark on April 17 was the Vault certificates store (again); this is one of the few remaining services from the era of using Raft-based clusters for global metadata/configuration (as I understand it). In the longer term, there are plans for replacing it, and a note in the companion forum thread mentioned the decentralized PetSem as the probable substitute.
The wide red stroke on April 28, eleven days later, was a global failure of deploys, due to the Machines API erroneously returning an empty list when asked about existing Machines. This event slightly straddled midnight (00:00 UTC), which is why there are two bars, two outgoing links, etc.
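For anyone curious how a single incident turns into two bars, here is a minimal sketch of splitting a time span at 00:00 UTC. It is only an illustration, not how the grid is actually generated. The function name `split_by_utc_day` and the start/end times are hypothetical; the real incident's exact timestamps are not given here.

```python
from datetime import datetime, timedelta, timezone


def split_by_utc_day(start, end):
    """Split the span [start, end) into per-day pieces, cutting at 00:00 UTC.

    Both arguments must be timezone-aware datetimes.
    """
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    segments = []
    while start < end:
        # Midnight (UTC) that begins the day after `start`.
        next_midnight = datetime(
            start.year, start.month, start.day, tzinfo=timezone.utc
        ) + timedelta(days=1)
        cut = min(end, next_midnight)
        segments.append((start, cut))
        start = cut
    return segments


# Hypothetical times, chosen only so that the span crosses midnight.
bars = split_by_utc_day(
    datetime(2026, 4, 28, 23, 40, tzinfo=timezone.utc),
    datetime(2026, 4, 29, 0, 15, tzinfo=timezone.utc),
)
for seg_start, seg_end in bars:
    print(seg_start.date(), seg_start.time(), "to", seg_end.time())
```

With a span like that, one piece lands on April 28 and the other on April 29, so each day's cell gets its own bar and its own link.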
Aside: Four incidents didn’t make it into the Infra Log, per se. (Possibly just because there was no further commentary that could be added.) In those spots, the cell in the table links either to the real-time status page’s archives or to a post in the present forum thread, depending on what else was in the air that day.
That first one briefly affected attempts to mutate secrets, but did not stop reads (which are distributed).
Addenda: There was also a forum-only incident with FRA networking on the bottom row’s day (May 6). The recent Fresh Produce on NATing outgoing IPv6 may be the de facto postmortem for that one.
In a similar vein, the following date’s (May 7) real-time status page reported relatively brief incidents in BOM and SJC, compiled here for ease of reference in the next summary grid.
Most people don’t have Cloud Hypervisor underlying their own Machines (on Fly.io); that’s only needed for GPUs and Upstash’s backstage servers. Still, the popularity of the Upstash Redis extension resulted in considerable notice in the forum…
Aside: The real-time status page also mentioned a glitch in the Grafana logs on the top row’s day (May 11) and a recurrence of the Redis incident on May 12.
Aside2: The Oban incident may have extended several hours into the following day (May 13).