A locking mystery on the following Sunday…
This wasn’t a reprise of the classic 0xffffffffffffffff but maybe something from the depths of SQLite instead.
A bug in the Linux kernel, which is an unusual conclusion for an Infra Log entry…
It affected egress IPs and the Fly Proxy (e.g., its load balancing), among other things.
The anticipated write-up of the widely noticed certificates incident on the subsequent Friday…
Moving to a different storage arrangement was confirmed as the plan, albeit only in the “longer term”.
Further WireGuard wobbles, to start off the new week…
This time, however, it was a userland bug rather than a kernel one.
Aside: The status-page incident in Singapore on that day had the same underlying cause. (See @PeterCxy’s comment below for more details.)
Small side note: this was actually the same incident as the one in infra-log. The increased latency was caused by… duplicate wg addresses trashing one of our edges in sin, rendering it mostly useless for a while.
Ah… That does make sense. (And sin was specifically used as an example in the Log entry, too.)
Thanks for the correction!
500s on the dashboard and with GraphQL, as another Thursday rolled around…
It doesn’t sound like it was MPG that was overloaded, but rather an internal database of Fly.io’s own.
Addendum: The second incident on that day (April 23) was written up a bit later, as can be seen below.
On the following Monday, the Postgres storm clouds did move over MPG…
That first one apparently caused 6-hour outages for certain operations.
As the revived Infra Log’s second month drew to a close, several users reported odd breakage in deploys…
Not only were extra Machines created, but existing ones weren’t updated to the new image.
This persisted slightly, for half an hour or so, into the following day (April 29).
A small graphical overview of the previous month, now that it’s complete in the Log…
| April 2026 | Annotations |
|---|---|
| Apr 1-4 | (none) |
| Apr 5-11 | SYD×2, GraphQL, dashboard, metrics, ORD, NRT |
| Apr 12-18 | ORD, SYD, WireGuard, certs |
| Apr 19-25 | WireGuard, SIN, dashboard, GraphQL, IAD |
| Apr 26-30 | deploys, MPG |
See the earlier March grid for a description of the annotations.
The first four days of April (corresponding to the top row) were clear of incidents, which was certainly a nice way to start things off…
The wide red mark on April 17 was the Vault certificates store (again); this is one of the few remaining services from the era of using Raft-based clusters for global metadata/configuration (as I understand it). In the longer term, there are plans for replacing it, and a note in the companion forum thread mentioned the decentralized PetSem as the probable substitute.
The wide red stroke on April 28, eleven days later, was a global failure of deploys, due to the Machines API erroneously returning an empty list when asked about existing Machines. This event slightly straddled midnight (00:00 UTC), which is why there are two bars, two outgoing links, etc.
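To make that failure mode a bit more concrete, here is a minimal sketch of how an empty listing could produce both symptoms at once (extra Machines created, existing ones left on the old image). It assumes a deploy planner that simply diffs the desired state against whatever the listing call returns; that is an assumption for illustration, not Fly.io's actual code. The names `Machine` and `plan_deploy` and the image tags are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    # Hypothetical, simplified record; real Machines carry far more state.
    id: str
    image: str


def plan_deploy(existing, desired_count, new_image):
    """Return (ids_to_update, number_to_create) for a naive deploy plan."""
    # Machines that are listed but not yet on the new image get updated in place.
    to_update = [m.id for m in existing if m.image != new_image]
    # Any shortfall against the desired count is made up with fresh Machines.
    to_create = max(desired_count - len(existing), 0)
    return to_update, to_create


# What is really running: two Machines on the old image.
actual = [Machine("m1", "app:v1"), Machine("m2", "app:v1")]

# Healthy listing: both Machines are updated, none are created.
healthy_plan = plan_deploy(actual, desired_count=2, new_image="app:v2")

# Faulty listing: the API reports no Machines at all, so the planner
# creates two new ones and never touches m1 or m2.
faulty_plan = plan_deploy([], desired_count=2, new_image="app:v2")
```

In this toy model the planner never misbehaves on its own terms: given an empty list, creating everything from scratch is the "correct" plan, which is why a bad listing upstream can surface as odd breakage further down.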
Aside: Four incidents didn’t make it into the Infra Log, per se. (Possibly just because there was no further commentary that could be added.) In those spots, the cell in the table links either to the real-time status page’s archives or to a post in the present forum thread, depending on what else was in the air that day.