The Infrastructure Log is a good way to learn more about the details and root causes of past incidents—as well as to hear about many upcoming changes in advance. It’s a low-key and not super-prominent publication, though, so many new users overlook it.
https://fly.io/infra-log/2025-01-25/
The medium-intensity box in the middle (vertically) on Thursday was the much-remarked global(?) deployment outage, which stemmed from a database problem at Depot. (They are the third party that now handles the default builders.)
(Note that the subsequent aftershock on Jan 27 won’t be covered until next week, due to the Log’s intrinsically retrospective and deliberate pace.)
The lighter box above it was a regional logging glitch, in Sydney and San Francisco.
Definitely read the link at the top for the official details.
Behind the scenes, efforts included the following, outlined here non-exhaustively…
Measures being taken to prevent future problems
- Decentralization...
  - Of the Fly Proxy... *
    - particularly with respect to raw TCP connections.
- Preemptive—as opposed to reactive—trawls throughout the infrastructure...
- Improved tracing of Machine migrations.
- Backups of Kubernetes metadata.
- Unified database of hardware repairs in progress.
- Deliberate reboots to avoid an AMD firmware bug. *
Efficiency and Capacity Improvements
- Increased I/O performance, particularly by reducing contention between compute, network traffic, and filesystem access.
New features
- 6PN address stability after Machine migrations, particularly for Fly Postgres (the self-managed one). *
* Ongoing work, continued from before.
Mentioned elsewhere
- An update to Fly Postgres clustering to make migrations more stable, by removing the reliance on literal 6PNs, was announced in the forum. (Presumably the new feature mentioned in the section above is a fallback for un-upgraded images.)
- The newer, fully managed Postgres, under the name of Fly MPG, was predicted by the January newsletter to be "ready to play with within the next month-ish".
- A partial explanation of the mysterious "Free Trial Activated", which provoked some concern in the forum, appeared there as well. (Apparently, the idea is simply to allow people at least 7 days of ability to test out the service without providing a credit card.)
Caveat: The above are just my own interpretations and paraphrases, as a fellow user.
Aside: It was implied in the Log that there was a minimum level of interestingness that entries there had to meet, but, in my opinion, that shouldn’t be true. There’s a time to be attention-getting, and a time to be iridescent, but also a time to just be the structure that everyone else relies on…