How's The Recovery Work Going? (Spoiler: Really well)

billy · August 2, 2024, 6:46pm

Hey everyone,
A little while back, I talked about improving deploy recovery in flyctl. To summarize that post, we worked on improving flyctl’s ability to handle intermittent failures during deploy orchestration. This post is more of an update on how that’s going.

The numbers, Mason!

Thanks to the amazing work from the folks on the deployments team, we have pretty good insight into how successful deploys are (and why they fail).

This is the “platform failure rate” from the past two weeks. These are cases where deploys fail because some part of the platform failed. As you can see, the error rate dropped from around 4.1% two weeks ago (before the recovery changes were merged in) to around 2.8% as of writing, a relative drop of roughly 30%! The recovery changes were merged about a week and a half ago, but at first only 5% of users had recovery enabled by default. As you can see on the graph, we slowly raised that number to 100%, and the results speak for themselves!
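
The post doesn't say how the 5% to 100% rollout was gated. One common approach is deterministic percentage bucketing: hash a stable ID into a bucket from 0 to 99 and enable the feature when the bucket is below the rollout percentage, so a user who got recovery at 5% keeps it as the number goes up. Here's a minimal Go sketch under that assumption; `inRollout` and the user IDs are hypothetical, not how flyctl actually does it.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inRollout reports whether userID falls inside a percentage rollout.
// Hashing the ID gives each user a stable bucket in [0, 100), so raising
// percent only ever adds users; nobody flips back to the old behavior.
func inRollout(userID string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(userID)) // hash.Hash's Write never returns an error
	return h.Sum32()%100 < percent
}

func main() {
	users := []string{"alice", "bob", "carol", "dave"}
	for _, p := range []uint32{5, 50, 100} {
		enabled := 0
		for _, u := range users {
			if inRollout(u, p) {
				enabled++
			}
		}
		fmt.Printf("%d%%: %d of %d users\n", p, enabled, len(users))
	}
}
```

At 100% every bucket is below the threshold, so everyone gets the feature; at 0% no one does.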

Where are we going from here?

There’s still a good amount of work to be done to get this number even lower. Remote builder issues are now a large part of why deploys fail. We’re doing things like introducing support for Depot to help mitigate these issues, but improving our current remote builder support is still a priority. There are also platform errors that we could recover from but currently don’t. We’re actively looking into those now, and we should see further improvements in the future.

Why am I saying all this?

Being transparent about this kind of work is important! Saying that we’re working on improving the reliability of the platform matters, but so does showing it!
