Automatically Retry Deploy Orchestration With --deploy-retries

billy · July 22, 2024, 9:01pm

Hey everyone,
As of v0.2.94 of flyctl, we now support retrying deploy orchestration! For those of you who want to try this feature right away, you can by setting --deploy-retries=5 (or whatever other number!)

What is deployment orchestration?
Deploying your app is pretty complex. There’s a lot of moving parts that go behind the scenes, and any of those moving parts can fail or a variety of interesting reasons. A few weeks ago, I was mostly working on improving the remote builder experience (I actually plan on continuing to work on remote builders, but no spoilers for my next community post ). Now, I’m working on deployment orchestration.

Basically, this part of deployments is when flyctl is trying to update all your machines, make sure health checks are passing, and make sure that traffic is flowing to the right machines. When deployment orchestration fails, one of two things can happen:

in the best case, your app just remains in the state it was before the deploy failed
in the worst case, your app is in a gross “in-between state” where some machines are using your new image, and some are using your old one

The correct solution here is normally to just try deploying again, which will usually succeed if you just experienced a small platform hiccup. But why have to manually retry deploys at all?

What deploy retries help with
If you enable deploy retries, you should notice less issues with acquiring leases, updating machines, and a few more steps during deploys. Generally speaking, small intermittent platform issues should be less noticeable in the future.

What they don’t help with
The two big areas where deploy retries won’t help are remote builder issues, and what I call ‘application issues’. For example, if for whatever reason flyctl is having trouble communicating with your remote builder, then these new changes won’t help (though we’re working on that!).

Alternatively, if your app is failing health checks because your application code has a bug, these changes won’t help either. Using something like canary or bluegreen deploys could help in these cases, though.

This also won’t help when there’s major platform issues. For example, in the terrible scenario that the machines API is completely dead, it doesn’t matter how many times we try to communicate with it.

Why is the default set to ‘auto’?
The final goal is to enable this feature for all users by default. However, these changes required a large amount of restructuring of how deploys work internally. To make sure that we don’t introduce some horrible bug on accident, we’re slowly rolling out this feature to all users over the course of a week. As of writing, around 5% of you should be getting automatic retries by default. If you want to opt-in early, you can add the --deploy-retries=5 to fly deploy. If you want to opt out, just set --deploy-retries=0.

Topic		Replies	Views
Improve remote builder issues with --http-failover! Fresh Produce flyctl	4	294	May 8, 2024
App crashed, won't restart or deploy (production app) Questions / Help	12	1305	September 24, 2021
How's The Recovery Work Going? (Spoiler: Really well) Fresh Produce	0	291	August 2, 2024
Deploys are still often failing	17	1685	October 6, 2022
Can't bring app back from the dead	4	1150	January 27, 2021

Automatically Retry Deploy Orchestration With --deploy-retries

Related topics