When a bluegreen deploy fails I end up with extra machines

I’ve got an app with 1+N web processes and 1 worker process. I don’t want web downtime during deploys but can tolerate a minute or two of worker downtime.

Currently I’ve got a web and a worker process group and I am using bluegreen deploys. I have a health check on my web process group but none on my worker process group.
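For reference, the relevant parts of my fly.toml look roughly like this (app name, commands, and ports are placeholders, but it’s this shape):

```toml
app = "my-app"  # placeholder

[processes]
  web    = "bin/web"     # placeholder commands
  worker = "bin/worker"

[deploy]
  strategy = "bluegreen"

# Only the web group is attached to the HTTP service, so only it gets a check.
[http_service]
  internal_port = 8080
  processes = ["web"]

  [[http_service.checks]]
    grace_period = "10s"
    interval     = "15s"
    timeout      = "2s"
    method       = "GET"
    path         = "/healthz"
```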

If the health check fails during a deploy, the new web machines get shut down. However, both the old and the new worker machines stay running, so I end up with two worker machines. This seems like a bug.

What’s the best way to architect or configure this app?

Hi @reconbot2

We separated our worker into a separate app; it gives you much better control.
I don’t think the process group feature really adds any benefit apart from reducing the number of apps and reusing Docker images, and that tradeoff isn’t really worth it.
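The rough shape of it, if it helps (app names, config files, and the tag are made up):

```sh
# Deploy the web app, tagging its image in the Fly registry, then point
# the worker app at that exact same image.
fly deploy -a myapp-web -c fly.web.toml --image-label v123
fly deploy -a myapp-worker -c fly.worker.toml \
  --image registry.fly.io/myapp-web:v123
```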

I really like having an atomic unit of deploy and rollback, so my services can’t end up at different versions from each other. However, if Fly doesn’t offer that, I guess I’ll have to build it myself?

Depends on your needs, but I would say that your workers should be capable of talking to different versions of your app, and vice versa.

This is especially helpful as you grow and deploys take longer because there are more servers. You don’t want a new version of the app to break simply because it happens to be talking to one of the old worker machines, and you don’t want a worker to break because a new app machine has changed something the old worker can’t handle.
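Concretely, that mostly means writing tolerant handlers on both sides. A tiny sketch of the idea (the job shape and field names are made up):

```python
# Hypothetical worker-side job handler that tolerates payloads from both
# old and new app versions instead of assuming one exact schema.
def handle_resize_job(payload: dict) -> None:
    width = payload["width"]              # present in every version
    quality = payload.get("quality", 85)  # newer field: default if an old app sent the job
    # Extra keys from a newer producer are ignored rather than rejected,
    # so old workers keep working mid-deploy.
    print(f"resizing to {width}px at quality {quality}")
```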

This is definitely not the intended behavior.

What happens if you do a rolling or canary deploy and it fails?


Canary seems to do what I’d expect: it launches a web process, and when that doesn’t come up it gets torn down. However, when I do a second deployment, I end up with two workers running instead of one?

Yes, the worker and web should be tolerant of mixed deployments. And if I separate out the deployments, then I should probably also pull database migrations out of the release commands and orchestrate the three steps in my own release process. It’s a new product and I don’t want to be investing in this at the moment.
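Something like this, I imagine (app names, config paths, and the migrate command are all placeholders; I’m just sketching the three steps):

```sh
#!/usr/bin/env bash
set -euo pipefail

TAG="rel-$(git rev-parse --short HEAD)"
IMAGE="registry.fly.io/myapp-web:$TAG"

# 1. Build the image once and push it to the Fly registry.
fly deploy -a myapp-web -c fly.web.toml --build-only --push --image-label "$TAG"

# 2. Run migrations in a throwaway machine on the new image.
fly machine run "$IMAGE" bin/migrate -a myapp-web --rm

# 3. Deploy web, then worker, from that same image; `set -e` stops before
#    the worker deploy if the web deploy (and its health checks) fails.
fly deploy -a myapp-web -c fly.web.toml --image "$IMAGE"
fly deploy -a myapp-worker -c fly.worker.toml --image "$IMAGE"
```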

And I guess to file another bug, every successful canary release I do seems to add a worker machine :joy:

I opened a support issue with the app’s name if you want to grab some details.

Is it adding a worker machine that runs, or a standby worker machine that’s stopped?

Canary adds a running worker machine with every deploy.

@reconbot2 This should be fixed in Release v0.2.97 · superfly/flyctl · GitHub
The rollback should leave you with the proper count of machines now. LMK if it doesn’t!


Bluegreen is doing what I’d expect! (v0.2.99) Thank you.