A new deployment strategy beyond bluegreen

I use the bluegreen deployment strategy to deploy my app. It works as intended, but I really need a slightly different strategy that works well especially with TCP connections. The idea is the following:

  • green machines are created
  • green machine health checks are awaited
  • blue machines are kept running as long as they have any active connection (can also be based on soft_limit)
  • blue machines do not accept any new connections, only green machines do (from the moment they are ready)
  • blue machines are destroyed only when there are no more active connections to them

In essence it is very similar to bluegreen but less destructive. This way we need less logic in our app to handle deployments, because there’s always a version of the app running for as long as it is in use.

What do you think?

It would also help in the case of multiple services in a monorepo, because a consistent version of them all is running at the same time.

This sounds like a very interesting use case! I have multiple angles on how to answer that.

Prior context:

  • cordon means "make this machine not routable for folks anymore, just keep whoever is still connected to it".
  • cordoned machines won’t get new connections BUT will keep existing ones alive.

Option A) Bluegreen already does that

This scenario is something bluegreen already seems to solve, you might just need to tweak a couple things.

When green machines are started and have passing health checks we uncordon them (so they are reachable via proxy) and then we cordon the blue machines.
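For illustration, the same uncordon-then-cordon swap can be done by hand with flyctl. This is a rough sketch, not what bluegreen runs internally: the function name and the FLY override variable are made up here, and the machine IDs are whatever fly machine list reports for your app.

```shell
# FLY is an assumed override hook: set FLY=echo to preview the commands.
FLY="${FLY:-fly}"

# $1 = whitespace-separated green machine IDs
# $2 = whitespace-separated blue machine IDs
swap_traffic() {
  # Unquoted on purpose so the shell splits the ID lists into words.
  for id in $1; do
    "$FLY" machine uncordon "$id" || return 1   # greens become routable
  done
  for id in $2; do
    "$FLY" machine cordon "$id" || return 1     # blues keep only existing connections
  done
}
```

Greens are uncordoned first so there is never a moment where nothing is routable.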

Bluegreen waits for blue machines to stop after it sends SIGINT (the signal can be configured, see fly deploy --help and look for --signal). If you want to change how long to wait for blue machines to stop, look for --wait-timeout.
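As a sketch of what that tweak could look like, here is a small wrapper around fly deploy with both flags set. The wrapper name, the FLY override variable and the 30m default are assumptions for illustration; pick a signal your app actually handles and a timeout longer than your longest-lived connection.

```shell
# FLY is an assumed override hook: set FLY=echo to preview the command.
FLY="${FLY:-fly}"

# $1 = signal sent to blue machines (defaults to SIGINT, as bluegreen does)
# $2 = how long to wait for machines to change state (illustrative default)
deploy_bluegreen_with_drain() {
  "$FLY" deploy \
    --strategy bluegreen \
    --signal "${1:-SIGINT}" \
    --wait-timeout "${2:-30m}"
}
```

For example, deploy_bluegreen_with_drain SIGTERM 1h would send SIGTERM and wait up to an hour.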

If your app can gracefully terminate after all connections are closed then you’re golden! No new connections will be handled by the blue machines after they are cordoned.
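To make "gracefully terminate after all connections are closed" concrete, here is a minimal drain loop. It is only a sketch: count_connections is a hypothetical hook you would replace with whatever your app exposes (a metrics endpoint, a socket count, etc.), and the defaults are arbitrary.

```shell
# Hypothetical hook: print the number of active connections.
# Replace the body with something real for your app.
count_connections() {
  echo 0
}

# Poll until no connections remain or the deadline passes.
# $1 = max seconds to wait, $2 = poll interval in seconds.
# Returns 0 once drained, 1 if connections were still open at the deadline.
wait_for_drain() {
  max_wait="${1:-300}"
  interval="${2:-5}"
  waited=0
  while [ "$(count_connections)" -gt 0 ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

From an entrypoint script you could wire it up with something like trap 'wait_for_drain 600 5; exit 0' INT, though most servers implement this inside the app itself. Keep the max wait below the --wait-timeout you pass to fly deploy.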

Note: check if your CI does not have a maximum timeout that would suddenly stop everything!

Option B) If you still want a custom workflow

Assuming increasing the timeouts is not the fix for your use case.

What fly deploy currently does is essentially make calls to the Machines API to create/update machines.

May I suggest that you orchestrate this custom deployment logic either via the Machines API from your framework of choice, or just by using flyctl commands?
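If you go the Machines API route, the calls are plain HTTP. Below is a small sketch using curl against the list and cordon endpoints. The helper names and the CURL override variable are made up for illustration, and it assumes FLY_API_TOKEN holds a token with access to the app; check the Machines API docs before relying on it.

```shell
# Base URL of the Machines API; CURL is an assumed override hook for dry runs.
FLY_API_HOSTNAME="${FLY_API_HOSTNAME:-https://api.machines.dev}"
CURL="${CURL:-curl}"

# $1 = HTTP method, $2 = path below /v1/apps/
machines_api() {
  method="$1"
  api_path="$2"
  "$CURL" -sS -X "$method" \
    -H "Authorization: Bearer ${FLY_API_TOKEN}" \
    -H "Content-Type: application/json" \
    "${FLY_API_HOSTNAME}/v1/apps/${api_path}"
}

# $1 = app name
list_machines() { machines_api GET "$1/machines"; }

# $1 = app name, $2 = machine ID
cordon_machine() { machines_api POST "$1/machines/$2/cordon"; }
```

list_machines my-app returns JSON describing each machine, which you can filter with a tool like jq to pick the blue ones.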

Here’s how I’d do it myself (disclaimer: I used an agent to generate these commands and did not try them myself; I recommend trying them in a test app first):

# Step 1: Build only, capture image ref
fly deploy --build-only --push 2>&1 | grep "^image:" 
# → image: registry.fly.io/my-app:deployment-01234567

# Step 2: List existing machine IDs
fly machine list -q
# → <id1> <id2> <id3>

# Step 3: Deploy new machines, excluding all existing ones
# --ha=false prevents spawning extra standby machines per group
fly deploy --image registry.fly.io/my-app:deployment-01234567 \
  --exclude-machines <id1>,<id2>,<id3> \
  --ha=false

# Step 4: (after verifying new machines are healthy) Cordon the old ones
fly machine cordon <id1> <id2> <id3>
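If you want to avoid copy-pasting IDs between steps, the commands above can be glued together roughly like this. It is just as untested as the commands it wraps: the function names and the FLY override variable are invented for this sketch, and it assumes fly machine list -q prints one machine ID per line and that the build output contains a line starting with "image:".

```shell
# FLY is an assumed override hook: set FLY=echo to preview commands.
FLY="${FLY:-fly}"

# Read build output on stdin, print the ref from the first "image:" line.
extract_image_ref() {
  sed -n 's/^image: *//p' | head -n 1
}

# Turn whitespace-separated IDs into the comma list --exclude-machines expects.
ids_to_csv() {
  # Unquoted on purpose: let the shell split on whitespace.
  set -- $1
  ( IFS=,; printf '%s\n' "$*" )
}

# Steps 2 to 4 in one go; $1 is the image ref captured in Step 1.
rollout_then_cordon() {
  image="$1"
  old_ids="$("$FLY" machine list -q)" || return 1
  "$FLY" deploy --image "$image" \
    --exclude-machines "$(ids_to_csv "$old_ids")" \
    --ha=false || return 1
  # Unquoted: one argument per old machine ID.
  "$FLY" machine cordon $old_ids
}
```

A possible invocation is image="$(fly deploy --build-only --push 2>&1 | extract_image_ref)" followed by rollout_then_cordon "$image". The old IDs are captured before the deploy so that only pre-existing machines get cordoned.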

What’s left is deciding how to destroy them once they’re no longer serving traffic. One trick you can use for each machine that needs to be destroyed:

fly machine update <machine_id> \
  --autostop stop \
  --machine-config '{"auto_destroy": true}'

Let the proxy stop it after it’s idle, and the machine then destroys itself instead of staying there forever 🙂
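With several old machines, the same update can be applied in a loop. Again only a sketch wrapping the command above: the function name and the FLY override variable are assumptions, and fly machine update may ask for confirmation on each machine.

```shell
# FLY is an assumed override hook: set FLY=echo to preview commands.
FLY="${FLY:-fly}"

# Apply the idle-stop + auto-destroy settings to every machine ID passed in.
retire_when_idle() {
  for id in "$@"; do
    "$FLY" machine update "$id" \
      --autostop stop \
      --machine-config '{"auto_destroy": true}' || return 1
  done
}
```

Call it with the cordoned blue machines, e.g. retire_when_idle <id1> <id2> <id3>.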

I want to reiterate that I asked an agent to generate those commands, as I did not have time to try them myself! (But I’m a human, I read your question, and I suggested things according to how I’d have done it myself.)

Option C) Make an agent orchestrate your deployment

I still think that Option A is the way to go but…

flyctl is just a fancy open source client for https://api.machines.dev/

Copy your question and my reply into an agent and ask it to generate a custom shell script to orchestrate your deployments with all your details! I’m not joking, it works surprisingly well.

Here’s the flyctl source: superfly/flyctl on GitHub (command line tools for fly.io services)


Let me know if this helps!