Custom deployment strategies

We are currently developing a Phoenix application on Fly. It’s a multiplayer backend that stores state in memory and dispatches updates to clients through WebSockets, clustered across the globe to keep ping times low.

The in-memory state is persisted to S3. We use the bluegreen deployment strategy, and it’s important to save all the state to S3 before routing traffic to the new cluster, so we don’t lose anything. The process we came up with goes roughly like this:

  1. release_command adds an entry to the DB to signal to the old cluster that a deployment is in progress
  2. new cluster boots but its /health responds with a 503 as long as the DB entry is present, so traffic does not get routed to it yet
  3. old cluster sees the DB entry and starts a “pre shutdown” sequence
    a. close all channels
    b. save all projects to S3
    c. remove DB entry
  4. new cluster’s /health responds 200, traffic is routed to the new cluster and clients start reconnecting
  5. old cluster nodes get a SIGTERM and finish shutting down
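
To make the gating in steps 2 to 4 concrete, here is a minimal sketch in Python (our real app is Phoenix, so this is illustration only). `DeployFlagStore` is a hypothetical in-memory stand-in for the DB entry, and `close_channels` / `save_projects` are placeholder callbacks, not real APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeployFlagStore:
    """Hypothetical stand-in for the DB entry that marks a deployment in progress."""
    in_progress: bool = False

    def set(self) -> None:
        # Step 1: written by release_command.
        self.in_progress = True

    def clear(self) -> None:
        # Step 3c: removed by the old cluster once state is saved.
        self.in_progress = False


def health_status(flags: DeployFlagStore) -> int:
    """Status code the new cluster's /health returns (steps 2 and 4)."""
    return 503 if flags.in_progress else 200


def pre_shutdown(
    flags: DeployFlagStore,
    close_channels: Callable[[], None],
    save_projects: Callable[[], None],
) -> None:
    """Step 3 on the old cluster: close channels, save to S3, then clear the flag."""
    if not flags.in_progress:
        return
    close_channels()  # 3a: stop accepting updates first
    save_projects()   # 3b: persist everything
    flags.clear()     # 3c: only now may the new cluster report healthy
```

The ordering inside `pre_shutdown` is the point: the flag is cleared last, so the new cluster cannot receive traffic before the save has finished.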

This is a bit too complex for our taste; we would like to get rid of the pre-shutdown sequence and the DB state. Our ideal deployment would go like this:

  1. boot new cluster
  2. (optional) run some quick integration tests on the new cluster via a private tunnel
  3. gracefully shut down the old cluster with SIGTERM
  4. route traffic to the new cluster
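
As a rough sketch of that ideal sequence, written in Python for illustration: every callable below (`boot_new`, `stop_old`, `route_to_new`, `destroy_new`, `smoke_test`) is a hypothetical hook, not an existing Fly or flyctl API. Tearing the new cluster down when the tests fail is an assumption on my part, not something we have settled.

```python
from typing import Callable, Optional


def deploy(
    boot_new: Callable[[], None],
    stop_old: Callable[[], None],
    route_to_new: Callable[[], None],
    destroy_new: Callable[[], None],
    smoke_test: Optional[Callable[[], bool]] = None,
) -> bool:
    """Run the four steps in order; return True if traffic was switched."""
    boot_new()                       # 1. boot new cluster
    if smoke_test is not None and not smoke_test():
        destroy_new()                # assumed cleanup; old cluster keeps serving
        return False
    stop_old()                       # 3. SIGTERM: old cluster saves state and exits
    route_to_new()                   # 4. only now send traffic to the new cluster
    return True
```

The old cluster is stopped before traffic is routed, so its shutdown handler has a chance to save everything to S3 without needing a separate DB flag.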

Is it something that could be achieved with the current APIs? If not, we would be happy to start a discussion around this subject :slight_smile:


Have you considered storing the data in Fly-managed Redis instead, to keep the clusters in sync? It’s quite expensive and still in preview, but it might simplify the pre/post-deploy ceremonies.

Alternatively, you could use disks that persist across deploys (though be wary of zombie disks). You could even run SeaweedFS atop them, if you’re adventurous.

The problem with failing the health check (steps 2 to 4) for a long time is that Fly might roll back the deployment, which is another scenario the app would have to handle.

The issue is more about being able to customize the bluegreen strategy than about which tech we use to work around the lack of customizability :slight_smile:


Gotcha, but I’d avoid it if I were you (there is a bunch that can go wrong, as you know).

“Is it something that could be achieved with the current APIs?”

Anyway, for Machine apps, both release_command and the rolling strategy (there’s no blue-green) are driven client-side by flyctl. So, if you want to implement a new strategy or customize the existing one, it should be pretty straightforward to do so (don’t quote me on it, I’ve never had to do it ;)).

For regular apps, deployments are handled server-side by Fly, so short of them implementing a new strategy or modifying an existing one, I don’t see how it would be a worthwhile endeavour…

I’m also looking into a custom deploy strategy, which has two goals:

  1. more extensive tests on real-life infra, including 6pn
  2. keeping old deploys around, scaled to 0, for “skew protection”

Let’s say we have an app named foo, which has custom domains, certs, DNS, and everything set up.

Ideally, I think it’d be amazing to deploy to a randomly generated app name, say bar, do some testing, and then move all the traffic from foo over to bar. We’d keep an internal mapping of git versions to deployment names/IDs, so that if bar receives a request that originated from a foo front-end, it can replay the request to foo via the replay header, which should boot up foo even if it has already scaled to 0. I think this might require something like renaming an existing, running app, which doesn’t seem to be possible from what I’ve read.
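
A small sketch of the version-to-app lookup I have in mind, in Python for illustration. The mapping contents and the `replay_header` helper are hypothetical, and how the front-end reports its git version (for example a custom request header) is left open. It assumes the `fly-replay` response header accepts an `app=<name>` target.

```python
from typing import Dict, Optional

# Hypothetical mapping of front-end git versions to the app that serves them.
VERSION_TO_APP: Dict[str, str] = {
    "a1b2c3d": "foo",  # older deploy, possibly scaled to 0
    "e4f5a6b": "bar",  # current deploy
}


def replay_header(current_app: str, client_version: Optional[str]) -> Optional[str]:
    """Return a value for the fly-replay response header, or None to handle locally."""
    target = VERSION_TO_APP.get(client_version or "")
    if target is None or target == current_app:
        # Unknown version or already on the right app: serve the request here.
        return None
    return f"app={target}"
```

For example, `replay_header("bar", "a1b2c3d")` gives `"app=foo"`, which bar would send back as the `fly-replay` header so the request is replayed to foo.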

As the app already needs inter-instance/app communication to forward WebSocket traffic etc., it’d be nice to do the same in this scenario, though from what I’ve gathered so far this might not really be possible at the moment? Please correct me if I’m wrong :crossed_fingers:

The alternative I can think of, which I’d probably like to avoid, is a separate api-gateway app that needs fewer (potentially breaking) deploys and just decides where to proxy each incoming request. That would keep all the domain and DNS setup stable, and the name of the actual app wouldn’t matter to the outside world. It might also help with the custom user domains part.

Another option might be programmatically changing DNS records to move traffic from foo over to bar.

Also note that we plan on supporting custom user domains soonish. I don’t have the definitive setup fleshed out yet: it might be purely on Fly or via a CDN in front. I was looking into bunny or potentially CloudFront as alternatives; Cloudflare seems a bit pricey for this use case. The custom domain requirement makes me think that the last option, updating DNS records, might be a little less convenient overall, and that potentially everything leads to a separate api-gateway app in front :thinking: