how do fly.io deploys work?

You’re probably hitting a slow service propagation issue we’ve been fighting. It will hopefully be fixed for good in a few weeks, but the tldr is we’re probably causing the downtime when you deploy an app with a single VM. You can reduce the impact of this by running fly scale count 2 or more.

When you deploy we:

  1. Start a canary VM (if you’re using a volume, we skip this step)
  2. Remove an old VM from service discovery
  3. Wait 30 seconds
  4. Stop old VM
  5. Start new VM, add it to service discovery

fly-proxy relies on service discovery to know where to send requests. The problem is, it frequently takes 60-120s for fly-proxy to detect service discovery changes. So there’s a window when the old VM stops where fly-proxy doesn’t know about the new VM.

Running multiple VMs mitigates this by chance, it buys fly-proxy more time to detect the new VMs.

1 Like