Error: could not find an instance to route to after deployment

I’m at a loss. I have a long-running app (up for a couple of months). I deployed a new version and now get this message: Error: could not find an instance to route to.

I spun up a new container with the same fly.toml and Dockerfile, and it starts normally. I redeployed the new container and it worked. The third time, I got the same error: could not find an instance to route to. During deployment, the new container reports “Failed due to unhealthy allocations” and the old one is just stuck.

Both containers use Tailscale as explained here: Tailscale on Fly.io · Tailscale (my setup is pretty much the same as in the article). Have there been any changes to user mode networking? That is where the containers seem to start locking up.

I’m seeing the same issue. I have an app deployed in fra (main) and gru, and last night I noticed that the fra instance had died and been replaced by an instance in gru, which is causing this error:

@luizkowalski ➜ /workspaces/sumiu (main) $ fly status -a sumiu-web
App
  Name     = sumiu-web          
  Owner    = personal           
  Version  = 371                
  Status   = running            
  Hostname = sumiu-web.fly.dev  
  Platform = nomad              

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS           RESTARTS        CREATED    
61522164        app     371     gru     run     running 1 total, 1 passing      0               10h34m ago
4129a89b        app     371     gru     run     running 1 total, 1 passing      0               12h54m ago

I noticed that this started happening when I enabled autoscaling:

@luizkowalski ➜ /workspaces/sumiu (main) $ fly scale show -a sumiu-web
VM Resources for sumiu-web
          Count: 2
 Max Per Region: agent=0 app=0 web=1 worker=0 

Process group agent
        VM Size: shared-cpu-1x
      VM Memory: 512 MB
 Max Per Region: 0

Process group app
        VM Size: shared-cpu-1x
      VM Memory: 512 MB
 Max Per Region: 0

Process group web
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
 Max Per Region: 1

Process group worker
        VM Size: shared-cpu-1x
      VM Memory: 256 MB

Our fra region has been experiencing high load since earlier today, resulting in slower deployments (sometimes 10-15 minutes or longer). The fly deploy command can time out in those cases. When that happens, the deployment will still go through, and once it does, fly status will show the instances as running.

More info on our status page: Fly.io Status - Deployments may take 10-15min in fra region
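
For anyone who wants to wait out a timed-out deploy rather than re-running it, here is a minimal sketch that polls fly status until an instance shows as running in a given region. It assumes the table layout from the fly status output pasted earlier in this thread (REGION in column 4, STATUS in column 6). The function names, the retry count, and the delay are made up for illustration and are not part of flyctl.

```shell
#!/bin/sh
# Sketch only: the column positions are an assumption based on the
# `fly status` output pasted in this thread
# (ID PROCESS VERSION REGION DESIRED STATUS ...).

# Reads `fly status` text on stdin and prints how many instance rows
# have the given region and a STATUS of "running".
count_running_in_region() {
  region="$1"
  awk -v region="$region" '
    $1 == "ID" && $2 == "PROCESS" { in_table = 1; next }  # header of the Instances table
    in_table && NF == 0 { in_table = 0 }                   # a blank line ends the table
    in_table && $4 == region && $6 == "running" { n++ }
    END { print n + 0 }
  '
}

# Hypothetical helper: polls `fly status -a APP` until at least one
# instance is running in REGION, or gives up after ATTEMPTS checks.
wait_for_region() {
  app="$1"; region="$2"; attempts="${3:-30}"; delay="${4:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    n=$(fly status -a "$app" | count_running_in_region "$region")
    if [ "$n" -gt 0 ]; then
      echo "$n instance(s) running in $region"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no running instance in $region after $attempts checks" >&2
  return 1
}
```

Something like wait_for_region sumiu-web fra would then check every 30 seconds for up to 15 minutes. Note that it only looks at the STATUS column, so an instance that is running but has a critical health check still counts as running here.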

This is indeed the fra region for me as well. I’ll try again later.

I do see the instance as running, but with a critical health check and “Error: could not find an instance to route to” in the log.

I deployed once more just now and still get the same errors in that app. The same fly.toml (and Dockerfile) with a new app and a snapshot of the attached volume works.