App is not completing deploys or propagating scaling changes

Hey everyone,

We have an app that seems to have stopped deploying. The version number sometimes increments after CLI scale commands or deploys, but new deploys do not go live once they complete, and additional VMs do not spin up when we scale from min=1 to min=2 or from min=2 to min=3.

There also always seems to be one last “pending” change in the activity log in the dashboard.

At one point I had 2 VMs running, but one of them was far behind (it had a much lower version number), and the only way I could get that VM removed was to stop it manually via the CLI by its ID.

My guess is that the app is stuck in some kind of weird state.

App
  Name     = better-cart-totan-prod
  Owner    = better-cart
  Version  = 103
  Status   = running
  Hostname = better-cart-totan-prod.fly.dev

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS           RESTARTS        CREATED
d8007c80        app     102     ord     run     running 1 total, 1 passing      1               2022-02-18T18:43:59Z

`fly status --all` also shows the old, manually stopped VM, in case this helps:

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS          HEALTH CHECKS           RESTARTS        CREATED
d8007c80        app     102     ord     run     running         1 total, 1 passing      1               2022-02-18T18:43:59Z
215892cf        app     93      iad     stop    complete        1 total, 1 passing      1               2022-02-16T04:18:20Z

VM Resources for better-cart-totan-prod
        VM Size: shared-cpu-1x
      VM Memory: 512 MB
          Count: 3
 Max Per Region: Not set

As you can see, the app has been running on version 102 for some time, but version 103 should have been deployed 8 hours ago. You will also notice that the count is set to 3, so there should be 3 VMs running, but only one is.
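
For anyone who wants to check for the same mismatch mechanically, here is a rough Python sketch. It assumes the `fly status` table layout pasted above (six single-token leading columns), and the helper names `summarize_instances` and `find_problems` are made up for illustration; they are not part of flyctl.

```python
# Instance table copied from the `fly status` output in this post.
STATUS_OUTPUT = """\
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS           RESTARTS        CREATED
d8007c80        app     102     ord     run     running 1 total, 1 passing      1               2022-02-18T18:43:59Z
"""


def summarize_instances(table_text):
    """Return (id, version, region, status) tuples, one per instance row."""
    rows = []
    for line in table_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 6 or fields[0] == "ID":
            continue
        instance_id, _process, version, region, _desired, status = fields[:6]
        rows.append((instance_id, int(version), region, status))
    return rows


def find_problems(rows, app_version, desired_count):
    """Compare running instances against the expected version and count."""
    running = [row for row in rows if row[3] == "running"]
    problems = []
    if len(running) < desired_count:
        problems.append(f"{len(running)} of {desired_count} VMs running")
    for instance_id, version, _region, _status in running:
        if version < app_version:
            problems.append(
                f"{instance_id} is on v{version}, app is at v{app_version}"
            )
    return problems


# App version 103 and count 3 are the values shown in the output above.
problems = find_problems(
    summarize_instances(STATUS_OUTPUT), app_version=103, desired_count=3
)
for problem in problems:
    print(problem)
```

With the values from this post, it reports both symptoms: too few running VMs and a running VM that is behind the app version.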

Listing releases always shows the latest release as in_progress, no matter what kind of change triggered the release (scale, deploy, etc.).

Maybe related: App is stuck in pending in `sin` region - #3 by jerome

Thanks in advance!

What regions do you have set? I think that Max Per Region output might be incorrect. This looks like what happens when there are few regions and max-per-region=1.

Gotcha, we only have ORD as an available region to prevent PG latency at the moment.

Region Pool:
ord
Backup Region:

Try running:

fly scale count 3 --max-per-region=0

And see if that helps?
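
The reasoning behind this suggestion can be illustrated with a toy model. This is only a sketch in Python, not how the scheduler actually works: it assumes that each region in the pool can hold at most `max_per_region` VMs and that a value of 0 means "no per-region limit". The function name `max_placeable` is hypothetical.

```python
def max_placeable(count, regions, max_per_region):
    """Toy model of how many VMs can be placed.

    Assumption: max_per_region == 0 means there is no per-region limit.
    """
    if not regions:
        return 0
    if max_per_region == 0:
        return count
    # Each region can take at most max_per_region VMs.
    return min(count, len(regions) * max_per_region)


# One region (ord) with max-per-region=1: only 1 of the 3 VMs fits.
print(max_placeable(3, ["ord"], 1))  # 1

# Same region pool with the limit lifted: all 3 VMs can be placed.
print(max_placeable(3, ["ord"], 0))  # 3
```

Under this model, a count of 3 with a single region and a limit of 1 per region leaves two VMs with nowhere to go, which matches the stuck pending change described above.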

Ah! We are finally seeing changes propagate instantly!

Is there a way to have 2 VMs running in ORD with, for instance, IAD as a backup that is only used if ORD is down?

There isn’t. It’s hard to control things at that level with Nomad. We’ll have better options in several months.

Gotcha, thanks for the info!

I noticed that the command that solved our issue also disabled autoscaling. Is there a way to keep the min/max autoscale feature and still solve our issue? When I switched back to autoscale, we ended up back in that bad state.

It’s likely that your app is in a weird state and we’ll need to fix it to re-enable autoscaling. Give us a few hours and we’ll let you know.

Perfect, you guys rock.

Thanks again!