Intermittent `fly scale count` failures

I’m setting up a staging environment and am trying to use GitHub Actions to scale it up and down as required to run functional tests as part of CI/CD.

It seems like `fly scale count` is failing intermittently, both when scaling to 0 and when scaling to 1, without returning an error (though the message does report the current count, so you can tell whether it worked).

> fly scale count 1 -c fly.staging.toml --verbose
Count changed to 1
> fly scale count 0 -c fly.staging.toml --verbose
Count changed to 1
> fly scale count 0 -c fly.staging.toml --verbose
Count changed to 0
> fly scale count 1 -c fly.staging.toml --verbose
Count changed to 0

Is there a way to make it work/return an error (other than writing a bash script to interpret the message, I guess)?
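For reference, the message-parsing workaround mentioned above could look roughly like the sketch below. It assumes the output keeps the `Count changed to N` format shown in the transcript, which may differ between flyctl versions; the function names `scale_output_matches` and `scale_checked` are made up for illustration.

```shell
# Succeeds only if the output reports the expected count.
# Assumption: the message format "Count changed to N" is copied from the
# transcript above and is not a documented, stable interface.
scale_output_matches() {
  local expected="$1" output="$2"
  # The trailing group stops "1" from matching "10".
  printf '%s\n' "$output" | grep -Eq "Count changed to ${expected}([^0-9]|\$)"
}

# Runs `fly scale count` and turns a mismatched message into a non-zero exit status.
scale_checked() {
  local expected="$1" config="$2" output
  output="$(fly scale count "$expected" -c "$config" 2>&1)" || {
    printf '%s\n' "$output" >&2
    return 1
  }
  printf '%s\n' "$output"
  scale_output_matches "$expected" "$output"
}
```

In a CI step this would be called as `scale_checked 1 fly.staging.toml`, so the step fails when the reported count is not the requested one.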

If it didn’t work we should be returning an error so we’ll fix that. What’s your app name?

It’s regex-help-stg.

Also, if I’m scaling up/down & deploying are there any order requirements?

  1. E.g. can I safely deploy with scale count 0 and only scale it up later?
  2. On the other hand, is it safe to run both scale count 1 and deploy at the same time, or should they be in sequence?

Scaling to 0 and back to 1 is kind of buggy in our API/flyctl. Your best bet is to run `flyctl deploy` and then `flyctl scale count 1`. These actions are each a separate Nomad job update under the hood.
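As a minimal sketch of that ordering for a CI step: deploy first, and scale up only once the deploy command has returned successfully. The wrapper name `deploy_then_scale` is hypothetical, and the config file name is the one used earlier in this thread.

```shell
# Deploy, then scale, strictly in sequence.
# Each command is a separate job update, so the scale step only runs
# if the deploy command exited successfully.
deploy_then_scale() {
  local config="$1" count="${2:-1}"
  flyctl deploy -c "$config" || return 1
  flyctl scale count "$count" -c "$config"
}
```

Usage would be `deploy_then_scale fly.staging.toml 1`. Running the two commands in parallel is deliberately avoided here.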

Since doing deploy → scale to 1, I noticed that flyctl sometimes gets stuck on the Monitoring Deployment stage. Does that sound related? As in, is deployment more likely to be stuck when the app is scaled to 0?

The deployment monitor is getting stuck because there isn’t a deployment created when the count is zero. I think we should either automatically set the count to 1 when deploying if it’s zero or we should exit early. I’ll admit deploying when the count is zero is something we haven’t tested much. We’ll figure out how to get it working in a way you’d expect.

For anyone running into this, I ended up “solving” the problem by simply running the command twice in a row :) I still sometimes see failures on the first attempt, but everything seems to work reliably the second time around.
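A rough sketch of that "run it again" workaround, for anyone who wants it in a script: it repeats the command until the reported count matches, up to a fixed number of attempts. It assumes the `Count changed to N` message format shown earlier in the thread; the function name `scale_with_retry` and the `RETRY_DELAY` variable are made up for this example.

```shell
# Re-runs `fly scale count` until the reported count matches the requested one.
# Assumption: success is detected from the "Count changed to N" message,
# since the command may exit 0 even when the count did not change.
scale_with_retry() {
  local expected="$1" config="$2" max_attempts="${3:-2}"
  local attempt=1 output
  while [ "$attempt" -le "$max_attempts" ]; do
    output="$(fly scale count "$expected" -c "$config" 2>&1)"
    printf 'attempt %s: %s\n' "$attempt" "$output"
    if printf '%s\n' "$output" | grep -Eq "Count changed to ${expected}([^0-9]|\$)"; then
      return 0
    fi
    # Pause before the next attempt, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "${RETRY_DELAY:-2}"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `scale_with_retry 0 fly.staging.toml 2` mirrors running the command twice and fails the step if the count is still wrong after the second attempt.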