Fly Releasing non-stop

Hi! I’m running an application on Fly and I’m making a few changes. However, my Activity page shows Fly.io releasing about every minute; I’m now on v959 (I think I was on v50 or so). It seems like an issue on your side. I’m just reporting it, as it seems to have stopped.

Thanks!

Every time a scaling event happens the version number will increment. This includes autoscaling and region switches.

Hi dan,

Yep, I’m aware. However for some reason it’s changing by itself:

But flyctl releases shows the correct last version:
v74 complete Release [emailredacted] 21m24s ago

So yeah, not sure what’s going on. But it seems somehow fly broke.

Here’s my Monitoring screen:

However, all health checks are passing, and I have only 1 application configured.
I was using GRU last week but changed to SEA. For some reason it seems to be trying to deploy to GRU, which is odd, as I don’t have it set in my regions anymore.
Regions [app]: sea

So yeah, not sure what’s going on :slight_smile:

Have you tried simply restarting the app by chance?

Also, is this a V2 or V1 app?

It’s a V2, did not try to restart (only new deploys). I’m restarting it now to see if something changes. Thanks for your help btw.

edit: Restarting didn’t fix it. Still happening.
I think I’ll add gru as a region and then drop it to see if that fixes it.

Hmm, I wonder if the machine is stuck in a bad state. Can you try cloning the machine to the same region and then, once that has completed, deleting the old machine?

Unfortunately that didn’t work. I just made a new deploy to see if it fixed itself, but it’s still releasing. The ‘Monitoring’ tab doesn’t show GRU anymore, though, only SEA. I’m on v1256 (well, not really, as flyctl releases shows something else). Any other ideas? I’m thinking about creating a new app and migrating everything there, but I’m not sure…

Can you share your fly.toml?

Sure! Just changed it to use the new http_service format.

It was like this:

kill_signal = "SIGINT"
kill_timeout = 5
primary_region = "sea"
processes = []

[metrics]
  port = 9090
  path = "/metrics"

[env]
  PORT = "9090"

[deploy]
  release_command = "sh /root/migrate.sh"

[experimental]
  auto_rollback = false

[[services]]
  http_checks = []
  internal_port = 9090
  processes = ["app"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 1000
    soft_limit = 800
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

Now it looks like this:

kill_signal = "SIGINT"
kill_timeout = 5
primary_region = "sea"
processes = []

[metrics]
  port = 9090
  path = "/metrics"

[env]
  PORT = "9090"

[deploy]
  release_command = "sh /root/migrate.sh"

[experimental]
  auto_rollback = false


[http_service]
  internal_port = 9090
  force_https = true
  [http_service.concurrency]
    type = "connections"
    soft_limit = 800
    hard_limit = 1000

Fly has stopped the releases at v1299. If the problem is going to come back, it’ll come back now, since I just deployed with that fly.toml format…

I had auto_rollback enabled earlier too. Disabled it a few hours ago, I think (when the problem started).

edit: Just noticed the UI mentioning to add checks. I’ll have a look at that config.
edit2: Oh it mentioned Setup checks because I removed them from my old fly.toml. Just fixed it.

Well, it seems to have stopped at v1299. The Monitoring screen is still showing the failed deployment for v1299, though.

Nothing sticks out to me that would cause this. I would bet that the app is stuck in a bad state, and Fly will unfortunately keep thinking it needs to be deployed until that is corrected, either manually by the team or by knocking it back into a good state with a series of commands.

It might be worth just creating a new app, running the ENV script from the upgrade guide to copy over the ENVs, and then swapping DNS over to the new app’s IPs.

https://fly.io/docs/apps/migrate-to-v2/#copy-any-secrets-you-need-from-your-existing-app

Thanks a lot!

Well, the app is running successfully… but I think I’ll move everything to another one, because that huge amount of activity is really weird. It did stop releasing, though. I think it starts when I set env vars (I haven’t set one in a while), but I can’t be sure.

Thanks a lot for your help!

OK, so that’s strange. I ended up recreating the application with the same fly.toml, changing only the app name, and now it’s happening again. I only deployed once yesterday, and now I’m on v1600… really strange.

It doesn’t seem to be impacting anything, but it’s really strange behavior.

@brenol Could you please describe the steps you used to deploy the app?

For some reason, your app has both Machine instances and a Nomad config. Nomad tries to create allocations and fails, but this leads to a constantly increasing release number.
This looks like a bug on our side and I’m trying to understand how this could have happened.

Hi @pavel!

Thanks for your answer.
Sure, I can explain how the last machine was created:

With the help and comments from @danwetherald I simply recreated the machine using the fly.toml mentioned above, using http_service.

Before deploying I also noticed I didn’t have apps-v2 default on enabled, so I enabled it:
fly orgs apps-v2 default-on <org-name>

And then I went to deploy the new app:

flyctl apps create <appname>
flyctl secrets set <all-the-secrets>

Opened the fly.toml mentioned above and changed the app name to my new app’s name.

Went to the UI and migrated my certificate (the app isn’t in production use right now).

Pushed to my repository, which then deployed the application, using this github workflow:

name: Fly Deploy
on:
  push:
    branches: [main]
env:
  FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
jobs:
  deploy:
      name: Deploy apps
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v3
        - uses: superfly/flyctl-actions/setup-flyctl@master
        - run: flyctl deploy --remote-only

After the deploy was complete, I went and updated my Cloudflare configuration with the new IPs.
I noticed I had no checks set up (as I had migrated from the [[services]] block to the [http_service] block), so I added the http_check and tcp_check to my fly.toml.
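The exact checks are not shown here. As a minimal sketch, an HTTP check nested under [http_service] could look something like the following. The timings are placeholders and the path is a hypothetical health endpoint, neither taken from the original config, so adjust them for the app:

```toml
[http_service]
  internal_port = 9090
  force_https = true

  [[http_service.checks]]
    # Placeholder timings; tune them for the app.
    grace_period = "10s"
    interval = "30s"
    timeout = "5s"
    method = "GET"
    # Hypothetical health endpoint, not taken from the original config.
    path = "/"
```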

Finally, I ran flyctl machines destroy <old_machine_id>.

I think that’s all I did. Hope it helps with troubleshooting!

Thanks a lot!

@brenol I can’t seem to reproduce with these steps. Just to double check, have you enabled autoscaling on the app (fly autoscale set min=<min> max=<max>)? It looks like enabling autoscaling on a Machine app leads to exactly this behavior.

Oh. Yes! I did enable autoscale on the machine. Sorry about that, I forgot to mention it: I had to search my shell history to identify all the commands I used, and I completely missed it.

So, running flyctl autoscale disable on my side should fix it, correct? I’ll have a look at how I should run autoscale on apps-v2 then. Thank you @pavel.

I think there is currently another bug that prevents it from being disabled. So a quicker way would be to re-create the app (you mentioned it’s not yet in production, right?).

I’ll have a look at how I should run autoscale on apps-v2 then.

For this you would need to pre-create additional machines and make sure that you have auto_stop_machines = true and auto_start_machines = true in your fly.toml.
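As a rough sketch, this is the [http_service] block posted earlier in the thread with those two settings added. Only auto_stop_machines and auto_start_machines come from the advice above; min_machines_running is an extra setting included here as an assumption, so check the current fly.toml reference before relying on it:

```toml
[http_service]
  internal_port = 9090
  force_https = true
  # Let the Fly proxy stop idle Machines and start them again on demand.
  auto_stop_machines = true
  auto_start_machines = true
  # Assumption (not from this thread): keep at least one Machine running
  # in the primary region so the app does not scale down to zero.
  min_machines_running = 1
  [http_service.concurrency]
    type = "connections"
    soft_limit = 800
    hard_limit = 1000
```

These settings only start and stop Machines that already exist, which is why the additional Machines have to be created beforehand.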

Gotcha. I’ll recreate it then!

I was thinking about doing it already due to Increasing Apps V2 availability - #9 by charsleysa, so no problem :). Thank you!