Fly Releasing non-stop

brenol · April 19, 2023, 2:01am

Hi! I’m running an application on Fly and making a few changes, but on my Activity page Fly.io is creating a new release about every minute. I’m now on v959 (I think I was on v50 or so). This seems like an issue on your side; I’m just reporting it, although it seems to have stopped now.

Thanks!

danwetherald · April 19, 2023, 3:23am

Every time a scaling event happens the version number will increment. This includes autoscaling and region switches.

brenol · April 19, 2023, 3:27am

Hi dan,

Yep, I’m aware. However, for some reason it keeps changing by itself.

But flyctl releases shows the correct latest version:
v74 complete Release [emailredacted] 21m24s ago

So yeah, I’m not sure what’s going on, but it seems something on Fly’s side broke.

Here’s my Monitoring screen:

However, all health checks are passing, and I have only 1 application configured.
I was using GRU last week but changed to SEA. For some reason, I think it’s still trying to deploy to GRU, which is odd, as GRU is no longer in my regions:
Regions [app]: sea

So yeah, not sure what’s going on.

danwetherald · April 19, 2023, 3:30am

Have you tried simply restarting the app by chance?

Also, is this a V2 or V1 app?

brenol · April 19, 2023, 3:33am

It’s a V2 app. I didn’t try restarting it (only new deploys). I’m restarting it now to see if anything changes. Thanks for your help, btw.

edit: Restarting didn’t fix it. Still happening.
I think I’ll add GRU as a region and then drop it to see if that fixes it.

danwetherald · April 19, 2023, 3:36am

Hmm, I wonder if the machine is stuck in a bad state. Can you try cloning the machine into the same region and then, once that has completed, deleting the old machine?

brenol · April 19, 2023, 3:55am

Unfortunately, that didn’t work. I just made a new deploy to see if it would fix itself, but it’s still releasing. The Monitoring tab doesn’t show GRU anymore, though, only SEA. I’m on v1256 now (well, not really, as flyctl releases shows something else). Any other ideas? I’m thinking about creating a new app and migrating everything there, but I’m not sure…

danwetherald · April 19, 2023, 4:09am

Can you share your fly.toml?

brenol · April 19, 2023, 4:15am

Sure! Just changed it to use the new http_service format.

It was like this:

kill_signal = "SIGINT"
kill_timeout = 5
primary_region = "sea"
processes = []

[metrics]
port = 9090
path = "/metrics"

[env]
  PORT = "9090"

[deploy]
    release_command = "sh /root/migrate.sh"

[experimental]
  auto_rollback = false

[[services]]
  http_checks = []
  internal_port = 9090
  processes = ["app"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 1000
    soft_limit = 800
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

Now it looks like this:

kill_signal = "SIGINT"
kill_timeout = 5
primary_region = "sea"
processes = []

[metrics]
port = 9090
path = "/metrics"

[env]
  PORT = "9090"

[deploy]
    release_command = "sh /root/migrate.sh"

[experimental]
  auto_rollback = false


[http_service]
  internal_port = 9090
  force_https = true
  [http_service.concurrency]
    type = "connections"
    soft_limit = 800
    hard_limit = 1000

Fly has stopped releasing at v1299. If it starts again, it should be now, since I just deployed with that fly.toml format…

I also had auto_rollback enabled earlier. I disabled it a few hours ago, I think (when the problem started).

edit: Just noticed the UI suggesting that I add checks. I’ll have a look at that config.
edit2: Oh, it mentioned setting up checks because I removed them from my old fly.toml. Just fixed it.

Well, it seems to have stopped at v1299, though the Monitoring screen still mentions the failed deployment for v1299.

danwetherald · April 19, 2023, 4:21am

Nothing sticks out to me that would cause this. Unfortunately, I would bet that the app is stuck in a bad state and Fly will keep thinking it needs to be deployed until it’s corrected, either manually by the Fly team or by knocking it back into a good state with a series of commands.

It might be worth just creating a new app, running the script from the upgrade guide to copy over your secrets (ENVs), and then pointing your DNS at the new app’s IPs.

https://fly.io/docs/apps/migrate-to-v2/#copy-any-secrets-you-need-from-your-existing-app

brenol · April 19, 2023, 4:25am

Thanks a lot!

Well, the app is running successfully… but I think I’ll move everything to a new one, because that huge amount of activity is really weird. It did stop releasing, though. I think it starts when I set env vars (I haven’t set one in a while), but I can’t be sure.

Thanks a lot for your help!

brenol · April 19, 2023, 12:26pm

OK, so that’s strange. I ended up recreating the application with the same fly.toml, only with a different app name, and now it’s happening again. I only deployed once yesterday. Now it’s on v1600… really strange.

It doesn’t seem to be impacting anything, but it’s really strange behavior.

pavel · April 19, 2023, 3:34pm

@brenol Could you please describe the steps you used to deploy the app?

For some reason, your app has both Machine instances and a Nomad config. Nomad keeps trying to create allocations and failing, and this leads to a constantly increasing release number.
This looks like a bug on our side, and I’m trying to understand how this could have happened.

brenol · April 19, 2023, 4:41pm

Hi @pavel!

Thanks for your answer.
Sure, I can explain how the last machine was created:

With the help and comments from @danwetherald, I simply recreated the machine using the fly.toml mentioned above (the http_service version).

Before deploying, I also noticed I didn’t have Apps V2 default-on enabled, so I enabled it:
fly orgs apps-v2 default-on <org-name>

And then I went to deploy the new app:

flyctl apps create <appname>
flyctl secrets set <all-the-secrets>

Then I opened the fly.toml mentioned above and changed the app name to my new app’s name.

I went to the UI and migrated my certificate (it’s not in production use right now).

I pushed to my repository, which then deployed the application using this GitHub workflow:

name: Fly Deploy
on:
  push:
    branches: [main]
env:
  FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
jobs:
  deploy:
      name: Deploy apps
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v3
        - uses: superfly/flyctl-actions/setup-flyctl@master
        - run: flyctl deploy --remote-only

After the deploy was complete, I updated my Cloudflare configuration with the new IPs.
I noticed I had no checks set up (as I had migrated from the [[services]] block to the [http_service] block), so I added the http_check and tcp_check to my fly.toml.
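For reference, here is a minimal sketch of what those two checks could look like with the new format. This is an assumption, not the config actually used in this thread: it uses the Machines-level [checks] section, the check names and the /health path are hypothetical, and the TCP timings are simply carried over from the old [[services.tcp_checks]] block. Field names may vary between flyctl versions, so compare against the fly.toml reference before relying on it.

```toml
# Sketch: Machine-level health checks for the app above (internal port 9090).
# Check names and the /health path are hypothetical placeholders.
[checks]
  [checks.http_alive]
    type = "http"
    port = 9090
    method = "get"
    path = "/health"        # hypothetical endpoint; point this at a route the app really serves
    grace_period = "10s"
    interval = "15s"
    timeout = "2s"

  [checks.tcp_alive]
    type = "tcp"
    port = 9090
    grace_period = "1s"     # same timings as the old [[services.tcp_checks]] block
    interval = "15s"
    timeout = "2s"
```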

Finally, I ran flyctl machines destroy <old_machine_id>.

I think that’s all I did. Hope it helps with troubleshooting!

Thanks a lot!

pavel · April 20, 2023, 11:26am

@brenol I can’t seem to reproduce it with these steps. Just to double-check, have you enabled autoscaling on the app (fly autoscale set min=<min> max=<max>)? It looks like enabling autoscaling on a Machine app leads to exactly this behavior.

brenol · April 20, 2023, 12:14pm

Oh. Yes! I did enable autoscaling on the machine. Sorry about that; I forgot to mention it. I had to search my shell history to identify all the commands I used, and I completely missed it.

So running flyctl autoscale disable on my side should fix it, correct? I’ll look into how I should run autoscaling on Apps V2, then. Thank you @pavel.

pavel · April 20, 2023, 12:17pm

I think there is currently another bug that prevents it from being disabled. So a quicker way would be to re-create the app (you mentioned it’s not yet in production, right?).

I’ll have a look on how I should run autoscale on apps-v2 then.

For this, you would need to pre-create additional machines and make sure that you have auto_stop_machines = true and auto_start_machines = true in fly.toml.
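A minimal sketch of that suggestion, reusing the [http_service] block posted earlier in the thread. Assumption: the additional Machines already exist (for example, created by cloning the current one with fly machine clone); this config does not create them, it only lets the proxy stop and start the Machines that are already there.

```toml
[http_service]
  internal_port = 9090
  force_https = true
  # Let the Fly proxy stop idle Machines and start stopped ones when requests arrive.
  # This only works with Machines that already exist; it does not create new ones.
  auto_stop_machines = true
  auto_start_machines = true
  [http_service.concurrency]
    type = "connections"
    soft_limit = 800
    hard_limit = 1000
```

Unlike fly autoscale set, which pavel notes triggers the Nomad release loop on a Machine app, this approach stays entirely within the Machines configuration.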

brenol · April 20, 2023, 12:20pm

Gotcha. I’ll recreate it then!

I was already thinking about doing it because of Increasing Apps V2 availability - #9 by charsleysa, so no problem :). Thank you!

system · April 27, 2023, 12:21pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
