Deployment hangs forever

lewis · April 5, 2022, 11:56pm

Hi! One of my apps (https://reflame-resource.fly.dev/, and thankfully only that one) is getting completely stuck after deployment (at the point when monitoring starts, everything up to that point seems to finish normally) and never updates to the latest version.

Dashboard looks something like this:

I was able to recover the last time this happened a few days ago by scaling to 0 and then back to 3, but the issue came back, so I’m leaving it in this state for now to help with investigations.

Would be awesome if someone to help take a look at what’s going on. Thanks!

PS. It’d be nice if the fly github action could have some default timeout for monitoring, and fail the job if the timeout passes. Currently the only thing notifying me of the issue is a subsequent deploy action canceling the previous. In the mean time the action could have been sitting there for potentially hours eating up a ton of build minutes.

charsleysa · April 6, 2022, 12:09am

Hi @lewis

Regarding the github actions, you can use the timeout-minutes config on a job to ensure it’s cancelled if it takes too long so it doesn’t eat up hours of your build minutes.

rahmatjunaid · April 6, 2022, 12:54am

Hey @lewis can you run fly logs and paste the result

lewis · April 6, 2022, 1:24am

@rahmatjunaid So I just opened fly logs and triggered another deploy, and no new logs showed up at all.

lewis · April 6, 2022, 1:27am

Yep, that’s what I ended up doing, but I mentioned this because it sounds like a good idea for fly to have some default timeout for monitoring deploys. If an app doesn’t get deployed in over 10 minutes (or possibly even 5?), something’s probably gone horribly wrong, like it did here.

chloerei · April 6, 2022, 3:39am

You can add this flag after deploy command:

--detach                Return immediately instead of monitoring deployment progress

kurt · April 6, 2022, 3:57am

This looks like it’s doing a canary deploy. But you have 3 regions set, and a max per region of 1. Which means it can’t boot the new instance (because all three regions are in use).

This is a horrible sharp edge in our system and we’re working hard to replace that plumbing. If it gets in this state again, you can run fly vm stop <id> on any existing VM and it’ll unstick the deploy.

Your GitHub action should probably include deploy --strategy rolling if it doesn’t already. We try to prevent canary deploys for apps with this config but not all instances get caught.

lewis · April 6, 2022, 6:39am

Ah interesting, I do remember setting it at some point, but for some reason my fly scale show looked like this so I didn’t think to dig further:

> fly scale show
VM Resources for reflame-resource
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
          Count: 3
 Max Per Region: Not set

Could that be a bug in the CLI?

I just ran fly scale count 3 --max-per-region -1 and reran the build. That seemed to update the app to the latest version.

But when I ran a new deploy with fly deploy strategy --bluegreen, it started hanging again in the exact same way. Did the --max-per-region -1 not apply correctly?

lewis · April 6, 2022, 6:42am

Yes I’m aware of that flag, but I do still want to monitor to wait for successful deploys to complete and get alerted if there’s a timeout.

lewis · April 13, 2022, 6:56am

@kurt I still need some help with this, as my deploys are still hanging until I manually scale to 0 and back.

As per my previous reply, I’ve set --max-per-region -1, that means --strategy bluegreen should work, right?

Could it be not applying correctly?

rahmatjunaid · April 13, 2022, 9:05pm

Hey Lewis, the reason the deployment is hanging is because you’re using both bluegreen and --max-per-region at the same time. Bluegreen is trying to launch a bunch of vms before tearing down the old ones but the max-per-region limits how many vms can be in one region.

To use --max-per-region -1 you need to have --strategy rolling, can you give that ago ?

lewis · April 13, 2022, 9:27pm

Hi @rahmatjunaid, I understand --max-per-region is not compatible with --strategy bluegreen. However, what I’m trying to do is turn off --max-per-region so I can use --strategy bluegreen again.

My understanding was that --max-per-region -1 should accomplish that, looking at the implementation where it defaults to -1.

If that’s not correct, what is the correct way to disable --max-per-region so I can start using --strategy bluegreen again?

FWIW, fly scale show shows Max Per Region as Not set:

> fly scale show
VM Resources for reflame-resource
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
          Count: 3
 Max Per Region: Not set

But I think there’s a bug there so not sure if I can trust that: MaxPerRegion is always "Not Set" · Issue #596 · superfly/flyctl · GitHub

Topic		Replies	Views
deployment is hanging and not completing successfully	13	956	March 31, 2023
Deploying to Fly from Github Action hangs Questions / Help	13	1004	November 11, 2021
One of my apps stuck in pending Questions / Help	9	1279	July 17, 2023
Deployment hanging "pending automatic promotion" for several minutes	14	465	May 24, 2022
Fly deploy failing executing release command Questions / Help	3	557	August 29, 2022

Deployment hangs forever

Related topics