Hi! One of my apps (https://reflame-resource.fly.dev/, and thankfully only that one) is getting completely stuck after deployment (at the point when monitoring starts, everything up to that point seems to finish normally) and never updates to the latest version.
I was able to recover the last time this happened a few days ago by scaling to 0 and then back to 3, but the issue came back, so I’m leaving it in this state for now to help with investigations.
Would be awesome if someone could help take a look at what’s going on. Thanks!
PS. It’d be nice if the fly GitHub action had some default timeout for monitoring and failed the job once the timeout passes. Currently the only thing notifying me of the issue is a subsequent deploy action canceling the previous one. In the meantime, the action could have been sitting there for potentially hours, eating up a ton of build minutes.
Regarding the GitHub action, you can use the timeout-minutes setting on a job to ensure it’s cancelled if it takes too long, so it doesn’t eat up hours of your build minutes.
Yep, that’s what I ended up doing, but I mentioned this because it sounds like a good idea for fly to have some default timeout for monitoring deploys. If an app doesn’t get deployed in over 10 minutes (or possibly even 5?), something’s probably gone horribly wrong, like it did here.
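For anyone who wants a deadline on the deploy command itself rather than on the whole job, here is a minimal sketch. It is a different technique from the job-level timeout-minutes setting mentioned above: it wraps a single command in GNU coreutils `timeout`, which is assumed to be available on the runner (it normally is on Linux). The helper name `run_with_deadline` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: give one command a hard deadline so a stuck deploy cannot run for hours.
# Assumes GNU coreutils `timeout` is on PATH.

# run_with_deadline SECONDS COMMAND [ARGS...]
# (hypothetical helper name)
# Returns the command's own exit status, or 124 if the deadline passed
# and `timeout` had to terminate the command.
run_with_deadline() {
  local seconds="$1"
  shift
  timeout "${seconds}s" "$@"
}

# Example call (not executed here): fail the step if the deploy takes over 10 minutes.
# run_with_deadline 600 fly deploy --strategy rolling
```

A non-zero exit status fails the step, so a hung monitoring phase would show up as a failed job after the deadline instead of a job that keeps running.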
This looks like it’s doing a canary deploy. But you have 3 regions set, and a max per region of 1. Which means it can’t boot the new instance (because all three regions are in use).
This is a horrible sharp edge in our system and we’re working hard to replace that plumbing. If it gets in this state again, you can run fly vm stop <id> on any existing VM and it’ll unstick the deploy.
Your GitHub action should probably include deploy --strategy rolling if it doesn’t already. We try to prevent canary deploys for apps with this config but not all instances get caught.
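The arithmetic behind the stuck deploy can be sketched as a small check. This is a simplified model of the reasoning above, not how the scheduler is actually implemented: it assumes total capacity is simply regions times max per region, and that a negative max per region means "no limit". The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: can one more VM (for example a canary) be placed?
# Simplified model, not the real scheduler logic.

# has_room_for_extra_vm REGIONS MAX_PER_REGION COUNT
# (hypothetical helper name)
# Exit status 0 if an extra VM fits, 1 if every slot is already taken.
# Assumption: a negative MAX_PER_REGION means the limit is not set.
has_room_for_extra_vm() {
  local regions="$1"
  local max_per_region="$2"
  local count="$3"

  # No per-region limit: there is always room in this model.
  if [ "$max_per_region" -lt 0 ]; then
    return 0
  fi

  # With a limit, capacity is regions * max_per_region.
  local capacity=$(( regions * max_per_region ))
  [ "$count" -lt "$capacity" ]
}

# Example call (not executed here): 3 regions, max 1 per region, 3 VMs running.
# has_room_for_extra_vm 3 1 3 || echo "no room for a canary VM"
```

With 3 regions, a max per region of 1, and a count of 3, capacity is 3 and all of it is in use, so the check fails. That matches the situation described above, where the new instance has nowhere to boot until an existing VM is stopped.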
Ah interesting, I do remember setting it at some point, but for some reason my fly scale show looked like this so I didn’t think to dig further:
> fly scale show
VM Resources for reflame-resource
VM Size: shared-cpu-1x
VM Memory: 256 MB
Count: 3
Max Per Region: Not set
Could that be a bug in the CLI?
I just ran fly scale count 3 --max-per-region -1 and reran the build. That seemed to update the app to the latest version.
But when I ran a new deploy with fly deploy --strategy bluegreen, it started hanging again in the exact same way. Did the --max-per-region -1 not apply correctly?
Hey Lewis, the deployment is hanging because you’re using both bluegreen and --max-per-region at the same time. Bluegreen tries to launch a bunch of VMs before tearing down the old ones, but max-per-region limits how many VMs can be in one region.
To use --max-per-region -1 you need to have --strategy rolling. Can you give that a go?
Hi @rahmatjunaid, I understand --max-per-region is not compatible with --strategy bluegreen. However, what I’m trying to do is turn off --max-per-region so I can use --strategy bluegreen again.
My understanding was that --max-per-region -1 should accomplish that, looking at the implementation where it defaults to -1.
If that’s not correct, what is the correct way to disable --max-per-region so I can start using --strategy bluegreen again?
FWIW, fly scale show shows Max Per Region as Not set:
> fly scale show
VM Resources for reflame-resource
VM Size: shared-cpu-1x
VM Memory: 256 MB
Count: 3
Max Per Region: Not set