Deployment hanging "pending automatic promotion" for several minutes

Doing a few deploys of an app, and the most recent is now hanging for several minutes. The console shows nothing of interest (v32 is being deployed); the dashboard says “Deployment is running pending automatic promotion.”

I’ve looked over the troubleshooting page, but find no obvious issue. This also doesn’t feel like user error; fly deploy should “just work”, imo, across successive changes to app code. Possible outage or general issue…?

(For debugging: app = paasthru, tag = deployment-1653407630)

Just tried another fresh deploy, reproducing same issue.

Over 2 hours in and still can’t deploy.

While we’re just trying to get a small bugfix out the door, I’m very nervous about what this would be like for us in a more serious oncall situation, where we depend on fast and reliable deploys to be able to react to & resolve issues quickly.

At this point, looking for anything that could un-wedge us; do any community members have suggestions/tips/voodoo that’s worked before?

OK, stabbing at scale & count seems to have done… something.

Before:

➜  ✗ fly scale show
VM Resources for paasthru
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
          Count: 2
 Max Per Region: Not set

Scaling up:

➜ ✗ fly scale count --max-per-region=10 4
Count changed to 4

I now have 4 versions of the app @ v35. Scaling back down to desired 2.

Did we do something wrong here?

Hey, Mikey! Does the deployment eventually error out or just hang endlessly?

Hi Zee! Thanks for the response.

The flyctl command hangs endlessly, at this stage:

--> release v33 created

--> You can detach the terminal anytime without stopping the deployment
==> Monitoring deployment

The dashboard would show a pending deploy endlessly, too. Here’s a screenshot from a couple hours ago - this was present throughout:

Per my reply just above this, scaling up seemed to uncork things; v35 got deployed. Scaling down to count=2 also worked (now at v36).

But the dashboard still seems a little confused: a fresh reload of the /:app/monitoring page shows this:

What region(s) are you deploying in?

Vanilla fly deploy against these regions:

$ fly regions list
Region Pool:
iad
lax
Backup Region:

(I think I removed lhr earlier, prior to when problems started.)

Can you try removing iad, we are experiencing some issues in iad that could be causing things to hang.

I can - although - since things are “fixed” now (per recent updates above), would you prefer I try re-deploying with iad in place first, to see if I can reproduce the original issue?

Yeah, let’s see if we can get the issue again first, then try removing if we do. That way we can narrow down a point of interest.

OK! Let’s make it fun, I ignorantly will wager $1 of fly credit that next deploy works; and fiddling with scale counts will eventually re-trigger it. :smile:

A re-deploy worked without issue:

$ fly deploy
==> Verifying app config
--> Verified app config
==> Building image
Remote builder fly-builder-late-waterfall-3277 ready
[...]
image: registry.fly.io/paasthru:deployment-1653418129
image size: 15 MB
==> Creating release
--> release v37 created

--> You can detach the terminal anytime without stopping the deployment
==> Monitoring deployment

 2 desired, 2 placed, 2 healthy, 0 unhealthy
--> v37 deployed successfully

I’ve tinkered with both count and --max-per-region to approximate some of the changes we made (and now forget) earlier in the day. Could not reproduce the issue. I guess I lose my bet :slight_smile:

I’ll leave things as they are now so you can investigate. I’m glad we’re uncorked, but inability to deploy would have been a lot more serious in a “real” oncall situation - so hoping you can rootcause. Just holler if I can help any further!

thanks so much for trying to recreate that!