App stuck on `pending`

We just redeployed and our app (web process only) has been stuck on pending for the last 5 minutes. Is everything okay? We only deleted a secret, so I find it hard to believe that we broke it.

ebc3741c web 156 fra run pending 0 4m29s ago

App is called staxcloud-prod.

“Failed due to unhealthy allocations - not rolling back to stable job version 156 as current job has same specification”

The VM is still on pending though and did not revert to a previous version.

We pushed a dummy change and it worked. Still confused as to why it went down :frowning:

It looks like you did a rolling deploy, but your app only had one VM running. Rolling deploys take down existing VMs, then bring up new ones. Sometimes new VMs take a while to schedule so this can cause downtime.

You should run `fly scale count 2` at a minimum for apps you care about. `fly deploy --strategy canary` is also a more reliable deployment process, provided you’re not using volumes or `--max-per-region`.
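
To make the difference concrete, here is a toy model of the reasoning above. It is only a sketch, not how Fly's scheduler actually works: `downtime_ticks`, `schedule_delay`, and the "tick" unit are hypothetical names made up for illustration, and the canary branch assumes that one new VM becomes healthy before any old VM is stopped.

```python
def downtime_ticks(strategy: str, vm_count: int, schedule_delay: int) -> int:
    """Count ticks during a deploy in which no VM is serving (toy model)."""
    if strategy not in ("rolling", "canary"):
        raise ValueError(f"unknown strategy: {strategy}")
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")

    old, new, downtime = vm_count, 0, 0

    if strategy == "canary":
        # Assumption: one new VM boots while every old VM keeps serving,
        # and only once it is healthy does it replace an old VM.
        new += 1
        old -= 1

    while old > 0:
        old -= 1  # rolling step: take an existing VM down first
        if old + new == 0:
            # Nothing is serving until the replacement gets scheduled.
            downtime += schedule_delay
        new += 1  # the replacement is now healthy

    return downtime


for strategy, count in [("rolling", 1), ("rolling", 2), ("canary", 1)]:
    print(strategy, count, downtime_ticks(strategy, count, schedule_delay=5))
```

In this model only the single-VM rolling deploy has a window with nothing serving, and that window is as long as the scheduling delay. A second VM or a canary-first deploy closes it.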


Thanks for that explanation @kurt

I’m really confused about deploys today :thinking: I am running a deploy now and I see:

--> This release will not be available until the release command succeeds.
         Starting instance
Running release task (pending)... 🌏

Ok, that doesn’t look that weird, but my new VMs are already started. They only start after running the release task, right?

In the logs I see the migrations (my release command) have already run successfully, yet it’s still stuck at “Running release task (pending)…”.

Even 5 mins later the web and scheduler are stuck on pending :thinking: and it says it’s still stuck on the release command

Wow interesting. Now I am getting new VMs and the other “new” VMs are getting shut down:

Are these “old” deploys that are only now getting through the queue?

Now fly deploy aborted with:

Some VMs (of the latest version) are now in pending but scheduler is on running :thinking:

2 mins later desired switched to stop and they are getting removed again…

Idk if I am doing something wrong or if something weird is going on at Fly, but I do know that I have no idea how to fix this :joy:

So even with those VMs that have desired=stop, I am now seeing:

2022-12-22T17:36:42Z runner[bed6314c] fra [info]Configuring virtual machine
2022-12-22T17:36:42Z runner[bed6314c] fra [info]Pulling container image
2022-12-22T17:36:53Z runner[bed6314c] fra [info]Unpacking image
2022-12-22T17:36:55Z runner[b168964b] fra [info]Configuring virtual machine
2022-12-22T17:36:55Z runner[b168964b] fra [info]Pulling container image
2022-12-22T17:36:55Z runner[b168964b] fra [info]Unpacking image

That sounds like it’s preparing for a new VM :thinking:
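
For anyone reading similar logs, here is a small sketch that groups those lines by the ID in `runner[...]` so it is easier to see how many instances are being prepared. The helper name `group_by_instance` is made up for illustration, and the line layout (timestamp, source, region, message) is assumed from the six lines above, not from any documented log format.

```python
from collections import defaultdict
from datetime import datetime

LOG = """\
2022-12-22T17:36:42Z runner[bed6314c] fra [info]Configuring virtual machine
2022-12-22T17:36:42Z runner[bed6314c] fra [info]Pulling container image
2022-12-22T17:36:53Z runner[bed6314c] fra [info]Unpacking image
2022-12-22T17:36:55Z runner[b168964b] fra [info]Configuring virtual machine
2022-12-22T17:36:55Z runner[b168964b] fra [info]Pulling container image
2022-12-22T17:36:55Z runner[b168964b] fra [info]Unpacking image
"""


def group_by_instance(log_text):
    """Map each instance ID to its (timestamp, message) events, in order."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Assumed layout: "<timestamp> <source>[<id>] <region> [<level>]<message>"
        timestamp, source, _region, rest = line.split(" ", 3)
        instance_id = source[source.index("[") + 1 : source.index("]")]
        when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        message = rest.split("]", 1)[1]  # drop the "[info]" level tag
        events[instance_id].append((when, message))
    return dict(events)


for instance_id, steps in group_by_instance(LOG).items():
    elapsed = (steps[-1][0] - steps[0][0]).total_seconds()
    print(instance_id, [msg for _, msg in steps], f"{elapsed:.0f}s")
```

On the lines above this shows two separate instances, `bed6314c` and `b168964b`, each going through configure, pull, and unpack.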

I did make some changes to my Dockerfile, fly.toml, etc., so I don’t rule out that I broke something, but it’s kinda hard to debug if that’s the case, and the behavior that I’m seeing in the logs & fly status is hard for me to explain.

It behaves just as weirdly on the main branch, so I find it hard to believe that I broke it with my changes :thinking:

(I’m working on staxcloud-staging btw, would be great if someone could have a look!)

I’m also hit with that error. Deployed a few times today and the last few deploys are getting stuck on Running release task (pending)... 🌏 and then error out with the above message. No idea why.

Hi folks, we had a host in our fra region that was overloaded and taking several minutes, and sometimes much longer, to start new instances. We’ve stopped scheduling new allocs on that host and will make sure it’s healthy before allowing new allocs back on it. It looks like the folks on this thread were likely impacted by that issue.

If you continue to see this issue, one mitigation is to try deploying to another region at least temporarily.
