New VMs stuck in "Pulling container image"

mo.rajbi · April 26, 2021, 8:23pm

7d05d288 fra [info] Pulling container image

I had just created a new volume (but haven’t attached it yet). Could this be related? I also can’t remove the volume; it fails with Could not find node with id 'nur_data'

This is bad for us as our server currently has no VMs running. I wanted to restrict the app to a specific set of regions, so I had scaled to 0 before this. App name: “there-nur”

mo.rajbi · April 26, 2021, 8:29pm

Update: It booted up now after ~10 minutes. Curious why this happened and how I can prevent it.

michael · April 26, 2021, 8:37pm

We’re looking into it. It was probably a network issue. Since your app has a volume it can only run where the volume exists, and if that host had intermittent issues it would wait until it worked rather than reschedule on a different host.

Volume IDs look like vol_gibberish; nur_data is the volume name, which isn’t unique within an app. Deleting should work with the ID.

mo.rajbi · April 26, 2021, 8:53pm

Thanks, I was able to delete the volume. I’m trying to reduce the Docker image size (currently 1 GB) in the hope of reducing the wait time, as sadly it got stuck again in FRA.

mo.rajbi · April 26, 2021, 9:15pm

The app is now in a frozen state where no matter what I do (suspend, resume, scale, change the region pool, deploy) the logs don’t change and it’s stuck at [info] Setting up volume.

I just created the volume again to see if it helps at all.

Update: Seems to have worked, and it’s now finally booting an instance. (I guess I might have deleted the volume just before my new deploy, which removed the [mount], went live, so the new deploy didn’t make it through and it was still trying to boot the old version with [mount].)
Monitoring…

michael · April 26, 2021, 9:23pm

This is really frustrating, I’m sorry. A 1 GB image shouldn’t be an issue. Something is taking 10 minutes to pull the image but we’re not sure why yet.

mo.rajbi · April 26, 2021, 9:24pm

Thanks for looking into it. I understand.
To test another region, I created a new volume in AMS and scaled up to 2, but nothing happened. Maybe because volumes take a while to be created?

michael · April 26, 2021, 9:28pm

It could take a minute or two for a new volume to be provisioned. The deployment will wait when that happens.

mo.rajbi · April 26, 2021, 9:29pm

It’s been ~7 minutes, and no logs yet.

Volumes list:

vol_okgj545kxq4y2wzp nur_data 10GB ams                8 minutes ago
vol_8zmjnv8lk54ywgx5 nur_data 10GB fra    b14dfe4e    14 minutes ago
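
As noted earlier in the thread, the volume name is not unique within an app, and the two rows above show exactly that: both volumes are called nur_data, and only the vol_ ID tells them apart. A minimal sketch of grouping IDs by name, assuming the ID is the first whitespace-separated column and the name is the second (the helper name ids_by_name is made up for illustration and is not part of any Fly tooling):

```python
from collections import defaultdict

# The two rows pasted above, copied as-is.
LISTING = """\
vol_okgj545kxq4y2wzp nur_data 10GB ams                8 minutes ago
vol_8zmjnv8lk54ywgx5 nur_data 10GB fra    b14dfe4e    14 minutes ago
"""


def ids_by_name(listing):
    """Group volume IDs by volume name.

    Assumption: column 1 is the volume ID (starts with "vol_") and
    column 2 is the volume name. Other columns are ignored.
    """
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, headers, or anything that is not a volume row.
        if len(fields) < 2 or not fields[0].startswith("vol_"):
            continue
        groups[fields[1]].append(fields[0])
    return dict(groups)


volumes = ids_by_name(LISTING)
for name, ids in volumes.items():
    # A name with more than one ID is ambiguous, so refer to the ID instead.
    print(name, "->", ", ".join(ids))
```

Any name that maps to more than one ID cannot identify a single volume, which is why operations on a volume need the ID.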

Do you think there’s a way to bring an instance up in any region just to bring up the server?
Update: After 13 minutes, AMS finally started booting.
Update: AMS is up and running. FRA is still “Pulling container image” after close to 30 min!! Hope the root cause gets fixed soon. I liked the fast booting! Going to bed now

michael · April 26, 2021, 11:25pm

It seems there was a problem provisioning volumes, and the 10-minute delay was a timeout. We’ll keep digging and see why our monitoring didn’t catch it.

noy · September 7, 2023, 12:37pm

Same problem for me today. I use the bluegreen strategy, and the new VM cannot start until the deployment fails.

Update:
It takes more than 2 minutes to pull the image, so the bluegreen strategy fails.

noy · September 15, 2023, 1:40pm

Using FLY_REGISTRY_HOST=registry-iad.fly.io fly deploy works for me.
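
For anyone scripting deploys, the same workaround can be sketched in Python by setting the environment variable for the child process only. This is a rough sketch, not an official recipe: FLY_REGISTRY_HOST and registry-iad.fly.io come from the post above, while the function names deploy_invocation and run_deploy are hypothetical, and it assumes the fly binary is on PATH.

```python
import os
import subprocess


def deploy_invocation(registry_host="registry-iad.fly.io", base_env=None):
    """Build the command and environment for `fly deploy`.

    The registry host value is the one reported to work in this thread;
    whether it helps in other regions or at other times is not known.
    """
    # Copy the environment so the calling process is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["FLY_REGISTRY_HOST"] = registry_host
    return ["fly", "deploy"], env


def run_deploy():
    """Run the deploy. Assumes `fly` is installed and on PATH."""
    cmd, env = deploy_invocation()
    # check=True raises CalledProcessError if the deploy exits non-zero.
    subprocess.run(cmd, env=env, check=True)
```

Calling run_deploy() from the app directory is then equivalent to the one-line shell form above.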
