New VMs stuck in "Pulling container image"

7d05d288 fra [info] Pulling container image

I had just created a new volume (but haven’t attached it yet). Could this be related? I also can’t remove the volume; it fails with: Could not find node with id 'nur_data'

This is bad for us, as our server currently has no VMs running. I wanted to restrict the app to a specific set of regions, so I had scaled to 0 before this. App name: “there-nur”

Update: It booted after ~10 minutes. I’m curious why this happened and how I can prevent it.

We’re looking into it. It was probably a network issue. Since your app has a volume it can only run where the volume exists, and if that host had intermittent issues it would wait until it worked rather than reschedule on a different host.

Volume IDs look like vol_jibberish. nur_data is the volume name, which isn’t unique within an app. Deleting should work with the ID.
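To illustrate the name-versus-ID distinction, here is a minimal sketch that pulls the IDs out of a volume listing for a given name. It assumes the listing has the ID in the first column and the name in the second, as in the listing pasted later in this thread; `ids_for_name` is a hypothetical helper, not part of the fly CLI.

```shell
# Sample listing copied from later in this thread. In practice this text
# would come from listing the app's volumes with the fly CLI.
volumes='vol_okgj545kxq4y2wzp nur_data 10GB ams                8 minutes ago
vol_8zmjnv8lk54ywgx5 nur_data 10GB fra    b14dfe4e    14 minutes ago'

# Hypothetical helper: print the ID (column 1) of every volume whose
# name (column 2) matches the first argument.
ids_for_name() {
  printf '%s\n' "$volumes" | awk -v name="$1" '$2 == name { print $1 }'
}

# A name can match several volumes, so expect one ID per line.
ids_for_name nur_data
```

Because both volumes share the name nur_data, this prints two IDs, which is why the delete has to be given one specific ID rather than the name.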

Thanks, I was able to delete the volume. I’m trying to reduce the Docker image size (currently 1 GB) in the hope of reducing the wait time, as sadly it got stuck again in FRA.

The app is now in a frozen state where, no matter what I do (suspend, resume, scale, change the region pool, deploy), the logs don’t change and it’s stuck at [info] Setting up volume.

I just created the volume again to see if it helps at all.

Update: That seems to have worked, and it’s finally booting an instance. (My guess is that I deleted the volume shortly before my new deploy, the one that removed the [mount], had finished rolling out. So the new deploy didn’t make it through, and the app was still trying to boot the old version with [mount].)
Monitoring…

This is really frustrating, I’m sorry. A 1 GB image shouldn’t be an issue. Something is taking 10 minutes to pull the image, but we’re not sure why yet.

Thanks for looking into it. I understand.
To test another region, I created a new volume in AMS and scaled up to 2, but nothing happened. Maybe because volumes take a while to be created?

It could take a minute or two for a new volume to be provisioned. The deployment will wait when that happens.

It’s been ~7 minutes, and no logs yet.

Volumes list:

vol_okgj545kxq4y2wzp nur_data 10GB ams                8 minutes ago
vol_8zmjnv8lk54ywgx5 nur_data 10GB fra    b14dfe4e    14 minutes ago

Do you think there’s a way to bring an instance up in any region just to bring up the server?
Update: After 13 minutes, AMS finally started booting.
Update: AMS is up and running. FRA is still “Pulling container image” after close to 30 min!! Hope the root cause gets fixed soon. I liked the fast booting! Going to bed now :)

It seems there was a problem provisioning volumes, and the 10-minute delay was a timeout. We’ll keep digging and see why our monitoring didn’t catch it.

Same problem for me today. I use the bluegreen strategy, and the new VM cannot start until the deploy fails.

Update: It takes more than 2 minutes to pull the image, so the bluegreen deploy fails.


Using FLY_REGISTRY_HOST=registry-iad.fly.io fly deploy works for me.
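If you want to script that workaround, one possible sketch is a wrapper that runs the deploy normally and only retries through the alternate registry host when the first attempt fails. `deploy_with_fallback` is a hypothetical helper, and whether the iad registry helps in your case is an assumption based only on the report above.

```shell
# Hypothetical helper: run the given command once as-is. If it fails,
# run it again with FLY_REGISTRY_HOST set to the registry host that was
# reported to work in this thread.
deploy_with_fallback() {
  if "$@"; then
    return 0
  fi
  echo "deploy failed, retrying via registry-iad.fly.io" >&2
  FLY_REGISTRY_HOST=registry-iad.fly.io "$@"
}

# Usage (not run here): deploy_with_fallback fly deploy
```

The variable is set only for the retried command, so later deploys in the same shell still use the default registry.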