Deployment fails for a specific app

After 66 successful deploys, the 67th one seems to fail:

2022-10-19T18:34:09Z runner[486c6bda] dfw [info]Starting instance
2022-10-19T18:34:10Z runner[486c6bda] dfw [info]Configuring virtual machine
2022-10-19T18:34:10Z runner[486c6bda] dfw [info]Pulling container image
2022-10-19T18:34:10Z runner[486c6bda] dfw [info]Unpacking image
2022-10-19T18:35:14Z runner[486c6bda] dfw [info]Pull failed, retrying (attempt #0)
2022-10-19T18:35:14Z runner[486c6bda] dfw [info]Unpacking image
2022-10-19T18:35:14Z runner[486c6bda] dfw [info]Pull failed, retrying (attempt #1)
2022-10-19T18:35:14Z runner[486c6bda] dfw [info]Unpacking image
2022-10-19T18:35:14Z runner[486c6bda] dfw [info]Pull failed, retrying (attempt #2)
2022-10-19T18:35:14Z runner[486c6bda] dfw [info]Pulling image failed

Other apps in DFW, deployed exactly the same way, were successful.

Moved the app to SEA and I don’t experience the same problem. Seems to be a problem at DFW?

Hi,

I’m facing the same error currently when deploying to LHR:

2022-10-19T19:23:08Z runner[baa813c2] lhr [info]Pull failed, retrying (attempt #0)
2022-10-19T19:23:08Z runner[baa813c2] lhr [info]Unpacking image
2022-10-19T19:23:08Z runner[baa813c2] lhr [info]Pull failed, retrying (attempt #1)
2022-10-19T19:23:09Z runner[baa813c2] lhr [info]Unpacking image
2022-10-19T19:23:09Z runner[baa813c2] lhr [info]Pull failed, retrying (attempt #2)
2022-10-19T19:23:09Z runner[baa813c2] lhr [info]Pulling image failed

It’s happening for multiple apps.

Thanks for looking into this,
Martin

Which app(s) are you seeing this for?

dq-home-staging and dq-catalog-staging

We had similar problems.
2 deploys failed, but the 3rd succeeded.
Now everything is working OK.

When running fly vm status 5d6f2ba0 we got:

Instance
  ID            = 5d6f2ba0
  Process       = app
  Version       = 245
  Region        = lhr
  Desired       = stop
  Status        = failed
  Health Checks =
  Restarts      = 0
  Created       = 23m30s ago

Events
TIMESTAMP           	TYPE           	MESSAGE
2022-10-19T19:03:04Z	Received       	Task received by client
2022-10-19T19:03:30Z	Task Setup     	Building Task Directory
2022-10-19T19:03:38Z	Driver Failure 	rpc error: code = Unknown desc = unable to create microvm: error pulling image: unknown

Just in case it helps, we’re also seeing this for the past hour or so. Logs:

2022-10-19T20:04:12.830 runner[f543bc69] dfw [info] Starting instance
2022-10-19T20:04:13.210 runner[f543bc69] dfw [info] Configuring virtual machine
2022-10-19T20:04:13.213 runner[f543bc69] dfw [info] Pulling container image
2022-10-19T20:04:17.590 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:17.734 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #0)
2022-10-19T20:04:17.941 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:18.094 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #1)
2022-10-19T20:04:18.326 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:18.472 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #2)
2022-10-19T20:04:18.472 runner[f543bc69] dfw [info] Pulling image failed

And this is in DFW as well.
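In case it helps anyone compare notes across regions, here is a rough sketch for tallying "Pulling image failed" lines by region. It assumes the log line layout pasted in this thread (timestamp, runner[id], region, [info], message); the function name and the regex are my own and not part of any Fly.io tooling.

```python
import re
from collections import Counter

# Matches lines shaped like the ones pasted in this thread, e.g.:
# 2022-10-19T18:35:14Z runner[486c6bda] dfw [info]Pulling image failed
# The layout is an assumption based on those pasted logs only.
LINE_RE = re.compile(r"^(\S+)\s+runner\[([0-9a-f]+)\]\s+(\w+)\s+\[info\]\s*(.*)$")


def failed_pulls_by_region(lines):
    """Count final 'Pulling image failed' messages per region code."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(4) == "Pulling image failed":
            counts[match.group(3)] += 1
    return counts


# Sample lines copied from the logs in this thread.
sample = [
    "2022-10-19T18:35:14Z runner[486c6bda] dfw [info]Pulling image failed",
    "2022-10-19T19:23:09Z runner[baa813c2] lhr [info]Pulling image failed",
    "2022-10-19T20:04:18.472 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #2)",
    "2022-10-19T20:04:18.472 runner[f543bc69] dfw [info] Pulling image failed",
]

if __name__ == "__main__":
    for region, count in failed_pulls_by_region(sample).items():
        print(region, count)
```

Only the final failure message is counted, so the three retry lines of a single failed pull do not inflate the totals.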

Thanks, we’re debugging this.

Probably related:
When I deployed just now, it failed, but I ended up with 2 LHR instances, even though I have --max-per-region=1. I should have an iad instance instead of the extra lhr one.

ID      	PROCESS	VERSION	REGION	DESIRED	STATUS 	HEALTH CHECKS      	RESTARTS	CREATED
c68eb352	app    	249    	lhr   	run    	running	1 total, 1 passing 	0       	25s ago
558d50e2	app    	249    	lhr   	run    	running	1 total, 1 critical	0       	41s ago
b9ae125b	app    	249    	syd   	run    	running	1 total            	0       	41s ago

This was caused by a logical volume used to unpack image layers running out of space. 5 hosts were impacted, none in the same region, which explains why failures were intermittent. We fixed the issue and will be digging into why our monitoring didn’t catch this.

Here’s the status page: Fly.io Status - Image Pull Failures
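For anyone who wants a similar guard on their own hosts, a minimal sketch of a free-space check might look like the following. It is only an illustration of the kind of check involved, not how Fly.io's monitoring works; the path, the 90% threshold, and the function names are all assumptions.

```python
import shutil


def usage_fraction(total: int, used: int) -> float:
    """Return used/total as a fraction; treat a zero-sized volume as full."""
    if total <= 0:
        return 1.0
    return used / total


def volume_nearly_full(path: str, threshold: float = 0.9) -> bool:
    """Return True if the filesystem holding `path` is at or above `threshold`."""
    usage = shutil.disk_usage(path)
    return usage_fraction(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    # "/" is a placeholder: the actual volume used for unpacking
    # image layers is not named in this thread.
    path = "/"
    if volume_nearly_full(path):
        print(f"WARNING: {path} is nearly full")
    else:
        print(f"{path} has free space")
```

A check like this would need to run against the specific volume used for unpacking layers, since the root filesystem can have plenty of room while a separate logical volume is full.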

We’re also seeing this in LHR

Are you still having issues? Can you share your app name either here or to support@?

The last build passed

Okay good. Let us know if you have any more issues.

My deploys are still failing: they either can't stage the images or have problems pulling them.

Hi gsong, are you still seeing issues? Let us know what region and app if so. Thanks!

No, not seeing the issue anymore. Thanks for following up.

This is happening again in SJC.

This happens on both of my fly.io accounts with a basic Go application.