Deployment fails for a specific app

Affected apps: dq-home-staging and dq-catalog-staging

We had similar problems.
2 deploys failed, but the 3rd succeeded.
Now everything is working OK.

When we ran fly vm status 5d6f2ba0, we got:

Instance
  ID            = 5d6f2ba0
  Process       = app
  Version       = 245
  Region        = lhr
  Desired       = stop
  Status        = failed
  Health Checks =
  Restarts      = 0
  Created       = 23m30s ago

Events
TIMESTAMP           	TYPE           	MESSAGE
2022-10-19T19:03:04Z	Received       	Task received by client
2022-10-19T19:03:30Z	Task Setup     	Building Task Directory
2022-10-19T19:03:38Z	Driver Failure 	rpc error: code = Unknown desc = unable to create microvm: error pulling image: unknown

Just in case it helps, we’ve also been seeing this for the past hour or so. Logs:

2022-10-19T20:04:12.830 runner[f543bc69] dfw [info] Starting instance
2022-10-19T20:04:13.210 runner[f543bc69] dfw [info] Configuring virtual machine
2022-10-19T20:04:13.213 runner[f543bc69] dfw [info] Pulling container image
2022-10-19T20:04:17.590 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:17.734 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #0)
2022-10-19T20:04:17.941 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:18.094 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #1)
2022-10-19T20:04:18.326 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:18.472 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #2)
2022-10-19T20:04:18.472 runner[f543bc69] dfw [info] Pulling image failed

And this is in DFW as well.
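The log above follows a bounded retry pattern: three unpack attempts, each failure logged as "Pull failed, retrying (attempt #n)" counting from #0, then a final "Pulling image failed". Below is a minimal Python sketch that reproduces that sequence. It is not Fly's runner code; the `unpack` callable and treating `OSError` as the failure signal are assumptions for illustration.

```python
import logging
from typing import Callable

log = logging.getLogger("runner")


def pull_with_retries(unpack: Callable[[], None], attempts: int = 3) -> bool:
    """Call `unpack` up to `attempts` times and return True on the first success.

    Each failure logs "Pull failed, retrying (attempt #n)" with n starting at 0,
    matching the log above; after the last failure it logs "Pulling image failed".
    """
    for attempt in range(attempts):
        log.info("Unpacking image")
        try:
            unpack()
            return True
        except OSError:
            log.info("Pull failed, retrying (attempt #%d)", attempt)
    log.info("Pulling image failed")
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")

    def always_fails() -> None:
        # Simulated failure: errno 28 is ENOSPC ("No space left on device") on Linux.
        raise OSError(28, "No space left on device")

    pull_with_retries(always_fails)
```

The sketch retries immediately. The timestamps above show only a few hundred milliseconds between attempts; a production client would more typically add a backoff delay between retries.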


Thanks, we’re debugging this.


Probably related:
When I deployed just now, the deploy failed but left me with 2 instances in lhr, even though I have --max-per-region=1. I should have an iad instance instead of the extra lhr one.

ID      	PROCESS	VERSION	REGION	DESIRED	STATUS 	HEALTH CHECKS      	RESTARTS	CREATED
c68eb352	app    	249    	lhr   	run    	running	1 total, 1 passing 	0       	25s ago
558d50e2	app    	249    	lhr   	run    	running	1 total, 1 critical	0       	41s ago
b9ae125b	app    	249    	syd   	run    	running	1 total            	0       	41s ago
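Checking the output against --max-per-region=1 makes the mismatch concrete. Here is a small Python sketch that uses only the ID and REGION columns from the table above. The `regions_over_limit` helper is hypothetical and not part of flyctl.

```python
from collections import Counter

# (ID, REGION) pairs copied from the status output above.
instances = [
    ("c68eb352", "lhr"),
    ("558d50e2", "lhr"),
    ("b9ae125b", "syd"),
]


def regions_over_limit(rows, max_per_region):
    """Return {region: count} for every region holding more instances than allowed."""
    counts = Counter(region for _id, region in rows)
    return {region: n for region, n in counts.items() if n > max_per_region}


print(regions_over_limit(instances, max_per_region=1))  # {'lhr': 2}
```

With a limit of 1 it flags lhr with 2 instances, which is the extra placement described above.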

This was caused by the logical volume used to unpack image layers running out of space. Five hosts were impacted, each in a different region, which explains why the failures were intermittent. We fixed the issue and will be digging into why our monitoring didn’t catch this.

Here’s the status page: Fly.io Status - Image Pull Failures
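The root cause above was a volume running out of space. As a hedged illustration of a free-space check that could flag this condition, here is a minimal Python sketch. The mount path and the 10% threshold are assumptions, and Fly's actual monitoring setup is not described in this thread.

```python
import shutil

# Hypothetical mount point of the volume used to unpack image layers;
# the real path on Fly hosts is not given here.
UNPACK_VOLUME = "."


def low_on_space(path: str, min_free_fraction: float = 0.10) -> bool:
    """Return True when the filesystem holding `path` has less than
    `min_free_fraction` of its total capacity free."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    return usage.free < usage.total * min_free_fraction


if __name__ == "__main__":
    if low_on_space(UNPACK_VOLUME):
        print(f"ALERT: {UNPACK_VOLUME} is below the free-space threshold")
```

A fraction-of-capacity threshold is used so the same check works on volumes of different sizes; an absolute byte threshold sized to the largest expected image would be an equally reasonable choice.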


We’re also seeing this in LHR

Are you still having issues? Can you share your app name either here or to support@?

The last build passed

Okay good. Let us know if you have any more issues.

My deploys are still failing: sometimes the images can’t be staged, and sometimes they can’t be pulled.

Hi gsong, are you still seeing issues? Let us know what region and app if so. Thanks!

No, not seeing the issue anymore. Thanks for following up.

This is happening again in SJC.

This happens on both of my fly.io accounts with a basic Go application.

I was deploying to 5 regions: yul, yyz, cdg, ams, fra.

My deploys are currently also failing in the sea region; it looks like the registry is timing out.

It seems Fly had the same problem with its registry in the past.

You might have already seen this, but we’re currently investigating issues with our registry; these sound like they could be related.

We’ve put up a status page where you can subscribe to updates on the problem:


Also seeing issues trying to push images to the registry, not just pulling them.

Lots of layers get stuck pushing, then hang in a retry loop.

I’m using flyctl deploy --remote-only.