dq-home-staging and dq-catalog-staging
We had similar problems.
2 deploys failed, but the 3rd succeeded.
Everything is working OK now.
When running fly vm status 5d6f2ba0
we got:
Instance
ID = 5d6f2ba0
Process = app
Version = 245
Region = lhr
Desired = stop
Status = failed
Health Checks =
Restarts = 0
Created = 23m30s ago
Events
TIMESTAMP TYPE MESSAGE
2022-10-19T19:03:04Z Received Task received by client
2022-10-19T19:03:30Z Task Setup Building Task Directory
2022-10-19T19:03:38Z Driver Failure rpc error: code = Unknown desc = unable to create microvm: error pulling image: unknown
Just in case it helps, we’re also seeing this for the past hour or so. Logs:
2022-10-19T20:04:12.830 runner[f543bc69] dfw [info] Starting instance
2022-10-19T20:04:13.210 runner[f543bc69] dfw [info] Configuring virtual machine
2022-10-19T20:04:13.213 runner[f543bc69] dfw [info] Pulling container image
2022-10-19T20:04:17.590 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:17.734 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #0)
2022-10-19T20:04:17.941 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:18.094 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #1)
2022-10-19T20:04:18.326 runner[f543bc69] dfw [info] Unpacking image
2022-10-19T20:04:18.472 runner[f543bc69] dfw [info] Pull failed, retrying (attempt #2)
2022-10-19T20:04:18.472 runner[f543bc69] dfw [info] Pulling image failed
And this is in DFW as well.
Thanks, we’re debugging this.
Probably related:
When I deployed just now, it failed, but I ended up with 2 LHR instances even though I have --max-per-region=1 set. I should have an iad instance instead of the extra lhr one.
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS       RESTARTS CREATED
c68eb352 app     249     lhr    run     running 1 total, 1 passing  0        25s ago
558d50e2 app     249     lhr    run     running 1 total, 1 critical 0        41s ago
b9ae125b app     249     syd    run     running 1 total             0        41s ago
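For anyone who wants to check placement automatically after a deploy, here is a minimal sketch that counts instances per region from status rows like the ones above. The function name `regions_over_limit` is made up for illustration, and it assumes the region is always the fourth whitespace-separated column, as it is in this output.

```python
from collections import Counter


def regions_over_limit(status_rows, max_per_region=1):
    """Return {region: count} for regions holding more instances than allowed.

    Assumes each row is whitespace-separated with the region in the 4th column
    (ID, PROCESS, VERSION, REGION, ...), as in the status output in this thread.
    """
    counts = Counter(row.split()[3] for row in status_rows if row.strip())
    return {region: n for region, n in counts.items() if n > max_per_region}


# Rows copied from the post above (header line omitted).
rows = [
    "c68eb352 app 249 lhr run running 1 total, 1 passing 0 25s ago",
    "558d50e2 app 249 lhr run running 1 total, 1 critical 0 41s ago",
    "b9ae125b app 249 syd run running 1 total 0 41s ago",
]

print(regions_over_limit(rows))  # {'lhr': 2}
```

An empty result would mean every region is within the limit.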
This was caused by a logical volume used to unpack image layers running out of space. 5 hosts were impacted, none in the same region, which explains why failures were intermittent. We fixed the issue and will be digging into why our monitoring didn’t catch this.
Here’s the status page: Fly.io Status - Image Pull Failures
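As a rough illustration of the kind of check that can catch a volume filling up, here is a generic free-space sketch. It is not how Fly's monitoring works (the thread does not say); the path and the 10% threshold are placeholder assumptions.

```python
import shutil


def is_low_on_space(path, min_free_fraction=0.10):
    """Return True if the filesystem holding `path` has less free space than the threshold."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Treat a filesystem that reports zero capacity as a problem.
        return True
    return usage.free / usage.total < min_free_fraction


def low_space_paths(paths, min_free_fraction=0.10):
    """Return the subset of `paths` whose filesystems are below the threshold."""
    return [p for p in paths if is_low_on_space(p, min_free_fraction)]


# "." is a placeholder; a real check would point at the volume used for unpacking layers.
print(low_space_paths(["."]))
```

In practice a check like this would run on a schedule and alert well before the volume is completely full.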
We’re also seeing this in LHR
Are you still having issues? Can you share your app name either here or to support@?
The last build passed
Okay good. Let us know if you have any more issues.
My deploys are still failing: either the images can’t be staged, or there are problems pulling them.
Hi gsong, are you still seeing issues? Let us know what region and app if so. Thanks!
No, not seeing the issue anymore. Thanks for following up.
This is happening again in SJC.
I was in 5 regions: yul, yyz, cdg, ams, fra.
Currently my deploys are also failing in the sea region; it looks like the registry is timing out.
It seems Fly has had the same problem with its registry in the past.
You might have already seen this, but we’re currently investigating issues with our registry; these sound like they could be related.
We’ve put a statuspage up where you can subscribe to updates on the problem:
Also seeing issues trying to push images to the registry, not just pull them.
Lots of layers getting stuck pushing, then hanging around in a retry loop.
I’m using flyctl deploy --remote-only
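Since several people in this thread got a deploy through on a later attempt, a small retry wrapper can be a stopgap while the registry is flaky. This is only a sketch: `retry` and `deploy` are names invented here, the attempt count and delays are arbitrary, and the only command used is the `flyctl deploy --remote-only` quoted above.

```python
import subprocess
import time


def retry(action, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call action() until it returns True, doubling the wait after each failure.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Returns True on the first success, False if every attempt fails.
    """
    for i in range(attempts):
        if action():
            return True
        if i < attempts - 1:
            sleep(base_delay * 2 ** i)
    return False


def deploy():
    """Run the deploy command from this thread; True if it exits with status 0."""
    result = subprocess.run(["flyctl", "deploy", "--remote-only"])
    return result.returncode == 0
```

Calling `retry(deploy)` would attempt the deploy up to three times, waiting 2s and then 4s between attempts. It will not help if the registry is down for longer than that, so the status page is still the thing to watch.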