We had similar problems.
Two deploys failed, but the third succeeded.
Everything is working OK now.
When running fly vm status 5d6f2ba0 we got:
Instance
ID = 5d6f2ba0
Process = app
Version = 245
Region = lhr
Desired = stop
Status = failed
Health Checks =
Restarts = 0
Created = 23m30s ago
Events
TIMESTAMP             TYPE            MESSAGE
2022-10-19T19:03:04Z  Received        Task received by client
2022-10-19T19:03:30Z  Task Setup      Building Task Directory
2022-10-19T19:03:38Z  Driver Failure  rpc error: code = Unknown desc = unable to create microvm: error pulling image: unknown
Probably related:
When I deployed just now, the deploy failed, but I ended up with two lhr instances even though I have --max-per-region=1 set. I should have an iad instance instead of the extra lhr one.
ID        PROCESS  VERSION  REGION  DESIRED  STATUS   HEALTH CHECKS        RESTARTS  CREATED
c68eb352  app      249      lhr     run      running  1 total, 1 passing   0         25s ago
558d50e2  app      249      lhr     run      running  1 total, 1 critical  0         41s ago
b9ae125b  app      249      syd     run      running  1 total              0         41s ago
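For anyone who wants to spot this kind of placement problem quickly, here is a rough sketch that counts instances per region from the listing above and reports any region over the limit. The function name regions_over_limit is made up for illustration, and it assumes the region is always the fourth whitespace-separated column, as it is in this output. It is not part of the fly CLI.

```python
from collections import Counter


def regions_over_limit(status_rows, max_per_region=1):
    """Return {region: count} for regions holding more instances than allowed.

    Assumption: each row is one data line of the instance listing (no header)
    and the region is the fourth whitespace-separated field.
    """
    counts = Counter(row.split()[3] for row in status_rows if row.strip())
    return {region: n for region, n in counts.items() if n > max_per_region}


# Data rows copied from the listing above.
rows = [
    "c68eb352 app 249 lhr run running 1 total, 1 passing 0 25s ago",
    "558d50e2 app 249 lhr run running 1 total, 1 critical 0 41s ago",
    "b9ae125b app 249 syd run running 1 total 0 41s ago",
]

print(regions_over_limit(rows))  # {'lhr': 2}
```

With the limit set to 1, lhr is flagged because it holds two instances, which matches what I saw.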
This was caused by a logical volume used to unpack image layers running out of space. Five hosts were impacted, no two of them in the same region, which explains why failures were intermittent. We fixed the issue and will be digging into why our monitoring didn’t catch this.
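For readers who run their own hosts, a free-space check on the volume in question is the sort of thing that could flag this condition early. The following is only a minimal sketch, not a description of the monitoring actually used here: the function names, the 10% threshold, and the path are all assumptions chosen for illustration.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem containing `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(path, min_free_fraction=0.10):
    """True when free space falls below the (assumed) 10% threshold."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # Hypothetical path: "." stands in for the mount point of the volume
    # that image layers are unpacked onto.
    path = "."
    print(f"{path}: {free_fraction(path):.1%} free, low={is_low_on_space(path)}")
```

A check like this would need to run on every host and feed an alert, since in this incident each affected host was in a different region.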