I’ve had several deploys fail over the last few days, but the error seems to be intermittent. We deploy using
flyctl deploy --remote-only --image registry.hub.docker.com/shieldsio/shields:next
and sometimes the deploy job will fail with
Failed due to unhealthy allocations.
If I inspect the instance with a failed health check using
flyctl vm status <instance-id>
the output will look something like
Recent Events
TIMESTAMP             TYPE             MESSAGE
2022-05-03T12:49:20Z  Received         Task received by client
2022-05-03T12:49:20Z  Task Setup       Building Task Directory
2022-05-03T12:49:39Z  Driver Failure   rpc error: code = Unknown desc = could not set bigger stdout pipe: cannot allocate memory
2022-05-03T12:49:39Z  Not Restarting   Error was unrecoverable
2022-05-03T12:49:39Z  Alloc Unhealthy  Unhealthy because of failed task
2022-05-03T12:49:39Z  Killing          Sent interrupt. Waiting 5s before force killing
2022-05-03T12:49:40Z  Killing          Sent interrupt. Waiting 5s before force killing
and shows that the cause of the failure was
rpc error: code = Unknown desc = could not set bigger stdout pipe: cannot allocate memory.
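In case it helps anyone compare events across many instances, here is a rough Python sketch that pulls the failing events out of saved `flyctl vm status` output. It only parses text that has already been copied into a string and does not call flyctl itself. The function name `find_failures`, the `FAILURE_TYPES` list, and the assumption that each event sits on its own line starting with a UTC timestamp are mine, based only on the output quoted above.

```python
import re

# Event lines copied from the output quoted above, one event per line
# (assumption: this is how the events appear in the terminal).
SAMPLE = """\
2022-05-03T12:49:20Z Received Task received by client
2022-05-03T12:49:20Z Task Setup Building Task Directory
2022-05-03T12:49:39Z Driver Failure rpc error: code = Unknown desc = could not set bigger stdout pipe: cannot allocate memory
2022-05-03T12:49:39Z Not Restarting Error was unrecoverable
"""

# Hypothetical list of event types to treat as failures; extend as needed.
FAILURE_TYPES = ("Driver Failure",)

# A line counts as an event if it starts with an ISO-8601 UTC timestamp.
EVENT_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+(.*)$")


def find_failures(text):
    """Return (timestamp, event_type, message) for each failure event."""
    failures = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # header rows and blank lines are skipped
        timestamp, rest = match.groups()
        for event_type in FAILURE_TYPES:
            if rest.startswith(event_type):
                message = rest[len(event_type):].strip()
                failures.append((timestamp, event_type, message))
    return failures


if __name__ == "__main__":
    for timestamp, event_type, message in find_failures(SAMPLE):
        print(timestamp, event_type, message, sep=" | ")
```

Running the same function over the saved output of each unhealthy instance would show whether they all fail with the same message at roughly the same time.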
There are two patterns I have noticed here, but they could be red herrings:
- We have two apps in our organisation: staging and production. Staging runs one VM instance. Production runs lots of VM instances (the exact number varies, but the minimum is 14). I’ve only ever seen this failure when deploying to production, not staging. This makes me think it could be some kind of concurrency-related issue, but it may just be that the sample size is larger: there are many more instances that could fail when deploying to production.
- We usually kick off deploys using a GitHub workflow_dispatch action which uses flyctl and then runs flyctl deploy. I’ve only ever seen this error happen when kicking off the deploy via GitHub Actions. I’ve never seen it happen when running the deploy locally. I can’t see any obvious reason for this difference given we are using remote builders. Might be coincidence. Might not.
Is there any other information I can provide to help track down the cause of this?