rpc error: code = Unknown desc = could not set bigger stdout pipe: cannot allocate memory

I’ve had several deploys fail over the last few days but the error seems to be somewhat intermittent.

I’m running

flyctl deploy --remote-only --image registry.hub.docker.com/shieldsio/shields:next

and sometimes the deploy job will fail with Failed due to unhealthy allocations.

If I inspect the instance with a failed health check using

flyctl vm status <instance-id>

the output will look something like

Recent Events
TIMESTAMP            TYPE            MESSAGE                                                                                   
2022-05-03T12:49:20Z Received        Task received by client                                                                   
2022-05-03T12:49:20Z Task Setup      Building Task Directory                                                                   
2022-05-03T12:49:39Z Driver Failure  rpc error: code = Unknown desc = could not set bigger stdout pipe: cannot allocate memory 
2022-05-03T12:49:39Z Not Restarting  Error was unrecoverable                                                                   
2022-05-03T12:49:39Z Alloc Unhealthy Unhealthy because of failed task                                                          
2022-05-03T12:49:39Z Killing         Sent interrupt. Waiting 5s before force killing                                           
2022-05-03T12:49:40Z Killing         Sent interrupt. Waiting 5s before force killing

and show that the cause of the failure was rpc error: code = Unknown desc = could not set bigger stdout pipe: cannot allocate memory.

There are two patterns I have noticed here, but they could be red herrings:

  1. We have two apps in our organisation: staging and production. Staging runs one VM instance. Production runs many VM instances (the exact number varies, but the minimum is 14). I’ve only ever seen this failure when deploying to production, never staging. That makes me think it could be some kind of concurrency-related issue, but it may just be down to sample size: a production deploy has many more instances that could possibly fail.
  2. We usually kick off deploys using a GitHub workflow_dispatch action, which uses superfly/flyctl-actions/setup-flyctl@master to install flyctl and then runs flyctl deploy (a rough sketch of the workflow follows this list). I’ve only ever seen this error when kicking off the deploy via GitHub Actions, never when running the deploy locally. I can’t see any obvious reason for the difference given we are using remote builders. Might be coincidence. Might not.
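
For reference, the workflow looks roughly like this (a minimal sketch; the workflow, job, and secret names are placeholders rather than our exact file):

# minimal sketch of the deploy workflow; names are illustrative
name: deploy
on: workflow_dispatch

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
    steps:
      - uses: actions/checkout@v2
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only --image registry.hub.docker.com/shieldsio/shields:next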

Is there any other information I can provide to help track down the cause of this?
Thanks.

Does this seem to be hitting one region more than the others? It’s quite possible this is something cropping up in one of the regions your production app is running in.
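
If you want a quick way to check on your side, something like this should tally failed instances by region (a rough sketch; it assumes flyctl status --all lists failed allocations with a REGION column, as in the vm status output above):

# count failed allocations per region; the awk column index is an
# assumption and may need adjusting for your output's columns
flyctl status --all | awk '/failed/ {print $4}' | sort | uniq -c | sort -rn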

Hi @chris48s

This was the error we were getting; it turned out to be capacity issues in Sydney. That should be resolved now.

I’m seeing this again tonight. I’ve also now seen it fail when initiating the deploy locally, which rules out GitHub Actions as a factor.

Here’s the status of a VM that just failed. Our region is EWR:

$ flyctl vm status 7d42e865
Instance
  ID            = 7d42e865   
  Process       =            
  Version       = 139        
  Region        = ewr        
  Desired       = stop       
  Status        = failed     
  Health Checks =            
  Restarts      = 0          
  Created       = 1m15s ago  

Recent Events
TIMESTAMP            TYPE            MESSAGE                                                                                   
2022-05-12T17:47:12Z Received        Task received by client                                                                   
2022-05-12T17:47:12Z Task Setup      Building Task Directory                                                                   
2022-05-12T17:47:15Z Driver Failure  rpc error: code = Unknown desc = could not set bigger stdout pipe: cannot allocate memory 
2022-05-12T17:47:15Z Not Restarting  Error was unrecoverable                                                                   
2022-05-12T17:47:15Z Alloc Unhealthy Unhealthy because of failed task                                                          
2022-05-12T17:47:15Z Killing         Sent interrupt. Waiting 5s before force killing                                           
2022-05-12T17:47:15Z Killing         Sent interrupt. Waiting 5s before force killing                                           

Checks
ID SERVICE STATE OUTPUT 

Recent Logs
  2022-05-12T17:47:12.000 [info] Starting instance

Thank you for letting us know! We’ll take a close look at the ewr region to try to clear this up.

I’ve done 3 deploys in the last 2 weeks and this hasn’t recurred. Is it safe to assume this has been resolved?