Health checks failing: error waiting for vsock readiness

nholden · January 6, 2023, 7:55pm

Hi!

Our Rails app is failing deploy health checks. I deployed with fly deploy and the CLI successfully created a release, but while monitoring the deployment I got back a message like this:

6 desired, 5 placed, 3 healthy, 1 unhealthy [health checks: 1 total, 1 passing]
--> v151 failed - Failed due to unhealthy allocations - rolling back to job version 150 and deploying as v152
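The progress line above packs four allocation counts into one string. If you want to log or alert on these lines while watching a deploy, a small parser like the one below can help. It is a sketch that assumes every progress line has exactly the shape of this single sample; flyctl may print variants it won't match. The split into "not yet placed" and "placed but not yet healthy" is our own reading of the counts, not documented flyctl behaviour.

```python
import re
from typing import Dict, Optional

# Pattern assumed from the single progress line quoted above.
PROGRESS_RE = re.compile(
    r"(?P<desired>\d+) desired, (?P<placed>\d+) placed, "
    r"(?P<healthy>\d+) healthy, (?P<unhealthy>\d+) unhealthy"
)


def parse_progress(line: str) -> Optional[Dict[str, int]]:
    """Return the allocation counts in a progress line, or None if it doesn't match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def summarize(counts: Dict[str, int]) -> str:
    """Describe what the counts imply (our interpretation, not flyctl's)."""
    unplaced = counts["desired"] - counts["placed"]
    pending = counts["placed"] - counts["healthy"] - counts["unhealthy"]
    return (
        f"{unplaced} not yet placed, {pending} placed but not yet healthy, "
        f"{counts['unhealthy']} unhealthy"
    )


line = "6 desired, 5 placed, 3 healthy, 1 unhealthy [health checks: 1 total, 1 passing]"
print(summarize(parse_progress(line)))
# prints: 1 not yet placed, 1 placed but not yet healthy, 1 unhealthy
```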

When I ran fly vm status on one of the failed instances, I saw this:

Events
TIMESTAMP               TYPE            MESSAGE                                                                              
2023-01-06T19:07:40Z    Received        Task received by client                                                             
2023-01-06T19:08:15Z    Task Setup      Building Task Directory                                                             
2023-01-06T19:13:15Z    Alloc Unhealthy Task not running by deadline                                                        
2023-01-06T19:13:51Z    Killing         Sent interrupt. Waiting 5s before force killing                                     
2023-01-06T19:22:55Z    Driver Failure  rpc error: code = Unknown desc = error waiting for vsock readiness: context canceled
2023-01-06T19:22:55Z    Not Restarting  Error was unrecoverable                                                             
2023-01-06T19:22:59Z    Killing         Sent interrupt. Waiting 5s before force killing
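In this output exactly five minutes pass between Task Setup (19:08:15) and Alloc Unhealthy (19:13:15), and the vsock readiness error only surfaces about nine minutes after that. When comparing several failed instances, it can help to parse the Events table rather than eyeball it. The sketch below assumes the table is fixed-width with column offsets matching the TYPE and MESSAGE headers, as in the sample; `parse_events` and `seconds_between` are hypothetical helper names, not part of flyctl.

```python
from datetime import datetime
from functools import partial  # noqa: F401  (kept minimal; not required below)
from typing import List, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    kind: str
    message: str


def parse_events(table: str) -> List[Event]:
    """Parse a fixed-width Events table such as the one printed by fly vm status.

    Column boundaries come from the header row, which assumes the TYPE and
    MESSAGE headers start at the same offsets as their data columns.
    """
    lines = [line.rstrip() for line in table.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    type_col = header.index("TYPE")
    msg_col = header.index("MESSAGE")
    return [
        Event(
            timestamp=datetime.strptime(row[:type_col].strip(), "%Y-%m-%dT%H:%M:%SZ"),
            kind=row[type_col:msg_col].strip(),
            message=row[msg_col:].strip(),
        )
        for row in rows
    ]


def seconds_between(events: List[Event], first: str, second: str) -> float:
    """Seconds from the first event of kind `first` to the first of kind `second`."""
    start = next(e.timestamp for e in events if e.kind == first)
    end = next(e.timestamp for e in events if e.kind == second)
    return (end - start).total_seconds()


SAMPLE = """\
TIMESTAMP               TYPE            MESSAGE
2023-01-06T19:07:40Z    Received        Task received by client
2023-01-06T19:08:15Z    Task Setup      Building Task Directory
2023-01-06T19:13:15Z    Alloc Unhealthy Task not running by deadline
2023-01-06T19:13:51Z    Killing         Sent interrupt. Waiting 5s before force killing
2023-01-06T19:22:55Z    Driver Failure  rpc error: code = Unknown desc = error waiting for vsock readiness: context canceled
2023-01-06T19:22:55Z    Not Restarting  Error was unrecoverable
2023-01-06T19:22:59Z    Killing         Sent interrupt. Waiting 5s before force killing
"""

events = parse_events(SAMPLE)
print(seconds_between(events, "Task Setup", "Alloc Unhealthy"))  # 300.0
for event in events:
    if event.kind == "Driver Failure":
        print(event.message)
```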

The rollback also failed with similar messages (even though the original deployment to that version succeeded earlier today).

Our app seems to be running fine with no exceptions, and we haven’t made any recent configuration changes.

Any troubleshooting tips?

nholden · January 6, 2023, 10:19pm

I didn’t change anything, but deploys seem to be working again. Not sure what was going on.

kurt · January 7, 2023, 12:27am

This was most likely the result of a VM landing on a server under some load. It'll clear itself up, but it is disruptive to deploys. It's a known issue and should be fixed in the next few months.

nholden · January 7, 2023, 12:45am

Thanks, @kurt! Do you know if there’s anything we can do to avoid this happening again in the meantime? It seems like it could be bad if we had an incident and were unable to deploy.

kurt · January 7, 2023, 12:56am

There’s no workaround for fly deploy yet. Machines don’t suffer from this in the same way, so if you want to take over the deploy logic and run your app on Machines, it will bypass some of the complexity that causes these kinds of problems. That’s not the easiest route, though; there’s a lot of magic in fly deploy.
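Taking over the deploy logic mostly means reimplementing the "wait for health checks, otherwise stop" loop yourself. The sketch below shows that loop in the abstract. It does not use any Fly.io API: `start_new`, `stop`, and `is_healthy` are hypothetical callbacks you would wire to whatever actually controls your Machines, and the 300-second timeout is an arbitrary default.

```python
import time
from functools import partial
from typing import Callable, List, Sequence, Tuple


def wait_until_healthy(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True (healthy) or `timeout_s` seconds pass."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(interval_s, remaining))


def rolling_deploy(
    instances: Sequence[str],
    start_new: Callable[[str], str],    # hypothetical: boots a replacement, returns its id
    stop: Callable[[str], None],        # hypothetical: stops an instance by id
    is_healthy: Callable[[str], bool],  # hypothetical: a single health-check probe
    timeout_s: float = 300.0,
) -> Tuple[List[str], bool]:
    """Replace instances one at a time, halting at the first replacement that
    never passes its health check so the remaining old instances keep serving."""
    replaced: List[str] = []
    for old in instances:
        new = start_new(old)
        if not wait_until_healthy(partial(is_healthy, new), timeout_s):
            stop(new)  # discard the unhealthy replacement; `old` stays up
            return replaced, False
        stop(old)
        replaced.append(new)
    return replaced, True
```

Halting at the first unhealthy replacement keeps the not-yet-replaced old instances serving traffic, but it does not undo replacements that already succeeded; a full rollback would need an extra step for those.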
