When either (re)deploying or setting a new secret value for one of my apps, I’ve been consistently getting an unhealthy allocations failure like this one:
    $ flyctl secrets set AUTH_TOKENS=REDACTED
    Release v35 created

    Monitoring Deployment

    2 desired, 1 placed, 0 healthy, 1 unhealthy
    v35 failed - Failed due to unhealthy allocations - not rolling back to stable job version 35 as current job has same specification
    ***v35 failed - Failed due to unhealthy allocations - not rolling back to stable job version 35 as current job has same specification and deploying as v36
This happens whether I deploy a new commit or the same commit that has been running without issue for many days now, so I don't think this is actually a health-check problem, and I can't see any problems in the app's logs.
If I look at the flyctl vm status output for one of the failed instances, I see a Driver Failure event with the error rpc error: code = Unknown desc = error registering 6pn service Put "https://127.0.0.1:8501/v1/agent/service/register": unexpected EOF, like so:
    flyctl vm status 9e7275c6
    Instance
      ID            = 9e7275c6
      Task          =
      Version       = 35
      Region        = ewr
      Desired       = stop
      Status        = failed
      Health Checks =
      Restarts      = 0
      Created       = 8m59s ago

    Recent Events
    TIMESTAMP             TYPE             MESSAGE
    2021-08-29T01:35:40Z  Received         Task received by client
    2021-08-29T01:35:40Z  Task Setup       Building Task Directory
    2021-08-29T01:35:41Z  Driver Failure   rpc error: code = Unknown desc = error registering 6pn service Put "https://127.0.0.1:8501/v1/agent/service/register": unexpected EOF
    2021-08-29T01:35:41Z  Not Restarting   Error was unrecoverable
    2021-08-29T01:35:41Z  Alloc Unhealthy  Unhealthy because of failed task
    2021-08-29T01:35:42Z  Killing          Sent interrupt. Waiting 5s before force killing

    Checks
    ID  SERVICE  STATE  OUTPUT

    Recent Logs
FWIW, I tried flyctl vm stop on each of the healthy instances I had running when I ran into this issue tonight, and the replacement instances started successfully and stayed healthy.
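In case it's useful to anyone else hitting this, the stop-each-healthy-instance workaround can be scripted. This is only a minimal sketch, assuming flyctl is on your PATH and the script is run from the app's directory (so no app-selection flag is passed). The script name and the instance IDs you feed it are placeholders; it only calls flyctl vm stop, which is the same command used above. It defaults to a dry run and stops instances one after another without waiting for replacements to become healthy.

```python
import subprocess
import sys
from typing import List, Sequence


def build_stop_commands(instance_ids: Sequence[str]) -> List[List[str]]:
    """Return one `flyctl vm stop <id>` argv list per instance ID."""
    # Assumes the working directory is the app's directory, so flyctl
    # knows which app the instances belong to without extra flags.
    return [["flyctl", "vm", "stop", instance_id] for instance_id in instance_ids]


def stop_instances(instance_ids: Sequence[str], dry_run: bool = True) -> None:
    """Stop each instance in turn; with dry_run=True only print the commands."""
    for argv in build_stop_commands(instance_ids):
        if dry_run:
            print("would run:", " ".join(argv))
            continue
        # check=True raises CalledProcessError if flyctl exits non-zero,
        # so the loop halts at the first failed stop.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Usage (hypothetical): python stop_vms.py <instance-id> <instance-id> --run
    args = sys.argv[1:]
    really_run = "--run" in args
    ids = [a for a in args if a != "--run"]
    stop_instances(ids, dry_run=not really_run)
```

Without --run it just prints the commands it would execute, which is a cheap way to double-check that you're targeting the healthy instances and not the failed ones.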
The app in question, if it helps, is urlresolverapi-production.
I’m out of ideas at the moment, and hoping I haven’t just done something silly on my side here! I’m happy to provide any more context that would be useful in tracking this down.