Restart machines when health check fails

Hey there!

From the docs, I can’t work out what exactly happens when a health check fails. Does it restart the machines? Also, I’m not sure what the difference is between services.http_checks and http_service.checks. Aren’t they the same?

I have a memory leak in my Node.js server (not my codebase, so it’ll take a while to figure out) and every once in a while, the server hangs. I was wondering if I could use the health check to restart the machines, or shut them down and start new ones.

Thanks for any help!

I don’t have an answer to your health check question, but maybe this will help anyway. You could cordon a machine that has been running for a week, then stop and start it, and finally uncordon it. That way no machine runs for more than a week, and as long as you do one machine at a time, there will always be other machines serving traffic while each one reboots.
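
If you wanted to script that cordon / stop / start / uncordon sequence, it might look roughly like this. It’s only a sketch: I’m assuming the `fly machine cordon`, `stop`, `start` and `uncordon` subcommands and the `--app` flag behave this way in your flyctl version, so check `fly machine --help` first. `restartPlan` and `rollingRestart` are names I made up, and the machine IDs and app name are placeholders.

```typescript
import { execFileSync } from "node:child_process";

// Build the ordered flyctl argument lists for restarting one machine.
// Assumption: these four subcommands exist under `fly machine`.
export function restartPlan(machineId: string, app: string): string[][] {
  return ["cordon", "stop", "start", "uncordon"].map((step) => [
    "machine",
    step,
    machineId,
    "--app",
    app,
  ]);
}

// Restart machines one at a time so the others keep serving traffic.
export function rollingRestart(machineIds: string[], app: string): void {
  for (const id of machineIds) {
    for (const args of restartPlan(id, app)) {
      // execFileSync throws if flyctl exits non-zero, which stops the rollout.
      execFileSync("fly", args, { stdio: "inherit" });
    }
  }
}
```

You would call something like `rollingRestart(["<machine-id-1>", "<machine-id-2>"], "<your-app>")` from a weekly scheduled job. Note that if a step fails halfway, the machine may be left cordoned, and the script does not wait for the restarted machine to report healthy before moving on to the next one.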

Or you could add a separate process that runs an internal health check and kills the main process when the check fails. This will cause the container to die, and as long as you have a restart policy of ‘always’ it will come back up. I should think you would still have to cordon/uncordon if you don’t want to drop traffic.
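
Here’s a rough idea of what that internal health check could look like, as a small supervisor that starts the server, polls it, and kills it after a few failed probes. It’s a sketch under assumptions, not something I’ve run against your app: the `/health` path, port 3000, the intervals and the function names are all hypothetical, and it assumes Node 18+ for the built-in `fetch` and `AbortSignal.timeout`. It runs as its own process because a check inside a hung server would hang along with it.

```typescript
import { spawn } from "node:child_process";

// All of these values are assumptions; adjust them for your app.
const HEALTH_URL = "http://127.0.0.1:3000/health";
const INTERVAL_MS = 10_000;
const TIMEOUT_MS = 5_000;
const MAX_FAILURES = 3;

// Next consecutive-failure count after one probe: reset on success.
export function nextFailureCount(current: number, healthy: boolean): number {
  return healthy ? 0 : current + 1;
}

// Probe the server once. A hung server shows up as a timeout, which
// rejects the fetch and is reported here as unhealthy.
export async function probe(url: string, timeoutMs: number): Promise<boolean> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch {
    return false;
  }
}

// Start the server as a child process and kill it after repeated failures.
export function supervise(command: string, args: string[]): void {
  const child = spawn(command, args, { stdio: "inherit" });
  let failures = 0;

  // When the server exits (on its own or because we killed it), exit too,
  // so the container stops and the restart policy can bring it back.
  child.on("exit", (code) => process.exit(code ?? 1));

  setInterval(async () => {
    failures = nextFailureCount(failures, await probe(HEALTH_URL, TIMEOUT_MS));
    if (failures >= MAX_FAILURES) {
      child.kill("SIGKILL"); // a hung process may ignore gentler signals
    }
  }, INTERVAL_MS);
}
```

You’d make the supervisor the container’s entry point and call `supervise("node", ["server.js"])` (with whatever your real start command is). Requiring several failures in a row avoids killing the server over a single slow response.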