Instance or service not restarted when I expected it to

wjordan · July 26, 2022, 5:19am

One of our hosts in the sjc region triggered a bug in Nomad (our VM-orchestration service) that prevented some instances from transitioning correctly, which may be why your instance was stuck in ‘pending’. We restarted the service on the affected host, which unblocked the stuck instances. If this was the issue, things should be cleared up by now. Let us know if you continue to see any unexpected behavior.

As for the restart policy, note that the restart_limit setting only configures restarts triggered by health-check failures. Application-process crashes (including OOM-triggered exits) are handled by a separate internal (not configurable) restart policy. The current policy <checks notes> will restart any exited process up to 2 times within 5 minutes, then re-deploy the instance on another host if the process exits again. If the new deploy continues to fail, the instance will keep getting re-deployed indefinitely, with an exponential delay of between 15 seconds and 15 minutes, capped at 15 restarts every 2 hours.
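
To make the crash-restart numbers above easier to reason about, here is a rough Python sketch of the decision logic. It is only an illustration, not the actual Nomad or platform code. The function names (`next_action`, `redeploy_delay`) are hypothetical, and the doubling factor for the delay and the sliding-window bookkeeping are assumptions: the policy only states "exponential" with a 15 second minimum and 15 minute maximum. The cap of 15 restarts every 2 hours is not modelled here.

```python
from collections import deque

RESTART_LIMIT = 2           # in-place restarts allowed per window (from the post)
RESTART_WINDOW_S = 5 * 60   # 5-minute window (from the post)
MIN_DELAY_S = 15            # shortest re-deploy delay (from the post)
MAX_DELAY_S = 15 * 60       # longest re-deploy delay (from the post)


def next_action(restart_times, now):
    """Decide what happens when the process exits at time `now` (seconds).

    `restart_times` is a deque of timestamps of recent in-place restarts.
    """
    # Forget restarts that have fallen out of the 5-minute window.
    while restart_times and now - restart_times[0] >= RESTART_WINDOW_S:
        restart_times.popleft()
    if len(restart_times) < RESTART_LIMIT:
        restart_times.append(now)
        return "restart"    # restart in place on the same host
    return "redeploy"       # limit used up: re-deploy on another host


def redeploy_delay(attempt):
    """Delay in seconds before re-deploy number `attempt` (0-based).

    Assumption: the delay doubles on each attempt, clamped to the maximum.
    """
    return min(MIN_DELAY_S * 2 ** attempt, MAX_DELAY_S)
```

With these assumptions, a process that exits three times within a couple of minutes gets two in-place restarts and then a re-deploy, and the re-deploy delays grow 15s, 30s, 60s and so on until they level off at 15 minutes.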

This is all very tied to Nomad’s built-in restart behavior, so the exact restart-policy details may change with Machine-based apps.
