Will a node be taken out of commission?
If a health check is failing but the application is still running, you can configure the number of times it will be restarted before being rescheduled with restart_limit = 6
(or some other number). After the restart_limit is hit, it’s taken out of commission.
If your app’s startup time is getting in the way of your health checks, you can use the grace_period
field to have it wait a number of seconds after startup before running the first health check. You can also use restart_limit = 0
to keep it up even when checks fail (this is the fly.toml default).
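
As a rough sketch, here is how those two settings might sit together in a fly.toml. The port and all of the durations are placeholder values I picked for illustration, not recommendations, and I'm assuming a TCP check under a [[services]] block, so adjust it to match your own config:

```toml
# Illustrative fragment only; values are assumptions, not defaults you must use.
[[services]]
  internal_port = 8080      # assumed: the port your app listens on
  protocol = "tcp"

  [[services.tcp_checks]]
    grace_period = "30s"    # wait this long after startup before the first check
    interval = "15s"        # time between checks
    timeout = "2s"          # how long a single check may take before it counts as failed
    restart_limit = 6       # set to 0 (the default) so failing checks never trigger a restart
```

If your app is slow to boot, raising grace_period is usually the first thing to try before touching restart_limit.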
If your app is crashing in a way that doesn’t trigger the health checks, then we’ll restart it with exponential backoff between restarts. Here’s a thread with a great explanation. Note that the backoff behavior may change a bit as we build out Fly Machines
for more orchestration stuff.