Can you please explain restart_limit (from services.tcp_checks) in the docs? The fly.toml page does not mention it.
It’s the number of times we’ll try to restart your app after a crash before giving up. If it’s not set, Fly will try to restart the app indefinitely.
The docs now say: “The number of consecutive TCP check failures to allow before attempting to restart the VM. The default is 0, which disables restarts based on failed TCP health checks.”
Does this mean that checks do nothing by default? Is this a good default?
I wouldn’t say the checks do nothing. If you wanted a check to do nothing, you wouldn’t add one to your fly.toml at all. That would be the do-nothing option: Fly wouldn’t know to run any health check, so it wouldn’t.
Next, if you add a health check but don’t specify restart_limit, the app won’t deploy if that health check fails, because the check runs as part of the deploy. That’s where it applies and does something.
Finally, if you specify a restart_limit value, you can choose how many failures to allow. Some apps may be expected to fail once or twice without needing a VM restart to resolve it, for example if they make external calls, which restarting the VM would not fix. But if it’s a Node.js crash, you would want the VM restarted after just one failure.
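For illustration, here is a minimal sketch of a TCP check that tolerates a couple of transient failures before a restart, following the documented meaning of restart_limit (consecutive failures allowed before restarting the VM). The port and timing values are hypothetical, and per the V1/V2 distinction in this thread, restart_limit only takes effect on V1 (Nomad) apps.

```toml
# Hypothetical values for illustration only; restart_limit applies to V1 (Nomad) apps.
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    timeout = "2s"
    # Allow 2 consecutive failed checks (e.g. a flaky external dependency)
    # before attempting a VM restart; 0 disables check-based restarts.
    restart_limit = 2
```

For a crash that only a restart fixes, such as the Node.js case above, a value of 1 would restart on the first failed check.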
[[services]]
http_checks = []
internal_port = 8080
processes = ["app", "consumer"]
protocol = "tcp"
script_checks = []
[services.concurrency]
hard_limit = 25
soft_limit = 20
type = "connections"
[[services.ports]]
handlers = ["http"]
port = 80
force_https = true
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.tcp_checks]]
grace_period = "1s"
interval = "15s"
restart_limit = 0
timeout = "2s"
My restart_limit is 0, but in the logs I see the following:
2023-08-31T16:12:27.247 runner[…] fra [info] machine has reached its max restart count (10)
Why would it restart 10 times when 0 was specified?
This is specifically from an error being thrown in a Node process.
The restart_limit setting only applies to V1 (Nomad) apps. The log refers to a “machine”, so your app is V2.
What you’re seeing is the result of the default Machine restart policy. The default is to keep attempting a restart up to 10 times after a failure. Here’s some info about Machine restart policies: Issues with machines restart policy - #25 by catflydotio
Thank you for providing that info!
Are there plans to support setting the V2 app equivalent of that setting, the Machine restart.policy, via the fly.toml config file? I think this would be really handy.