Issues with machines restart policy

There are three policies (using the names of the options in fly m update and fly m run help):

no - always let a Machine stop when its main process exits, whether that’s on purpose or on a crash

always - never let a Machine enter the stopped state, even if your main process exits cleanly

on-fail - try up to 10 times to restart the Machine if it exits with a non-zero exit code, before letting it stop. This is also how an empty (unspecified) policy behaves.
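To make the three policies concrete, here is a minimal Python sketch of the decision they describe. It is only a model of the wording above, not Fly's actual implementation; the function name `should_restart` and the `restarts_so_far` counter are made up for illustration.

```python
from typing import Optional

# "try up to 10 times" comes from the on-fail description above.
MAX_ON_FAIL_RETRIES = 10


def should_restart(policy: Optional[str], exit_code: int, restarts_so_far: int = 0) -> bool:
    """Return True if a Machine with this policy would be restarted
    after its main process exits (illustrative model only)."""
    # An empty/unspecified policy is treated as on-fail.
    effective = policy or "on-fail"
    if effective == "no":
        # Always let the Machine stop, clean exit or crash.
        return False
    if effective == "always":
        # Never let the Machine stay stopped, even on a clean exit.
        return True
    if effective == "on-fail":
        # Restart only on a non-zero exit code, and only up to the retry limit.
        return exit_code != 0 and restarts_so_far < MAX_ON_FAIL_RETRIES
    raise ValueError(f"unknown restart policy: {policy!r}")
```

Under this model, a clean exit (code 0) with on-fail or an empty policy leaves the Machine stopped, which is what allows an app to scale down by exiting cleanly.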

Right now, the default on fly machine run is always. Fly Postgres apps are configured with a policy of always, too.

The default for a new Fly App (V2 of course, since V1 doesn’t use Machines) is an empty policy, equivalent to on-fail. This lets Machines be restarted if they crash, and allows your app Machines to effectively scale down by exiting cleanly. Whether that default suits you depends on how the Machine is used.

At the moment:

If Fly Proxy can wake the Machine on request (a regular web app Machine with services configured), or you do active scaling via the API, the default on-fail policy should be good.

For an always-on app with no services: a host reboot could stop its Machines, and with no services configured there is no request for Fly Proxy to wake them on. If you want to set and forget, always is the best policy.

Right now, fly machine update (or the Machines API) is the way to change the policy. It doesn’t exist in app-wide configuration yet, so fly.toml won’t help.
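As a rough sketch, a small Python helper could build the fly machine update invocation for a given Machine. The flag name `--restart` is an assumption here, so check `fly machine update --help` for the exact spelling in your flyctl version; the helper name `restart_update_command` and the `MACHINE_ID` placeholder are hypothetical.

```python
import shlex
from typing import List

# Policy names as listed in this post.
VALID_POLICIES = ("no", "always", "on-fail")


def restart_update_command(machine_id: str, policy: str) -> List[str]:
    """Build the argv for changing one Machine's restart policy."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"policy must be one of {VALID_POLICIES}, got {policy!r}")
    # Assumption: the policy is passed with a --restart flag.
    return ["fly", "machine", "update", machine_id, "--restart", policy]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(restart_update_command("MACHINE_ID", "always")))
```

Since the policy is set per Machine and not app-wide, you would run the resulting command once for each Machine in the app.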
