Autostart for machines with health checks just got better

pavel · November 3, 2025, 9:01am

fly-proxy’s autostart and autostop features provide a nice way to keep your costs down. When your app doesn’t receive much traffic, the proxy automatically stops machines, and when traffic increases again, it starts them back up.

Both features make use of the soft_limit parameter: when an app as a whole is sufficiently below its total soft_limit, the proxy is allowed to stop a machine, and when all the running and healthy instances are above their soft_limit, the proxy is allowed to start a machine. To start a machine, an edge proxy just sends a request targeting a stopped machine to a worker host, and the proxy running on that worker host knows it needs to start it.
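To make the two conditions concrete, here is a minimal sketch in Python. It is not fly-proxy's actual code: the Machine type, the function names, and especially the "sufficiently below" threshold (here: the remaining machines could absorb the current load without exceeding their soft_limit) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    running: bool
    healthy: bool
    load: int        # current concurrent requests (or connections)
    soft_limit: int  # per-machine soft_limit


def may_start_machine(machines):
    """Autostart rule from the post: every running and healthy machine
    is above its soft_limit, and there is a stopped machine to start."""
    serving = [m for m in machines if m.running and m.healthy]
    has_stopped = any(not m.running for m in machines)
    return has_stopped and all(m.load > m.soft_limit for m in serving)


def may_stop_machine(machines, candidate):
    """Autostop rule. ASSUMPTION: "sufficiently below total soft_limit"
    is modeled as "the other running machines could take over all the
    current load and still stay within their combined soft_limit"."""
    running = [m for m in machines if m.running]
    if not any(m is candidate for m in running):
        return False
    total_load = sum(m.load for m in running)
    remaining_capacity = sum(m.soft_limit for m in running if m is not candidate)
    return total_load <= remaining_capacity
```

The real proxy makes these decisions across many edges and with its own margins, so treat this only as a way to read the two rules above.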

When the proxy starts a machine, it forwards the request to it as soon as a connection can be established. This works great for machines without health checks, as such machines are immediately available to serve other requests. For machines with health checks the situation is a bit different: if it takes some time for the health checks to pass, none of the edge proxies are allowed to send new requests to the machine until they do [1]. If the rest of the healthy running machines are still above soft_limit, the proxy may send requests to another stopped machine, and then another, starting way more machines than needed. Not good!
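A toy simulation of that failure mode, assuming a simplified routing rule. The Machine type, route_old, and simulate_old are hypothetical names for illustration, and treating a machine at exactly soft_limit as full is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    running: bool = False
    healthy: bool = False
    load: int = 0
    soft_limit: int = 10


def route_old(machines):
    """Simplified model of the old behavior described above."""
    # Prefer a running, healthy machine with room under its soft_limit.
    for m in machines:
        if m.running and m.healthy and m.load < m.soft_limit:
            m.load += 1
            return m
    # Otherwise start a stopped machine. The triggering request is
    # forwarded to it even though its health checks have not passed yet.
    for m in machines:
        if not m.running:
            m.running = True
            m.load += 1
            return m
    return None  # nothing left to start; not modeled here


def simulate_old(n_requests):
    """Send n_requests while newly started machines are still unhealthy,
    and return how many machines were started."""
    machines = [Machine("a", running=True, healthy=True, load=10)]
    machines += [Machine(name) for name in ("b", "c", "d")]
    for _ in range(n_requests):
        route_old(machines)
    return sum(1 for m in machines if m.running and m.name != "a")
```

With one saturated machine and health checks that have not passed yet, simulate_old(3) starts three machines, although the first started machine would have had plenty of room for the other two requests.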

So here are some recent changes that should improve this situation:

- flyd, our virtual machine manager, now performs health checks at a much shorter interval after a machine is started, until either the health check passes or 2 configured health check intervals have elapsed. For the rest of the machine’s runtime the configured interval is still used. This allows flyd to mark the machine as healthy much faster.
- fly-proxy now keeps track of when a machine was started and is allowed to route requests to machines above soft_limit if it detects a recently started machine with failing health checks. This should prevent the proxy from starting more machines than needed during autostart.
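Both changes can be sketched roughly as follows. This is an illustration, not the real flyd or fly-proxy logic: fast_interval and recent_window are hypothetical values (the post only says "much shorter" and "recently started"), and the function and field names are made up.

```python
from dataclasses import dataclass
from typing import Optional


def next_check_delay(seconds_since_start, configured_interval, has_passed,
                     fast_interval=1.0):
    """Delay before the next health check. The fast phase lasts until the
    check passes or 2 configured intervals have elapsed since start."""
    in_fast_phase = (not has_passed) and seconds_since_start < 2 * configured_interval
    if in_fast_phase:
        return min(fast_interval, configured_interval)
    return configured_interval


@dataclass
class Machine:
    name: str
    running: bool = False
    healthy: bool = False
    load: int = 0
    soft_limit: int = 10
    started_at: Optional[float] = None


def route_new(machines, now, recent_window=30.0):
    """Simplified model of routing after the change."""
    healthy = [m for m in machines if m.running and m.healthy]
    for m in healthy:
        if m.load < m.soft_limit:
            m.load += 1
            return m
    # A recently started machine is still failing health checks: go above
    # soft_limit on a healthy machine instead of starting yet another one.
    warming_up = any(
        m.running and not m.healthy
        and m.started_at is not None
        and now - m.started_at < recent_window
        for m in machines
    )
    if warming_up and healthy:
        target = min(healthy, key=lambda m: m.load)
        target.load += 1
        return target
    for m in machines:
        if not m.running:
            m.running = True
            m.started_at = now
            m.load += 1
            return m
    return None  # nothing left to start; not modeled here


def simulate_new(n_requests):
    """Same scenario as a saturated app with three stopped machines;
    returns how many machines were started."""
    machines = [Machine("a", running=True, healthy=True, load=10)]
    machines += [Machine(name) for name in ("b", "c", "d")]
    for second in range(n_requests):
        route_new(machines, now=float(second))
    return sum(1 for m in machines if m.running and m.name != "a")
```

In this model, three requests arriving while the first started machine is still warming up start only one machine; the other two requests go to the already healthy machine, temporarily pushing it above soft_limit.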

Let us know if you’ve noticed the old behavior and whether or not it’s working better for you now.

[1] We made a conscious decision to ignore health check status for requests that caused the machine to start. This may very well change in the future.

rubys · November 3, 2025, 5:37pm

Yes, fly launch of Rails applications will bypass thruster, because previously the first request after a machine was restarted would fail: thruster would be up but the Rails app would not be. This could now be corrected.

pavel · November 4, 2025, 9:32am

Not yet. The proxy still ignores health checks for the requests that caused a machine to start and forwards the request as soon as it’s able to establish a connection.

This fresh produce is about correcting the proxy’s behavior to make sure it doesn’t start more machines than needed when health checks take too long to pass.

Hypermind · November 4, 2025, 12:56pm

“the proxy may send requests to another stopped machine, and then another, starting way more machines than needed. Not good!”

Thank you for improving that, the previous behavior was unnerving.