Asynchronous machine autostart in fly-proxy

pavel · December 22, 2025, 10:35am

Whenever fly-proxy decides to start a machine, it does so in the context of a request. The request is forwarded to the worker node that hosts the machine; the proxy on that node instructs flyd to start the machine, then waits for it to boot and become healthy. Once the machine is healthy, the request is finally forwarded to it.

There is a limit to how long the proxy is willing to wait for a machine to become healthy after starting it. After all, the app may never become healthy, in which case the request needs to be retried on a different instance.

Some apps take quite a long time to boot and may trigger this timeout even though they eventually become healthy and can handle requests. For such apps, this frequently leads to the proxy starting more machines than needed, as requests get retried on other stopped machines. For requests with large enough bodies, this may even result in 5xx errors returned to users when the proxy can no longer retry the request due to its body size.
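To make the over-starting effect concrete, here is a toy model in Python. It is only a sketch, not fly-proxy code: the function name, the `Outcome` type, and the 5-second wait limit used in the example are all made up for illustration (the real limit is not stated here).

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    machines_started: int  # how many stopped machines this one request woke up
    served: bool           # whether the request reached a healthy machine


def sync_autostart(boot_seconds: float, wait_limit_seconds: float,
                   stopped_machines: int) -> Outcome:
    """Model a single request under the old, synchronous scheme.

    Hypothetical simplification: every machine takes `boot_seconds` to become
    healthy, and the proxy gives up on a machine after `wait_limit_seconds`.
    """
    started = 0
    for _ in range(stopped_machines):
        started += 1  # the request itself triggers the start
        if boot_seconds <= wait_limit_seconds:
            return Outcome(started, True)  # healthy in time, request forwarded
        # Timed out: the machine keeps booting in the background, while the
        # request is retried on the next stopped machine.
    return Outcome(started, False)  # nothing left to retry on
```

With a made-up limit of 5 seconds, `sync_autostart(2, 5, 3)` starts one machine and serves the request, while `sync_autostart(8, 5, 3)` starts all three machines and still fails the request, even though each of them would have been healthy a few seconds later.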

To avoid this situation, the proxy now attempts to start machines asynchronously. Whenever the proxy decides it needs to scale an app up and start a new machine, it will request a machine to be started from a worker node, but forward the request to an already running healthy machine (even if it’s above soft_limit). Coupled with the recent improvement to how the proxy treats recently started machines with failing health checks during balancing, this change should prevent the proxy from starting more machines than needed. This holds provided the machine becomes healthy while it’s still considered “recent” (right now the time limit for this is 20 seconds, but it may change in the future).

Keep in mind that the new behavior is applied only if there are running healthy machines during balancing. For apps that scale to 0, the first machine will still be started in the context of a request.
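The balancing rule described above can be sketched roughly as follows. This is a simplified illustration in Python, not the proxy's actual implementation: `Machine`, `Decision`, and `balance` are hypothetical names, the "every healthy machine is at or above soft_limit" scale-up trigger is an assumption, and so is the reading that a recently started, not yet healthy machine suppresses further starts. Only the 20-second window comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

RECENT_WINDOW_SECONDS = 20.0  # current value per the post; may change


@dataclass
class Machine:
    id: str
    running: bool
    healthy: bool
    load: int
    soft_limit: int
    started_seconds_ago: Optional[float] = None  # None if not recently started


@dataclass
class Decision:
    forward_to: Optional[str]           # machine that receives the request
    start_in_background: Optional[str]  # machine started asynchronously, if any
    waits_for_boot: bool                # True when the request waits for a boot


def is_recent(m: Machine) -> bool:
    return (m.running and m.started_seconds_ago is not None
            and m.started_seconds_ago <= RECENT_WINDOW_SECONDS)


def balance(machines: List[Machine]) -> Decision:
    healthy = [m for m in machines if m.running and m.healthy]
    stopped = [m for m in machines if not m.running]

    if not healthy:
        # No running healthy machine (e.g. scaled to 0): old behavior, the
        # machine is started in the context of the request.
        target = stopped[0].id if stopped else None
        return Decision(target, None, waits_for_boot=target is not None)

    # Forward to the least loaded healthy machine, even above soft_limit.
    target = min(healthy, key=lambda m: m.load)

    # Assumed trigger: scale up when every healthy machine is at its soft limit.
    needs_scale_up = all(m.load >= m.soft_limit for m in healthy)
    # Assumed: a recent machine that is still booting counts as capacity on
    # the way, so no extra machine is started while it is "recent".
    booting = any(is_recent(m) and not m.healthy for m in machines)

    start = stopped[0].id if needs_scale_up and stopped and not booting else None
    return Decision(target.id, start, waits_for_boot=False)
```

For example, with one saturated healthy machine `a` and one stopped machine `b`, `balance` forwards the request to `a` and asks for `b` to be started in the background. If `b` was already started 5 seconds ago and is still failing health checks, no further machine is started.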

charsleysa · December 22, 2025, 10:33pm

So no change in behavior for apps that are at 0 scale?

pavel · December 23, 2025, 10:05am

Right now - no. If there are no running instances of an app, a request is sent to one of the stopped machines as before.

I’m considering enabling async autostart for 0 scale as well (e.g. letting the balancer retry while waiting for the first machine to start), but I need to be sure first that we aren’t introducing too much additional latency for the apps that do start fast.

charsleysa · December 23, 2025, 8:26pm

What about having a config option, so that if you know your app takes a little longer to start up, the balancer isn’t constantly retrying?