But still, I would like to always keep at least one machine instance running. My use case: hosting a website that needs to load very fast, so no cold starts wanted.
Ideally I would like to have a setup like this:
One V2 app with one group containing two to four machines.
At least one machine always runs to handle spontaneous traffic.
The other machines can be auto started to handle traffic spikes.
Currently there is no way to set a minimum number of instances. We don’t have any immediate plans to support the feature. It’s something we’ve considered but weren’t sure there’d be demand. We’re happy to look into supporting it if there is.
I personally would like to have such a feature. It would bring the autoscaling capabilities of V1 apps (Scale V1 (Nomad) Apps · Fly Docs) to V2 apps.
“min” could be defined in the toml, and “max” would be implicitly defined by the number of machines.
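For illustration, such a setting might look like this in fly.toml. Note that `min_machines` is an invented key used only to sketch the idea, not a real Fly.io setting; `auto_stop_machines` and `auto_start_machines` are the existing keys:

```toml
# Hypothetical fly.toml excerpt: "min" is explicit, "max" is implied
# by the number of machines created for the app.
[http_service]
  auto_stop_machines = true
  auto_start_machines = true
  min_machines = 1   # invented key: never auto-stop below this many instances
```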
Not knowing the internal architecture and its potential challenges, I could imagine the following “solutions” to the “problem”, based on these docs:
Explicitly defining a min-machines number in the fly.toml.
Giving users the ability to alter the auto-stop “strategy”:
If there’s more than one Machine in the region:
the proxy determines how many Machines are over their soft_limit setting and then calculates excess capacity: excess capacity = num of machines - (num machines over soft limit + 1)
if excess capacity is 1 or greater, then the proxy stops one machine
If there’s only one Machine in the region:
the proxy checks if the Machine has any traffic
if the Machine has no traffic (a load of 0), then the proxy stops the Machine
Basically, a user would need to be able to change:
a) the value 1 in “if excess capacity is 1 or greater, then the proxy stops one machine”
b) the “one-machine-left-stop”-criterion: "no traffic/no load" | "never"
Marking individual machines as “static”, i.e. “non-auto-stoppable”. They would then be excluded from the auto-stop evaluation logic entirely.
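To make suggestions (a) and (b) above concrete, here is a rough sketch of the proxy's stop decision with the two proposed knobs. This is speculative pseudologic based only on the documented behavior; `min_machines` and `last_machine_policy` are hypothetical parameters, not real settings:

```python
def should_stop_one(num_machines, num_over_soft_limit, load,
                    min_machines=1, last_machine_policy="no_traffic"):
    """Decide whether the proxy should stop one machine in a region.

    `min_machines` and `last_machine_policy` are the hypothetical
    user-configurable knobs proposed above.
    """
    if num_machines > 1:
        # Docs: excess capacity = num machines - (num over soft limit + 1).
        # Proposal (a): replace the hard-coded 1 with min_machines.
        excess = num_machines - (num_over_soft_limit + min_machines)
        return excess >= 1
    # Only one machine left in the region.
    # Proposal (b): make the "one-machine-left-stop" criterion configurable.
    if last_machine_policy == "never":
        return False
    return load == 0  # current behavior: stop only when there is no traffic

# Defaults reproduce the documented behavior:
print(should_stop_one(4, 1, load=10))                               # True
print(should_stop_one(1, 0, load=0))                                # True
# The new knobs keep machines warm:
print(should_stop_one(4, 1, load=10, min_machines=3))               # False
print(should_stop_one(1, 0, load=0, last_machine_policy="never"))   # False
```

With the defaults this is exactly the logic quoted from the docs; setting `min_machines` or `last_machine_policy` just shifts the threshold without touching the rest of the evaluation.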
I’ve observed that when deploying a Dockerized static single executable written in a language like OCaml, Rust, or Go, the machine starts in double-digit milliseconds, so the cost of cold-starting all machines is almost zero. Cold-starting something like a large Node.js application, by contrast, is incredibly slow, and this idea would be really useful in that case. It’s possible to keep at least one machine hot by constantly hitting a health check endpoint on a machine with the smallest shared CPU, but I’d really like to see this functionality implemented, because relying on such a workaround undermines Fly’s otherwise great developer experience.
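For reference, the keep-warm workaround mentioned above can be as simple as a cron entry pinging the app. The URL and `/healthz` path are placeholders for whatever endpoint your app actually exposes:

```shell
# Hypothetical crontab entry: ping the app's health endpoint every minute
# so the proxy never sees zero traffic on the last machine.
* * * * * curl -fsS https://my-app.fly.dev/healthz > /dev/null
```

It works, but it burns requests and a cron host just to paper over a missing setting, which is exactly why a first-class minimum-instances option would be nicer.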