New v2 apps created by `fly launch` (v0.0.528+) will launch enough machines to provide increased availability in case of hardware failure, while keeping costs down (even lower than before).
The key features we are combining are automatically starting and stopping machines as load goes up or down (more in this post), and something new we are introducing today: standby machines.
In short, standby machines are stopped machines that are started only if the machines they watch become unavailable due to a hardware failure.
That is, the standby machine stays dormant, consuming no resources and adding no cost, watching and waiting until its primary has a serious host problem such as a disk failure or power outage. Only then will it wake up and take over.
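To make this behavior concrete, here is a minimal Python model of a primary/standby pair. It is a sketch under stated assumptions, not how Fly.io implements standby machines: the `Machine`, `StandbyPair` and `check` names are hypothetical, and host health is collapsed into a single boolean standing in for whatever failure detection the platform actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    STARTED = "started"
    STOPPED = "stopped"


@dataclass
class Machine:
    id: str
    state: State = State.STOPPED
    # Assumption: host health reduced to one flag (disk, power, etc.).
    host_healthy: bool = True


@dataclass
class StandbyPair:
    """Hypothetical model of a standby machine watching its primary."""
    primary: Machine
    standby: Machine

    def check(self) -> Machine:
        """Return the machine that should be serving after this check."""
        # The standby is started only when the primary's host has failed;
        # while that host is healthy it stays stopped and costs nothing.
        if not self.primary.host_healthy and self.standby.state is State.STOPPED:
            self.standby.state = State.STARTED  # wake up and take over
        if self.standby.state is State.STARTED:
            return self.standby
        return self.primary


pair = StandbyPair(Machine("primary-1", State.STARTED), Machine("standby-1"))
print(pair.check().id, pair.standby.state.value)  # primary-1 stopped
pair.primary.host_healthy = False  # e.g. disk failure or power outage
print(pair.check().id, pair.standby.state.value)  # standby-1 started
```

The model deliberately says nothing about what happens when the failed host comes back; the post does not describe that case.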
UPDATE (after comments):
- Standby machines and autostart/autostop features are not related.
- They are used together to improve availability, but neither implies the other.
- When a process group has services, `fly launch` never creates standby machines for it.
- Services always use normal machines, and Fly Proxy controls their state (stopped/started) only if you enable the autostart/autostop flags on the service.
- All of this is per process group (the groups defined in the `[processes]` section of fly.toml). If that section is missing, the app is treated as having a single, default process group.
This comes in response to recent reliability issues: the goal is to minimize the impact on applications when hardware failures take down nodes across our fleet.
We believe that combining Fly Proxy's new ability to automatically start and stop machines behind services with standby machines for machines outside Fly Proxy's reach will substantially improve the general resilience of apps to hardware failure.
When deploying your application for the first time, we take the following actions:
- Start 2 machines for process groups with services, and also enable auto start/stop to scale them automatically and save costs.
- Start one always-on machine and one standby machine for process groups without services. Stopped machines don’t add to the bill.
- No matter what, start only 1 machine if the process group has mounts.
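The three rules above can be sketched as a small Python function. This is a hedged illustration of the launch policy as this post describes it, not flyctl's actual code; `ProcessGroup`, `LaunchPlan` and `plan_for` are hypothetical names. The usage lines apply it to three groups shaped like the ones in this post's example config: one with an HTTP service, one with neither services nor mounts, and one with a mount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessGroup:
    name: str
    has_services: bool = False
    has_mounts: bool = False


@dataclass(frozen=True)
class LaunchPlan:
    regular: int  # normal machines created at first deploy
    standby: int  # stopped machines that start only on host failure


def plan_for(group: ProcessGroup) -> LaunchPlan:
    """Hypothetical encoding of the first-deploy rules for one process group."""
    if group.has_mounts:
        # Mounts win "no matter what": a stateful group gets a single machine.
        return LaunchPlan(regular=1, standby=0)
    if group.has_services:
        # Two normal machines that Fly Proxy can start/stop when auto
        # start/stop is enabled on the service; never a standby here.
        return LaunchPlan(regular=2, standby=0)
    # No services: one always-on machine plus one standby watching it.
    return LaunchPlan(regular=1, standby=1)


groups = [
    ProcessGroup("app", has_services=True),
    ProcessGroup("task"),
    ProcessGroup("disk", has_mounts=True),
]
plans = {g.name: plan_for(g) for g in groups}
total = sum(p.regular + p.standby for p in plans.values())
print(total)  # 5
```

Checking mounts first is what makes the third rule override the other two, which is how "no matter what" reads.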
Confused? Let’s see an example:
```toml
app = "myapp"

[processes]
  app = ""
  disk = "sleep inf"
  task = "sleep inf"

[[mounts]]
  source = "disk"
  destination = "/data"
  processes = ["disk"]

[http_service]
  internal_port = 80
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  processes = ["app"]
```
This application has 3 process groups:
- “app” group serves an HTTP service
- “task” group has neither mounts nor services
- “disk” group has mounts (volumes attached)
Once deployed, it will have 5 machines: 2 for “app”, 2 for “task” and 1 for “disk”.
Here is the output of `fly status` a few minutes after launching:
```
$ fly status
App
  Name     = myapp
  Owner    = personal
  Hostname = myapp.fly.dev
  Image    = library/nginx:latest
  Platform = machines

Machines
PROCESS  ID              VERSION  REGION  STATE    CHECKS   LAST UPDATED
app      1781329f13e289  1        iad     stopped  1 total  2023-04-19T18:11:39Z
app      e784ee77c47e68  1        iad     stopped  1 total  2023-04-19T18:11:05Z
disk     32874572ae2e38  1        iad     started           2023-04-19T18:10:56Z
task     3d8d501f724289  1        iad     started           2023-04-19T18:11:01Z
task†    e784ee79f41378  1        iad     stopped           2023-04-19T18:11:05Z

† Standby machine (it will take over only in case of host hardware failure)
```
Note how both machines in the “app” group were stopped by Fly Proxy because no requests were reaching the HTTP service.
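For intuition about why both “app” machines ended up stopped, here is a toy Python model of how a proxy could apply the `auto_stop_machines` and `auto_start_machines` flags from the config above. It is a simplified sketch, not Fly Proxy's implementation: the policies of stopping every started machine with no in-flight requests at each idle check, and of starting a single machine when a request arrives and none is running, are assumptions made for illustration, and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServiceMachine:
    id: str
    started: bool = False
    in_flight: int = 0  # requests currently being handled


@dataclass
class ProxyModel:
    """Toy model of a proxy honoring the two fly.toml flags (assumed policy)."""
    machines: list[ServiceMachine]
    auto_start_machines: bool = True
    auto_stop_machines: bool = True

    def route(self) -> ServiceMachine | None:
        started = [m for m in self.machines if m.started]
        if not started and self.auto_start_machines:
            # Assumption: start exactly one stopped machine on demand.
            stopped = [m for m in self.machines if not m.started]
            if stopped:
                stopped[0].started = True
                started = [stopped[0]]
        if not started:
            return None  # nothing running and autostart disabled
        target = min(started, key=lambda m: m.in_flight)
        target.in_flight += 1
        return target

    def finish(self, machine: ServiceMachine) -> None:
        machine.in_flight -= 1

    def idle_check(self) -> None:
        if not self.auto_stop_machines:
            return
        # Assumption: any started machine with no in-flight requests is stopped.
        for m in self.machines:
            if m.started and m.in_flight == 0:
                m.started = False


proxy = ProxyModel([ServiceMachine("m1", started=True), ServiceMachine("m2", started=True)])
proxy.idle_check()  # no traffic since deploy: both machines are stopped
print([m.started for m in proxy.machines])  # [False, False]
busy = proxy.route()  # a request arrives: one machine is started on demand
print(busy.id, busy.started)  # m1 True
```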
The “task” machine with ID `e784ee79f41378` is also stopped, but for a different reason: it has never been started because it is the standby machine for `3d8d501f724289`, which is started and running healthy. If the host of the latter has a hardware failure, the former will take its place.
Oh, it’s worth pointing out that the “disk” machine is on its own. It is not safe to run two machines for a stateful group, so we don’t do it. flyctl won’t create more than one machine for it by default, but you can add more yourself with `fly machine clone`.
That’s all folks.
Happy HA setup to everyone!