New v2 apps created by fly launch (v0.0.528+) will launch enough machines to provide increased availability in case of hardware failure, all of this while keeping costs down (even lower than before).
The key features we are putting together are automatically starting and stopping machines when the load goes up or down (more at this post), and a fresh new thing we are introducing today called standby machines.
What are Standby Machines?
In short, standby machines are stopped machines that will be started only if the machines they are watching become unavailable due to a hardware failure.
That is, the standby machine will be dormant, not consuming resources, not adding costs, watching and waiting until its primary has a serious host problem like a disk failure or power outage. Only then will it wake up and take over.
UPDATE (after comments):
- Standby machines and autostart/autostop features are not related.
- They are used together to improve availability, but the latter doesn’t imply the former, and vice versa.
- In the presence of “services”, fly launch won’t create standby machines, ever.
- Services always use normal machines, and Fly Proxy will only control their state (stopped/started) if you enable the autostart/autostop flags on the service.
- All of this is per process group (the section named [processes] in fly.toml). If it is missing, it is implied that you have only one process group, named app.
Why now?
This comes in response to recent reliability issues. The goal is to minimize the impact on applications when hardware failures take down nodes across our fleet.
We believe that by combining the new Fly Proxy powers to automatically start and stop machines when there are services involved, with standby machines when machines are out of Fly Proxy's reach, the general resilience of apps to hardware failure will improve substantially.
How does it work?
When deploying your application for the first time, we take the following actions:
- Start 2 machines for process groups with services, but also enable auto start/stop to scale them automatically and save costs
- Start one always-on and one standby machine for process groups without services. Stopped machines don’t add to the bill.
- No matter what, start only 1 machine if the process group has mounts
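The three rules above can be read as a small decision function. The following is only a sketch of that reading, not flyctl's actual code; the function name `plan_machines` and its return shape (regular machines, standby machines) are made up for illustration.

```python
from typing import Tuple


def plan_machines(has_services: bool, has_mounts: bool) -> Tuple[int, int]:
    """Return (regular, standby) machine counts for one process group.

    Hypothetical helper that mirrors the first-deploy rules listed above.
    """
    if has_mounts:
        # Mounts take priority over everything else: a single machine, no standby.
        return (1, 0)
    if has_services:
        # Two regular machines; auto start/stop is enabled on the service.
        return (2, 0)
    # No services and no mounts: one always-on machine plus one stopped standby.
    return (1, 1)
```

Note that the mounts check comes first, which is how "no matter what" is interpreted here: a group with both services and mounts still gets a single machine.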
Confused? Let's see an example
app = "myapp"
[processes]
app = ""
disk = "sleep inf"
task = "sleep inf"
[[mounts]]
source = "disk"
destination = "/data"
processes = ["disk"]
[http_service]
internal_port = 80
force_https = true
auto_stop_machines = true
auto_start_machines = true
processes = ["app"]
This application has 3 process groups:
- “app” group serves an HTTP service
- “task” group has no mounts or services
- “disk” group has mounts (volumes attached)
Once deployed it will create 5 machines, 2 for “app”, 2 for “task” and 1 for “disk”.
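To double-check that count, here is a rough sketch that applies the first-deploy rules to the example config. The fly.toml is transcribed by hand into a Python dict (only the keys that matter), and the helper names `groups_referenced` and `machine_counts` are invented for this illustration; this is not how flyctl is implemented.

```python
# The example fly.toml, transcribed by hand (only the relevant keys).
config = {
    "processes": {"app": "", "disk": "sleep inf", "task": "sleep inf"},
    "mounts": [{"source": "disk", "destination": "/data", "processes": ["disk"]}],
    "http_service": {"internal_port": 80, "processes": ["app"]},
}


def groups_referenced(sections):
    """Collect the process group names listed under `processes` in each section."""
    names = set()
    for section in sections:
        names.update(section.get("processes", []))
    return names


def machine_counts(config):
    """Map each process group to (regular, standby) machine counts."""
    with_mounts = groups_referenced(config.get("mounts", []))
    service_sections = list(config.get("services", []))
    if "http_service" in config:
        service_sections.append(config["http_service"])
    with_services = groups_referenced(service_sections)

    counts = {}
    for group in config["processes"]:
        if group in with_mounts:
            counts[group] = (1, 0)  # stateful group: a single machine
        elif group in with_services:
            counts[group] = (2, 0)  # two machines with auto start/stop
        else:
            counts[group] = (1, 1)  # one always-on plus one standby
    return counts


counts = machine_counts(config)
total = sum(regular + standby for regular, standby in counts.values())
print(counts)  # {'app': (2, 0), 'disk': (1, 0), 'task': (1, 1)}
print(total)   # 5
```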
See the output of fly status a few minutes after launching:
$ fly status
App
Name = myapp
Owner = personal
Hostname = myapp.fly.dev
Image = library/nginx:latest
Platform = machines
Machines
PROCESS  ID              VERSION  REGION  STATE    CHECKS   LAST UPDATED
app      1781329f13e289  1        iad     stopped  1 total  2023-04-19T18:11:39Z
app      e784ee77c47e68  1        iad     stopped  1 total  2023-04-19T18:11:05Z
disk     32874572ae2e38  1        iad     started           2023-04-19T18:10:56Z
task     3d8d501f724289  1        iad     started           2023-04-19T18:11:01Z
task†    e784ee79f41378  1        iad     stopped           2023-04-19T18:11:05Z
†Standby machine (it will take over only in case of host hardware failure)
Note how all machines in the “app” group were stopped by Fly Proxy due to the lack of requests going into the http service.
Similarly, the “task” machine with id e784ee79f41378 is in the stopped state and was never started, because it is a standby machine for 3d8d501f724289, which is started and running healthy. If the host of the latter has a hardware failure, the former will take its place.
Oh, worth pointing out that the “disk” machine is on its own. It is not safe to run two machines for a stateful group, so we don’t do it. flyctl won’t create more than one machine by default, but you can with fly machine clone.
That’s all folks.
Happy HA setup to everyone!