Update as of 24 July 2024: this feature is now enabled in all regions, and autosuspend has been released!
We know that making your Machines boot faster matters! After all, every second spent starting up your app is another second that your users have to spend waiting. Fly Machines already boot pretty quickly, fast enough to make automatic starts and stops work effectively. But still, it can easily take two seconds for a Machine running a typical Rails app to go from the stopped state to being ready to handle HTTP requests. It’s not an eternity, but there’s a lot of room for improvement.
Some of you may know that the hypervisor we use, Firecracker, allows you to “snapshot” a virtual machine. This means pausing it and dumping all of its state (including its memory) to persistent storage. Later, you can load the snapshot back into Firecracker, and your virtual machine will resume exactly where it left off, as if nothing had happened. (It’s even possible for network connections to stay intact if the other side doesn’t close them!) There’s no need to boot the Linux kernel or start up your app’s runtime, meaning that it can be much faster than rebooting.
We’re implementing this for Fly Machines! You’ll be able to suspend a Fly Machine, rather than stop it, and the next start will resume the Machine from the snapshot taken. Consequently, your app can be ready to serve new requests within a few hundred milliseconds.
We’re still iterating on this feature, but we’re excited to tell you that we’ve enabled it for you to try in a handful of regions to start:
- Bogotá, Colombia (bog)
- Guadalajara, Mexico (gdl)
- Johannesburg, South Africa (jnb)
- Bucharest, Romania (otp)
- Phoenix, Arizona, United States (phx)
Suspending a Machine
From the CLI
There’s a new command introduced in flyctl v0.2.71 (released June 17):
fly machines suspend <ID>
Now, if you run fly machines status <ID>, you’ll either see that the Machine is in the suspended state, or perhaps that it is still in the process of suspending.
From the Machines API
Send a POST request to the new Machine suspension endpoint, documented here.
Calling this endpoint kicks off the suspension process, but it might take a few seconds to complete. The wait-for-state endpoint now accepts suspended as a target state if you’d like to wait for it to finish.
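The two calls above can be sketched in Python using only the standard library. This is a minimal sketch rather than an official client: the base URL and the /suspend and /wait paths reflect our reading of the public Machines API docs, the helper names (machine_url, build_request, suspend_and_wait) are made up for illustration, and the token is assumed to be a Fly.io API token that you supply.

```python
import urllib.parse
import urllib.request

# Assumption: the public Machines API base URL.
API_BASE = "https://api.machines.dev/v1"


def machine_url(app: str, machine_id: str, action: str, **query: str) -> str:
    """Build a URL like .../apps/<app>/machines/<id>/<action>?<query>."""
    parts = (app, "machines", machine_id, action)
    path = "/".join(urllib.parse.quote(p, safe="") for p in parts)
    url = f"{API_BASE}/apps/{path}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return url


def build_request(method: str, url: str, token: str) -> urllib.request.Request:
    """Create (but do not send) a request carrying a bearer token."""
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, method=method, headers=headers)


def suspend_and_wait(app: str, machine_id: str, token: str,
                     timeout_s: int = 30) -> None:
    """Kick off suspension, then block until the Machine is suspended."""
    suspend = build_request(
        "POST", machine_url(app, machine_id, "suspend"), token)
    wait = build_request(
        "GET",
        machine_url(app, machine_id, "wait",
                    state="suspended", timeout=str(timeout_s)),
        token)
    for req in (suspend, wait):
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(req, timeout=timeout_s + 10):
            pass
```

Calling suspend_and_wait("my-app", "<ID>", token) returns once the wait endpoint reports success and raises if either call fails, so a caller can treat a normal return as "the snapshot has been taken".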
Resuming a suspended Machine
This one’s easy: start it as usual with fly machines start <ID> or the Machines API’s start endpoint. Machines in the suspended state will attempt to resume from a snapshot, and will fall back to a cold start if for some reason this isn’t possible.
Additionally, if you have automatic start enabled, then Fly Proxy will resume your suspended Machines when they are needed to handle incoming requests.
Forcing a suspended Machine to do a cold boot
You can use fly machines stop <ID> or the Machines API’s stop endpoint to convert a suspended Machine into a stopped one. The Machine’s snapshot will be thrown away, and the Machine will have to do a cold start the next time that it’s started.
Updating a suspended Machine
When you deploy your app, suspended Machines are treated as if they are stopped. Their snapshots are thrown away, and they’ll be cold-started with the updates that you’ve made.
Important: snapshots are disposable
We do not guarantee that a suspended Machine will ever resume from its snapshot; it’s possible that it will perform a cold start instead. For example, this may happen when we have to migrate a Machine to a different host to find space for it to run.
We do ensure that if a Machine performs a cold start, then any existing snapshot is invalidated. Put another way, a Machine cannot “go back in time” by resuming from a snapshot made before it last did a cold start.
We also ensure that a Machine cannot resume from a single snapshot more than once. Believe it or not, this is actually a security consideration! You can read the technical details over in Firecracker’s documentation.
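To make these guarantees concrete, here is a toy model of them in Python. It is purely illustrative and says nothing about how the platform is actually implemented: ToyMachine, its fields, and the snapshot_usable flag are all invented for this sketch. The flag stands in for any reason a snapshot might not be usable, such as a migration to another host.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMachine:
    """Toy model of the snapshot rules; not Fly.io's implementation."""
    state: str = "started"
    snapshot: Optional[int] = None  # id of the current snapshot, if any
    next_snapshot_id: int = 1

    def suspend(self) -> None:
        # Each suspension produces a fresh snapshot.
        self.snapshot = self.next_snapshot_id
        self.next_snapshot_id += 1
        self.state = "suspended"

    def stop(self) -> None:
        # Stopping discards the snapshot, forcing a cold start next time.
        self.snapshot = None
        self.state = "stopped"

    def start(self, snapshot_usable: bool = True) -> str:
        """Return "resume" or "cold" to say how the Machine came up."""
        can_resume = (
            self.state == "suspended"
            and self.snapshot is not None
            and snapshot_usable
        )
        how = "resume" if can_resume else "cold"
        # Either way the snapshot is gone afterwards: a resume consumes it
        # (no second resume from the same snapshot), and a cold start
        # invalidates it (no going back in time).
        self.snapshot = None
        self.state = "started"
        return how
```

The property worth noticing is that start() always clears the snapshot, so in this model no sequence of calls can resume from the same snapshot twice or from one taken before a cold start.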
Current limitations and caveats
There are some restrictions on what Machines can be suspended:
- To be suspended, your Machines must have been updated since 20 June 2024 at 20:00 UTC. Don’t worry too much; it’ll tell you if this isn’t the case! Use fly machines update --yes <ID> or re-deploy your app if you run into this.
- Machines must have 2 GiB of memory or less.
- Machines must not have swap enabled.
- Machines must not have a schedule.
- Machines with GPUs cannot be suspended.
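If you manage many Machines, the restrictions above could be checked up front with something like the following sketch. The MachineInfo fields are hypothetical names chosen for illustration and are not the Machines API schema, so you would need to populate them from your own Machine config. The cutoff and memory limit are the ones listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

# Cutoff from the list above: Machines must have been updated since then.
UPDATE_CUTOFF = datetime(2024, 6, 20, 20, 0, tzinfo=timezone.utc)
MAX_MEMORY_MB = 2048  # 2 GiB


@dataclass
class MachineInfo:
    # Hypothetical fields for illustration; not the Machines API schema.
    updated_at: datetime  # must be timezone-aware
    memory_mb: int
    swap_mb: int = 0
    has_schedule: bool = False
    gpu_count: int = 0


def suspend_blockers(m: MachineInfo) -> List[str]:
    """Return the reasons this Machine cannot be suspended (empty if none)."""
    blockers = []
    if m.updated_at < UPDATE_CUTOFF:
        blockers.append("not updated since 20 June 2024 at 20:00 UTC")
    if m.memory_mb > MAX_MEMORY_MB:
        blockers.append("more than 2 GiB of memory")
    if m.swap_mb > 0:
        blockers.append("swap is enabled")
    if m.has_schedule:
        blockers.append("has a schedule")
    if m.gpu_count > 0:
        blockers.append("has a GPU")
    return blockers
```

Returning a list of reasons rather than a single boolean makes it easier to report everything that needs fixing in one pass.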
Furthermore, there are some rough edges to be aware of:
- There is no “auto-suspend” feature analogous to auto-stop yet.
- You will lose some log lines after a Machine is resumed.
- When resumed, your Machine may take a few seconds to update its clock, so for the first few seconds it will think that it’s in the past.
We hope to address these soon!
Billing
For now, suspended Machines are billed just like stopped Machines.
Let us know here if you have any questions. We’re excited to see what you’ll build with this!