New feature in preview: suspend/resume for Machines

:warning: Update as of 24 July 2024: this feature is now enabled in all regions, and autosuspend has been released!

We know that making your Machines boot faster matters! After all, every second spent starting up your app is another second that your users have to spend waiting. Fly Machines already boot pretty quickly—fast enough to make automatic starts and stops work effectively. But still, it can easily take two seconds for a Machine running a typical Rails app to go from the stopped state to being ready to handle HTTP requests. It’s not an eternity, but there’s a lot of room for improvement.

Some of you may know that the hypervisor we use, Firecracker, allows you to “snapshot” a virtual machine. This means pausing it and dumping all of its state (including its memory) to persistent storage. Later, you can load the snapshot back into Firecracker, and your virtual machine will resume exactly where it left off, as if nothing had happened. (It’s even possible for network connections to stay intact if the other side doesn’t close them!) There’s no need to boot the Linux kernel or start up your app’s runtime, meaning that it can be much faster than rebooting.

We’re implementing this for Fly Machines! You’ll be able to suspend a Fly Machine, rather than stop it, and the next start will resume the Machine from the snapshot taken. Consequently, your app can be ready to serve new requests within a few hundred milliseconds.

We’re still iterating on this feature, but we’re excited to tell you that we’ve enabled it for you to try in a handful of regions to start:

  • Bogotá, Colombia (bog)
  • Guadalajara, Mexico (gdl)
  • Johannesburg, South Africa (jnb)
  • Bucharest, Romania (otp)
  • Phoenix, Arizona, United States (phx)

Suspending a Machine

From the CLI

There’s a new command introduced in flyctl v0.2.71 (released June 17):

fly machines suspend <ID>

Now, if you run fly machines status <ID>, you’ll either see that the Machine is in the suspended state, or perhaps that it is still in the process of suspending.

From the Machines API

Send a POST request to the new Machine suspension endpoint, documented here.

Calling this endpoint kicks off the suspension process, but it might take a few seconds to complete. The wait-for-state endpoint now accepts suspended as a target state if you’d like to wait for it to finish.
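
As a rough illustration of the flow above, here is a minimal Python sketch that posts to the suspend endpoint and then blocks on wait-for-state. The base URL, the `timeout` query parameter, and the helper names (`suspend_url`, `wait_url`, `suspend_and_wait`) are assumptions made for this example; check the API docs for the exact paths and limits.

```python
import urllib.parse
import urllib.request

# Assumption: public Machines API base URL; adjust if yours differs.
API_BASE = "https://api.machines.dev/v1"


def _machine_url(app: str, machine_id: str) -> str:
    """Build the base URL for one Machine (hypothetical helper)."""
    app_q = urllib.parse.quote(app, safe="")
    id_q = urllib.parse.quote(machine_id, safe="")
    return f"{API_BASE}/apps/{app_q}/machines/{id_q}"


def suspend_url(app: str, machine_id: str) -> str:
    """URL of the suspend endpoint for one Machine."""
    return _machine_url(app, machine_id) + "/suspend"


def wait_url(app: str, machine_id: str, state: str = "suspended",
             timeout_s: int = 30) -> str:
    """URL of the wait-for-state endpoint, targeting `state`."""
    query = urllib.parse.urlencode({"state": state, "timeout": timeout_s})
    return _machine_url(app, machine_id) + "/wait?" + query


def suspend_and_wait(app: str, machine_id: str, token: str,
                     timeout_s: int = 30) -> None:
    """Kick off suspension, then block until the Machine is suspended."""
    headers = {"Authorization": f"Bearer {token}"}

    # Step 1: POST to the suspend endpoint. This only starts the process.
    req = urllib.request.Request(suspend_url(app, machine_id),
                                 method="POST", headers=headers)
    with urllib.request.urlopen(req) as resp:
        resp.read()

    # Step 2: wait for the suspended state (urlopen raises on HTTP errors).
    req = urllib.request.Request(
        wait_url(app, machine_id, "suspended", timeout_s), headers=headers)
    with urllib.request.urlopen(req) as resp:
        resp.read()
```

Calling `suspend_and_wait("my-app", "<ID>", token)` with a valid API token would start the suspension and return once the wait request succeeds; `urllib` raises `urllib.error.HTTPError` if either request comes back with an error status.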

Resuming a suspended Machine

This one’s easy—start it as usual with fly machines start <ID> or the Machines API’s start endpoint. Machines in the suspended state will attempt to resume from a snapshot, and will fall back to a cold start if for some reason this isn’t possible.

Additionally, if you have automatic start enabled, then Fly Proxy will resume your suspended Machines when they are needed to handle incoming requests.

Forcing a suspended Machine to do a cold boot

You can use fly machines stop <ID> or the Machines API’s stop endpoint to convert a suspended Machine into a stopped one. The Machine’s snapshot will be thrown away, and the Machine will have to do a cold start the next time that it’s started.

Updating a suspended Machine

When you deploy your app, suspended Machines are treated as if they are stopped. Their snapshots are thrown away, and they’ll be cold-started with the updates that you’ve made.

:warning: Important: snapshots are disposable

We do not guarantee that a suspended Machine will ever resume from its snapshot; it’s possible that it will perform a cold start instead. For example, this may happen when we have to migrate a Machine to a different host to find space for it to run.

We do ensure that if a Machine performs a cold start, then any existing snapshot is invalidated. Put another way, a Machine cannot “go back in time” by resuming from a snapshot made before it last did a cold start.

We also ensure that a Machine cannot resume from a single snapshot more than once. Believe it or not, this is actually a security consideration! You can read the technical details over in Firecracker’s documentation.
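
To make these guarantees concrete, here is a toy Python model of the state transitions described in the sections above. It is only an illustration of the rules as stated in this post, not how the platform is implemented; the `ToyMachine` class and its fields are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ToyMachine:
    """Toy model of the suspend/resume rules (illustrative only)."""
    state: str = "started"      # "started", "suspended", or "stopped"
    has_snapshot: bool = False

    def suspend(self) -> None:
        if self.state != "started":
            raise ValueError("only a started Machine can be suspended")
        self.state = "suspended"
        self.has_snapshot = True

    def stop(self) -> None:
        # Stopping a suspended Machine throws its snapshot away.
        self.state = "stopped"
        self.has_snapshot = False

    def start(self, snapshot_usable: bool = True) -> str:
        """Start the Machine; returns "resume" or "cold"."""
        if self.state == "started":
            raise ValueError("Machine is already started")
        can_resume = (self.state == "suspended"
                      and self.has_snapshot and snapshot_usable)
        # Either way the snapshot is gone afterwards: a resume consumes it
        # (no second resume from the same snapshot), and a cold start
        # invalidates it (no going back in time).
        self.has_snapshot = False
        self.state = "started"
        return "resume" if can_resume else "cold"
```

For example, `suspend()` followed by `start()` resumes, while `suspend()`, `stop()`, `start()` ends in a cold start. Passing `snapshot_usable=False` stands in for cases such as a host migration, where the resume falls back to a cold start.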

Current limitations and caveats

There are some restrictions on what Machines can be suspended:

  • To be suspended, your Machines must have been updated since 20 June 2024 at 20:00 UTC. Don’t worry too much—it’ll tell you if this isn’t the case! Use fly machines update --yes <ID> or re-deploy your app if you run into this.
  • Machines must have 2 GiB of memory or less.
  • Machines must not have swap enabled.
  • Machines must not have a schedule.
  • Machines with GPUs cannot be suspended.
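
The restrictions above can be expressed as a small pre-flight check. The Python sketch below is hypothetical: the function name and its parameters are invented for illustration and do not correspond to real flyctl or API fields, so you would need to map them from your own Machine configuration.

```python
from datetime import datetime, timezone
from typing import List

# Values taken from the list above.
UPDATE_CUTOFF = datetime(2024, 6, 20, 20, 0, tzinfo=timezone.utc)
MAX_MEMORY_MB = 2048  # 2 GiB


def suspend_blockers(memory_mb: int, swap_mb: int, has_schedule: bool,
                     has_gpu: bool, last_updated: datetime) -> List[str]:
    """Return the reasons a Machine can't be suspended (empty if none)."""
    reasons = []
    if last_updated < UPDATE_CUTOFF:
        reasons.append("not updated since 20 June 2024 at 20:00 UTC")
    if memory_mb > MAX_MEMORY_MB:
        reasons.append("more than 2 GiB of memory")
    if swap_mb > 0:
        reasons.append("swap is enabled")
    if has_schedule:
        reasons.append("has a schedule")
    if has_gpu:
        reasons.append("has a GPU")
    return reasons
```

An empty list means none of the listed restrictions apply; anything else tells you what to change before trying to suspend.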

Furthermore, there are some rough edges to be aware of:

  • There is no “auto-suspend” feature analogous to auto-stop yet.
  • You will lose some log lines after a Machine is resumed.
  • When resumed, your Machine may take a few seconds to update its clock, so for the first few seconds it will think that it’s in the past.

We hope to address these soon!
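
Until the clock issue is addressed, an app that cares about timestamps could watch for the moment the wall clock gets corrected. The Python sketch below is a heuristic, not an official mechanism: it assumes that the monotonic clock does not advance while the Machine is suspended, so a sudden change in the gap between wall-clock and monotonic time suggests that the clock was stepped after a resume. `ClockJumpDetector` is a name invented for this example.

```python
import time
from typing import Callable


class ClockJumpDetector:
    """Heuristic: spot wall-clock steps by comparing against the
    monotonic clock. Assumes the monotonic clock pauses while suspended."""

    def __init__(self, threshold_s: float = 1.0,
                 wall: Callable[[], float] = time.time,
                 mono: Callable[[], float] = time.monotonic) -> None:
        self._wall = wall
        self._mono = mono
        self._threshold = threshold_s
        self._offset = wall() - mono()

    def check(self) -> float:
        """Return the wall-clock step (in seconds) since the last check,
        or 0.0 if it is smaller than the threshold."""
        offset = self._wall() - self._mono()
        jump = offset - self._offset
        self._offset = offset  # the latest offset becomes the new baseline
        return jump if abs(jump) >= self._threshold else 0.0
```

Calling `check()` periodically (say, once per request or once a second) and treating a non-zero result as “the clock was just corrected” is one way to know when timestamps taken shortly before may have been stale.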

Billing

For now, suspended Machines are billed just like stopped Machines.


Let us know here if you have any questions. We’re excited to see what you’ll build with this!

How long will we have to wait for this feature to replace auto-start/stop?

As soon as we can do so safely. There are a few things we need to fix and test before the proxy can use it reliably.

If this were possible, it would be amazing. Auto-stopped GPU Machines are quite slow to start because loading the models onto the GPU easily takes 30-40 seconds. If we could cut that to a few seconds, it would be just incredible.

This feature would be a game changer for GPU machines. Currently, loading models in PyTorch is very slow, for example.

You could even use DMA to load GPU memory directly from disk.

Noted. We use Cloud Hypervisor instead of Firecracker on our GPUs, so it’s not a matter of just enabling it. We’ll see what we can do.

+1 for similar feature on GPU VMs. I’m scoping a solution for a product where this would be a killer feature.

Is this reflected in the UI or via CLI somehow?

I updated my auto_stop_machines = true to auto_stop_machines = "suspend"

But when I click in to view an individual machine, I see only stopped listed under state.

Yea, should be a blue dot for Suspended. Click on the “Configuration” tab to ensure it’s properly set.

Yeah it is.

Some are now showing as suspended. Some still show stopped, but maybe they didn’t get traffic to put themselves in a suspended state.

Thanks.

TL;DR: Is there any documentation on leveraging this within apps (e.g., on the development side rather than operations side)?

Context: I searched for more documentation but couldn’t find any. I looked at AWS Lambda SnapStart, but it looks like they have a specific SDK to support this (which only works with Java).

  • Is there a way for the app to tell that it has been resumed?
  • Should I code the app with an initial “sleep” so that I have enough time to snapshot it?

For example, the app I built requires fetching a URL before it starts. The URL is different for each invocation and is supplied via environment variables.

Is there a maximum time a machine will remain suspended before being stopped? I’d love to use that feature for machines that might need to stay suspended for a few days.

@amo there’s no fixed time limit for suspended Machines. I’ve already had some test Machines in the suspended state for several days without issue.

To be sure, a host maintenance or capacity issue may invalidate a suspended Machine’s snapshot/force-stop it. If several days pass between a Machine’s suspension and its next start, then there’s a higher probability that such an issue will occur in that time than if the interval is only, say, an hour. I’d still expect these to be unlikely events, though!

Hi, any updates on GPUs + suspend? Currently the limit is 2 GiB of RAM, so I wonder how that would work when the minimum memory required with a GPU is 16 GB.

Also, fly deploy currently doesn’t complain and deploys successfully if you set auto_stop_machines = "suspend" with attached GPUs. Shouldn’t it error?

An update on this would be appreciated. If there isn’t one, the official docs should mention that this feature is not available for GPU machines.

For GPU machines, it would be useful to enable this feature even if you can’t snapshot the GPU’s memory. The code would load the model into both RAM and VRAM, and when the machine is woken up again you would transfer the model weights from RAM to the GPU, which should be faster than loading them from disk.

Modal does something like this and cold start can improve by 1.5-3x: Memory Snapshot (beta) | Modal Docs

The code would need to know when the machine is resumed so it can move the model back to the GPU using model.to('cuda') or similar.