Cloning and resuming a suspended machine?

Hi!

I would like to know the best way to use the Fly Machines API to build a sandbox for my AI-generated code/UI. For that, I need to be able to start a new machine on demand from a Docker image as fast as possible.

I have tried the fly machine clone command for that, but it tends to be slow, especially when pulling images from the Fly Docker registry. It sometimes takes up to 20 seconds, even though my image weighs only 300 MB.

However, I have seen that you can now suspend Fly VMs and start them again later, and I was wondering whether that could be combined with the machine clone command. Maybe the feature is already implemented and I have not found it yet, but it looks like this does not work.

Any tips to reduce machine startup times?

Could you list here the command sequence you’re using? Machine starts should be much faster than that. What region are you in?

Yes. If you find that start times of 1-2 seconds are not fast enough, you could create and maintain a pool of suspended machines, then allocate one to a user and start it when needed. All of this can be controlled via the REST API.
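To make the pool idea a bit more concrete, here is a rough Python sketch. It is not an official client. The base URL and the create, wait, suspend and start paths are my assumptions about the Machines REST API, so please check them against the API reference. The names SuspendedPool, http_transport and target_size are made up for illustration.

```python
import json
import urllib.request
from collections import deque
from typing import Callable, Deque, Optional

# Assumption: public base URL of the Machines REST API.
API_BASE = "https://api.machines.dev/v1"

# A transport takes (method, path, body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(token: str) -> Transport:
    """Build a transport that sends authenticated JSON requests."""

    def send(method: str, path: str, body: Optional[dict] = None) -> dict:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            API_BASE + path,
            data=data,
            method=method,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return json.loads(raw) if raw else {}

    return send


class SuspendedPool:
    """Hypothetical helper: keeps suspended machine IDs ready for fast starts."""

    def __init__(self, app: str, image: str, send: Transport, target_size: int = 3):
        self.app = app
        self.image = image
        self.send = send
        self.target_size = target_size
        self.idle: Deque[str] = deque()

    def refill(self) -> None:
        """Create and suspend machines until the pool holds target_size IDs."""
        base = f"/apps/{self.app}/machines"
        while len(self.idle) < self.target_size:
            # Assumed endpoint: create a machine from an image.
            machine = self.send("POST", base, {"config": {"image": self.image}})
            machine_id = machine["id"]
            # Assumed endpoint: block until the machine reports "started".
            self.send("GET", f"{base}/{machine_id}/wait?state=started", None)
            # Assumed endpoint: suspend the running machine.
            self.send("POST", f"{base}/{machine_id}/suspend", None)
            self.idle.append(machine_id)

    def acquire(self) -> str:
        """Take one suspended machine out of the pool and start it."""
        if not self.idle:
            self.refill()
        machine_id = self.idle.popleft()
        self.send("POST", f"/apps/{self.app}/machines/{machine_id}/start", None)
        return machine_id
```

You would call refill() from a background job so the slow image pull happens ahead of time, and acquire() when a user actually needs a sandbox. Passing the transport in as a function also lets you try the logic against a fake before pointing it at a real app.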

You mentioned that you’re cloning a machine; could you just try creating one? Is there something you have in a running machine that could not be covered by the appropriate image?

(I could see why a clone would be slower than a create, but not to the tune of 20 seconds. That definitely sounds like something isn’t going right.)


Here are the logs of:

fly machine run registry.fly.io/rebolt-sandbox:deployment-01JWVG3PAYHDP2H920N3PVMFDX --app "rebolt-sandbox"

2025-06-03T21:10:00Z runner[e822441a7e9148] cdg [info]Pulling container image registry.fly.io/rebolt-sandbox@sha256:f353766da39afcd515f1f1010d2e3e3ac2d0a60e8b975a91025d546672e71050
2025-06-03T21:10:11Z runner[e822441a7e9148] cdg [info]Successfully prepared image registry.fly.io/rebolt-sandbox@sha256:f353766da39afcd515f1f1010d2e3e3ac2d0a60e8b975a91025d546672e71050 (11.92693465s)
2025-06-03T21:10:12Z runner[e822441a7e9148] cdg [info]Configuring firecracker
2025-06-03T21:10:12Z app[e822441a7e9148] cdg [info]2025-06-03T21:10:12.714408040 [01JWVT2BY6A05ERD26MJ24B918:main] Running Firecracker v1.7.0
2025-06-03T21:10:13Z app[e822441a7e9148] cdg [info] INFO Starting init (commit: 28e33be24)...
2025-06-03T21:10:13Z app[e822441a7e9148] cdg [info] INFO Preparing to run: `/docker-entrypoint.sh` as root
2025-06-03T21:10:13Z app[e822441a7e9148] cdg [info] INFO [fly api proxy] listening at /.fly/api
2025-06-03T21:10:13Z runner[e822441a7e9148] cdg [info]Machine created and started in 13.578s

Here it took only 13.578s to start the machine, but it is not unusual to see starts take more than 20 seconds. @halfer

I just ran an experiment to start a small machine based on an image already in the Fly Docker registry:

halfer@halfer-VirtualBox:~/$ time flyctl machine create --config machines/fly.toml registry.fly.io/sequoia-browser-test:deployment-01JW9M311T30SDR1JQC8YNXXXX
Searching for image 'registry.fly.io/sequoia-browser-test:deployment-01JW9M311T30SDR1JQC8YNXXXX' remotely...
image found: img_2wokpy3glw9jxxxx
Image: registry.fly.io/sequoia-browser-test:deployment-01JW9M311T30SDR1JQC8YNXXXX@sha256:ab45d8d0cd406a7569d5965c070aae0dfc50b5ed18474836ae3a740abef0xxxx
Image size: 117 MB

Success! A Machine has been successfully created in app sequoia-browser-test
 Machine ID: 90802d79f1xxxx
 Instance ID: 01JWVTWXYTAEBQP55ND16RXXXX
 State: created

real	0m5.071s
user	0m0.163s
sys	0m0.053s

5 seconds is longer than I was expecting, but to be fair that’s from a creaky laptop on a domestic internet connection. If you do this from a machine in the cloud, it should be much faster.

I have tried to both create a new machine and to clone an existing one. In both cases it usually takes around 15 seconds to build the machine from the image and another 2-3 seconds to start it:

2025-06-03T21:30:31Z runner[e286e276fe5708] cdg [info]Pulling container image registry.fly.io/rebolt-sandbox@sha256:f353766da39afcd515f1f1010d2e3e3ac2d0a60e8b975a91025d546672e71050
2025-06-03T21:30:45Z runner[e286e276fe5708] cdg [info]Successfully prepared image registry.fly.io/rebolt-sandbox@sha256:f353766da39afcd515f1f1010d2e3e3ac2d0a60e8b975a91025d546672e71050 (14.067789725s)
2025-06-03T21:30:46Z runner[e286e276fe5708] cdg [info]Configuring firecracker
2025-06-03T21:30:46Z app[e286e276fe5708] cdg [info]2025-06-03T21:30:46.968156007 [01JWVV7Z1DQ8FSDSWYPTQZC87Q:main] Running Firecracker v1.7.0
2025-06-03T21:30:47Z app[e286e276fe5708] cdg [info] INFO Starting init (commit: 28e33be24)...
2025-06-03T21:30:48Z app[e286e276fe5708] cdg [info] INFO Preparing to run: `/docker-entrypoint.sh` as root
2025-06-03T21:30:48Z app[e286e276fe5708] cdg [info] INFO [fly api proxy] listening at /.fly/api
2025-06-03T21:30:48Z runner[e286e276fe5708] cdg [info]Machine created and started in 16.146s

I wonder if there is a way to cache the prepared image so it is not recreated for every single machine. That would save a lot of time.

Most of that operation is an image pull; a bit over 14 seconds. How big is the image? It does not look like it is rebuilding it.
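If it helps to put a number on that, here is a small Python sketch that reads the two relevant lines from the log posted above and works out how much of the total start time went to preparing the image. The function name pull_share and the regular expressions are my own; they assume the log lines keep the wording shown in this thread.

```python
import re
from typing import Tuple

# Two lines copied from the log earlier in this thread.
LOG = """\
2025-06-03T21:30:45Z runner[e286e276fe5708] cdg [info]Successfully prepared image registry.fly.io/rebolt-sandbox@sha256:f353766da39afcd515f1f1010d2e3e3ac2d0a60e8b975a91025d546672e71050 (14.067789725s)
2025-06-03T21:30:48Z runner[e286e276fe5708] cdg [info]Machine created and started in 16.146s
"""

PREPARED = re.compile(r"Successfully prepared image \S+ \(([\d.]+)s\)")
STARTED = re.compile(r"Machine created and started in ([\d.]+)s")


def pull_share(log: str) -> Tuple[float, float, float]:
    """Return (image prepare seconds, total seconds, prepare / total)."""
    prepared = PREPARED.search(log)
    started = STARTED.search(log)
    if prepared is None or started is None:
        raise ValueError("log does not contain both timing lines")
    pull = float(prepared.group(1))
    total = float(started.group(1))
    return pull, total, pull / total


if __name__ == "__main__":
    pull, total, share = pull_share(LOG)
    print(f"image prepare: {pull:.2f}s of {total:.2f}s ({share:.0%})")
```

For this log the share comes out at roughly 87%, so the Firecracker boot and init are only a small part of the 16 seconds.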

Would you please share the exact command you’re using, as I have done? We probably need to ensure we’re comparing like with like.
