Image-aware placements

Machines with large images can really hurt deployment times and the overall user experience. This is a huge problem for GPU use cases, where image sizes can range anywhere from 2 GB to 100 GB+.

To address this, we are making our placement logic “image-aware”: it prioritizes hosts that already have the target image cached.

But it doesn’t stop there. A well-crafted Docker image can have many layers, of which only a few change on application updates. And think about the many applications that share base layers from images like Ubuntu’s, Nvidia’s, Go, Elixir, Python … and so many other official public images.

By making our machine placement logic aware of what layers are available across the fleet, we can reduce image pull times, and by extension, the time to launch your machines.
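
To make the idea concrete, here is a toy sketch of layer-aware ranking. It is not the real placement code: the `Host` type, the layer digests, and the "fewest bytes left to pull" rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record: a name plus the layer digests it has cached."""
    name: str
    cached_layers: set = field(default_factory=set)


def bytes_to_pull(image_layers, host):
    """Bytes the host would still have to download for this image.

    image_layers maps a layer digest to its size in bytes.
    """
    return sum(
        size
        for digest, size in image_layers.items()
        if digest not in host.cached_layers
    )


def rank_hosts(image_layers, hosts):
    """Order hosts so the one with the least left to pull comes first."""
    return sorted(hosts, key=lambda h: bytes_to_pull(image_layers, h))


# Made-up image: a big shared base layer plus a small application layer.
image = {"sha256:base": 900_000_000, "sha256:app": 50_000_000}
hosts = [
    Host("cold"),                                   # nothing cached
    Host("warm", {"sha256:base"}),                  # shares the base layer
    Host("hot", {"sha256:base", "sha256:app"}),     # has the full image
]
ranked = [h.name for h in rank_hosts(image, hosts)]
```

In this example the host that only shares the base layer still ranks well ahead of the cold one, which is the point of looking at layers rather than whole images.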

As a last note, fans of the FLAME pattern get the benefits out of the box: their ephemeral machines will land on the best available host and start faster than ever before.

What do I need to do?

Nothing, as long as you are using flyctl v0.2.20 or newer.

This is now enabled in all regions. We are monitoring how start times change for newly created machines after this change. If you are interested in trying it for your application, no matter the region, just let us know.

flyctl is ready to propagate the image hints on the following commands:

  • fly launch and fly deploy, even when a [[mounts]] section is set
  • fly scale count, even when it has to create new volumes
  • fly volume fork, which considers the image if the source volume is attached to a machine
  • fly postgres create, which will do the right thing too
  • the fly machine family of commands, even when volumes are involved

What about machines that use volumes?

The “CreateVolume” API endpoint already accepts compute requirement hints (cpu, mem, …) that help the placement logic pick a host with enough free compute capacity to run a machine with the volume attached.

We extended the compute requirements to also include an optional machine image field. Machines are usually launched seconds after the volume is created, so if the volume lands on a host that already has the image, the image is ready to be used. That means less waiting before the machine starts and the application begins serving requests.

In case you wonder, matching a host by image doesn’t override the require-unique-zone constraint: volume placement always enforces constraints first and only then scores the remaining candidate hosts, by available capacity and now by the image hint too.
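
The "constraints first, then scoring" order can be sketched as below. This is only an illustration under made-up assumptions: the `Candidate` fields, the way zones are tracked, and the scoring rule (prefer a host that has the image, break ties by free capacity) are hypothetical and not the actual placement implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of a host as seen by volume placement."""
    name: str
    zone: str
    free_gb: int
    has_image: bool


def place_volume(candidates, size_gb, used_zones, require_unique_zone=True):
    """Pick a host: hard constraints filter first, soft hints only rank."""
    # Step 1: hard constraints. A host that fails any of them is dropped,
    # no matter how attractive its image cache is.
    eligible = [
        c
        for c in candidates
        if c.free_gb >= size_gb
        and not (require_unique_zone and c.zone in used_zones)
    ]
    if not eligible:
        return None
    # Step 2: scoring among the survivors (assumed rule, see lead-in).
    return max(eligible, key=lambda c: (c.has_image, c.free_gb))


candidates = [
    Candidate("a", zone="z1", free_gb=800, has_image=True),
    Candidate("b", zone="z2", free_gb=500, has_image=False),
    Candidate("c", zone="z3", free_gb=100, has_image=True),
]
# The app already has a volume in zone z1, so host "a" is ruled out
# even though it has the image and the most free space.
chosen = place_volume(candidates, size_gb=10, used_zones={"z1"})
```

Here the image hint decides between "b" and "c", but it never brings "a" back into consideration.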

How does it look for the Volumes API?

In case you are interacting directly with the API, here is how to pass the image hint.

curl -i -X POST "" \
  -H "Authorization: Bearer ${FLY_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_app_vol",
    "region": "ord",
    "size_gb": 10,
    "compute_image": "ollama/ollama:latest"
  }'

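If you are scripting this rather than using curl, the same request can be sketched in Python with only the standard library. The helper name `build_create_volume_request` is made up for this example, and the endpoint URL is left as a parameter that you need to supply yourself; only the body fields shown in the curl example are used.

```python
import json
import urllib.request


def build_create_volume_request(url, token, name, region, size_gb,
                                compute_image=None):
    """Build (but do not send) a create-volume request with an image hint.

    url is the volumes endpoint for your app; it is not filled in here.
    """
    payload = {"name": name, "region": region, "size_gb": size_gb}
    # The image hint is optional, so only include it when one is given.
    if compute_image is not None:
        payload["compute_image"] = compute_image
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and token, for illustration only.
req = build_create_volume_request(
    "https://example.invalid/volumes",
    token="example-token",
    name="my_app_vol",
    region="ord",
    size_gb=10,
    compute_image="ollama/ollama:latest",
)
body = json.loads(req.data)
```

Passing `req` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected first.
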
What metric should I be watching?

We are working on adding more start-time-related machine metrics and exposing them as application metrics, but for now you can look for application logs like these (the time it took to prepare the image is at the end of the second line):

[info]Pulling container image
[info]Successfully prepared image (88.648733ms)
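
Until those metrics exist, a small script can pull the number out of the logs. The sketch below assumes the log line looks exactly like the one above and that the duration ends in `ms` or `s`; other formats are not handled, and the function name is made up for this example.

```python
import re

# Matches e.g. "Successfully prepared image (88.648733ms)".
PREPARED = re.compile(
    r"Successfully prepared image \((?P<value>[0-9.]+)(?P<unit>ms|s)\)"
)


def prepare_time_ms(line):
    """Return the image prepare time in milliseconds, or None if absent."""
    match = PREPARED.search(line)
    if match is None:
        return None
    value = float(match.group("value"))
    # Normalize seconds to milliseconds so results are comparable.
    return value * 1000 if match.group("unit") == "s" else value


logs = [
    "[info]Pulling container image",
    "[info]Successfully prepared image (88.648733ms)",
]
times = [t for t in map(prepare_time_ms, logs) if t is not None]
```

A prepare time this short suggests the image was already on the host; a cold pull of a large image would show up as a much larger value.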

And that is all for now!


Does this only work with images built with flyctl? Or is it available for images built with docker/build-push-action in GHA? In that case, I would like to see this on SIN and NRT!

It’ll work for images built with docker actions too.



To give some numbers on how much this can help: I have an Elixir app that starts headless Chrome, so at ~1 GB the image is heavier than a stock 180 MB Phoenix image. Under best-case conditions, a stock Phoenix image can be launched in 3.5-4.5s. For this app, I’m getting cold machine launches close to those ideal numbers in ord and fra, even at 5x the image size:

ord: 4739ms
fra: 5363ms
syd: 6930ms
nrt: 7197ms

Updating to add that this is now enabled in all regions!
