[experimental] Speedy machine creation with overlaybd

Many Docker images can start doing useful work before the entire image is downloaded. If you have a particularly large image (e.g. >1GB, or if you’re running ML workloads), the time spent waiting for the full download can have a real impact on how you decide to architect your application.

We’ve been working on a way to make new machine creation faster, using something called overlaybd. overlaybd is a new container image format that allows us to delay downloading parts of the Docker image until they are actually needed. When converting a Docker image to this new format, each layer is turned into a block device that stores the blocks that changed relative to the previous layer. overlaybd then merges the layers together, and can randomly access individual blocks in the remote container registry. It runs as a userspace daemon, exposing a virtual block device via TCMU, and only downloads blocks when they are read. We pass that virtual block device along to Firecracker as the rootfs for the machine.
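
To make the "merge layers, fetch blocks only when read" idea concrete, here is a minimal Python sketch. It is a toy model, not overlaybd's actual format or API: `LazyLayer`, `OverlayDevice`, the 4-byte block size, and the in-memory dict standing in for the remote registry are all made up for illustration. It also assumes each layer's index of changed blocks is known up front, so only block data is fetched lazily.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow (hypothetical)


class LazyLayer:
    """One image layer: knows which blocks it changed, fetches data on demand."""

    def __init__(self, name: str, remote_blocks: Dict[int, bytes],
                 fetch_log: List[Tuple[str, int]]):
        self.name = name
        self._remote = remote_blocks        # stands in for the remote registry
        self._cache: Dict[int, bytes] = {}  # blocks already downloaded
        self._fetch_log = fetch_log         # records every simulated download

    def has_block(self, index: int) -> bool:
        # Assumption: the index of changed blocks is available without a download.
        return index in self._remote

    def read_block(self, index: int) -> bytes:
        if index not in self._cache:
            # Simulated remote fetch; happens at most once per block.
            self._fetch_log.append((self.name, index))
            self._cache[index] = self._remote[index]
        return self._cache[index]


class OverlayDevice:
    """Merged view over the layers, ordered from bottom (base) to top."""

    def __init__(self, layers: List[LazyLayer]):
        self.layers = layers

    def read_block(self, index: int) -> bytes:
        # The topmost layer that changed this block wins.
        for layer in reversed(self.layers):
            if layer.has_block(index):
                return layer.read_block(index)
        # A block no layer ever wrote reads back as zeros.
        return bytes(BLOCK_SIZE)


fetches: List[Tuple[str, int]] = []
base = LazyLayer("base", {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}, fetches)
top = LazyLayer("top", {1: b"BBBB"}, fetches)
device = OverlayDevice([base, top])

device.read_block(1)  # served by "top", which overrides the base block
device.read_block(0)  # served by "base"
device.read_block(0)  # already cached, no second fetch
print(fetches)        # block 2 was never read, so it was never fetched
```

The point of the toy is the last line: a block that nothing reads is never downloaded, which is why an image whose startup path touches little data should start faster.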


To try out lazy-loaded images:

  1. Make sure you are on flyctl >= v0.2.25
$ fly version
fly v0.2.25 darwin/arm64 Commit: 358c5fcbcbf3b9edfab38ba8f5e305c8b786231e BuildDate: 2024-03-26T04:41:20Z
  2. Delete your remote builder app, as we have released a new version.
$ fly apps list | grep builder
fly-builder-bitter-snow-4886    	personal      	suspended
$ fly apps destroy fly-builder-bitter-snow-4886
  3. Enable lazy-loaded images in your fly.toml
[experimental]
lazy_load_images = true
  4. Run fly deploy
  5. Watch fly logs to see how long it takes to start the machine
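
If you want to script the version check from step 1, something like the following Python sketch could work. It assumes the `fly version` output looks like the sample line above (a `vX.Y.Z` token somewhere in the text); the function names are made up for this example, and pre-release suffixes are ignored.

```python
import re
from typing import Tuple

MIN_VERSION = (0, 2, 25)  # minimum flyctl version named in step 1


def parse_fly_version(output: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from text shaped like `fly vX.Y.Z ...`."""
    match = re.search(r"\bv(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in: {output!r}")
    return tuple(int(part) for part in match.groups())


def supports_lazy_loading(output: str) -> bool:
    # Tuples compare element by element, so (0, 10, 0) > (0, 2, 25) as expected.
    return parse_fly_version(output) >= MIN_VERSION


sample = ("fly v0.2.25 darwin/arm64 Commit: "
          "358c5fcbcbf3b9edfab38ba8f5e305c8b786231e BuildDate: 2024-03-26T04:41:20Z")
print(supports_lazy_loading(sample))  # True
```

Comparing integer tuples rather than strings avoids the classic mistake where "0.10.0" sorts before "0.2.25".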

This will probably work best for you if you have a large image (>1GB), and you care about machine creation time (e.g. if you maintain a warm pool of machines). The less data your app needs to read from the image to start up, the faster it should be.

This is an experimental, alpha-level feature. Once you try it out, we’d really like to know if it works for you, if you find any bugs, and how it performs with your Docker images.

@ben-io hey, crazy idea but, any chance this would allow support for 60GB docker images?

Actually, it probably does already. What’s making your image so large? Note that if you are running a GPU machine we give you a 50GB rootfs instead of 8GB, which is where the image size limitation comes from.

Just been exploring preloading models into the docker image at build time so that they’re ready to go after machine startup.

I’ve been looking at possibly using ollama with mixtral preloaded and avoiding the need to create volumes for each machine.

@ben-io I’m trying to use this option for deploying an ollama image with llama3 preloaded (around 9.5GB) but I keep getting the following error:

time="2024-05-01T06:05:06Z" level=error msg="failed to build overlaybd: failed to build \"sha256:05a88d426c377f7de36ce9e541c7b1edb6899b9070dab84b6d5da514fd61fc24\": failed to build tmp_conv/1--05a88d426c377f7de36ce9e541c7b1edb6899b9070dab84b6d5da514fd61fc24: write tmp_conv/1--05a88d426c377f7de36ce9e541c7b1edb6899b9070dab84b6d5da514fd61fc24/0003_sha256:a4f2f307d81738dbd6dfeedfc150a3cb2235ed3f04823a6641a5089f44cf439c/layer.tar: no space left on device"

Looks like you ran out of space on the builder. Try expanding the volume:

  1. Find the builder app
❯ fly apps list | grep builder
fly-builder-broken-haze-2872                                                          	personal      	suspended
  2. Find the builder volume
❯ fly vol list -a fly-builder-broken-haze-2872
ID                  	STATE  	NAME        	SIZE	REGION	ZONE	ENCRYPTED	ATTACHED VM   	CREATED AT
vol_v8mo70jjey6dndlr	created	machine_data	 5GB	iad   	de98	true     	78165d2b544968	24 minutes ago
  3. Expand it
❯ fly vol extend -a fly-builder-broken-haze-2872 vol_v8mo70jjey6dndlr -s 30
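
If you end up doing this more than once, the steps above can be roughly scripted. The Python sketch below assumes the `fly vol list` output is tab-separated with a header row, as in the paste above (the real output may differ between flyctl versions), and it only builds the `fly vol extend` argument list without running it. The helper names are invented for this example.

```python
from typing import Dict, List


def parse_volume_table(output: str) -> List[Dict[str, str]]:
    """Parse tab-separated `fly vol list` style output into one dict per row."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split("\t")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("\t")]
        # zip() drops the empty cell produced by a trailing tab.
        rows.append(dict(zip(header, cells)))
    return rows


def extend_command(app: str, volume_id: str, size_gb: int) -> List[str]:
    """Build the same command as step 3, as an argument list."""
    return ["fly", "vol", "extend", "-a", app, volume_id, "-s", str(size_gb)]


sample = (
    "ID                  \tSTATE  \tNAME        \tSIZE\tREGION\tZONE\t"
    "ENCRYPTED\tATTACHED VM   \tCREATED AT     \n"
    "vol_v8mo70jjey6dndlr\tcreated\tmachine_data\t 5GB\tiad   \tde98\t"
    "true     \t78165d2b544968\t24 minutes ago\t\n"
)

volumes = parse_volume_table(sample)
command = extend_command("fly-builder-broken-haze-2872", volumes[0]["ID"], 30)
print(" ".join(command))
```

Passing the resulting list to `subprocess.run` would execute it; the sketch stops short of that so you can inspect the command first.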

@ben-io I tried extending the volume to 80GB but it still doesn’t work, and the fly dashboard says only 12GB of the volume is in use. I tried deleting the builder and running it again but still the same issue.

I tried running the deploy locally but it looks like it doesn’t support creating overlaybd images locally.

Here are the last few log lines from my last attempt:

2024-05-01T22:36:37.706 app[17816475ce5078] syd [info] time="2024-05-01T22:36:37.706618135Z" level=info msg="fdaa:0:7e:a7b:9076:0:a:800 - - [01/May/2024:22:36:37 +0000] \"GET /flyio/v1/extendDeadline HTTP/1.1\" 202 0"

2024-05-01T22:36:38.348 app[17816475ce5078] syd [info] time="2024-05-01T22:36:38Z" level=error msg="failed to build overlaybd: failed to build \"sha256:4ae533fa5815b2b9c508ebaa1ef3212cba385fd30013183b687a70b9426f859b\": failed to build tmp_conv/1--4ae533fa5815b2b9c508ebaa1ef3212cba385fd30013183b687a70b9426f859b: write tmp_conv/1--4ae533fa5815b2b9c508ebaa1ef3212cba385fd30013183b687a70b9426f859b/0003_sha256:e4c82a20fcbf9d3a45f13f59c4ded14a4ca5e7b160a12f412fef1fa44164919f/layer.tar: no space left on device"

2024-05-01T22:36:38.804 app[17816475ce5078] syd [info] time="2024-05-01T22:36:38.804111875Z" level=info msg="fdaa:0:7e:a7b:9076:0:a:800 - - [01/May/2024:22:31:41 +0000] \"POST /flyio/v1/buildOverlaybdImage HTTP/1.1\" 500 2346"

2024-05-01T22:36:39.203 app[17816475ce5078] syd [info] WARN Reaped child process with pid: 914 and signal: SIGPIPE, core dumped? false

And here’s the Dockerfile:

FROM ollama/ollama:0.1.33-rc5

RUN ollama serve & sleep 5 && ollama pull llama3:8b-instruct-q8_0 && kill $!