Many docker images are able to start doing useful work before the entire image is downloaded. If you have a particularly large image (e.g. >1GB, or if you’re running ML workloads), this can have a real impact on how you decide to architect your application.
We’ve been working on a way to make new machine creation faster, using something called overlaybd. overlaybd is a new container image format that allows us to delay downloading parts of the docker image until they are actually needed. When converting a docker image to this new format, each layer is turned into a block device that stores only the blocks that changed relative to the previous layer. overlaybd then merges the layers together, and can randomly access individual blocks in the remote container registry. It runs as a userspace daemon, exposing a virtual block device via TCMU, and only downloads blocks when they are read. We pass that virtual block device along to firecracker as the rootfs for the machine.
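To make the idea concrete, here is a toy Python sketch of layered, lazily fetched blocks. It is not overlaybd's actual implementation or on-disk format: the class names, the tiny block size, and the in-memory dict standing in for the remote registry are all made up for illustration. It also assumes each layer knows which block indexes it changed without having to download the block data.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the demo is easy to read


class LazyLayer:
    """One image layer: holds only the blocks it changed, fetched on demand."""

    def __init__(self, name: str, remote_blocks: Dict[int, bytes],
                 fetch_log: List[Tuple[str, int]]) -> None:
        self.name = name
        self._remote = remote_blocks        # stands in for the remote registry
        self._cache: Dict[int, bytes] = {}  # blocks already downloaded
        self._fetch_log = fetch_log         # records every simulated download

    def has(self, index: int) -> bool:
        # Assumption: the layer's block index is known without fetching data.
        return index in self._remote

    def read(self, index: int) -> bytes:
        if index not in self._cache:
            # Simulated download: happens only the first time a block is read.
            self._fetch_log.append((self.name, index))
            self._cache[index] = self._remote[index]
        return self._cache[index]


class MergedDevice:
    """Read-only view over a stack of layers, listed bottom to top."""

    def __init__(self, layers: List[LazyLayer]) -> None:
        self._layers = layers

    def read_block(self, index: int) -> bytes:
        # The topmost layer that changed a block wins.
        for layer in reversed(self._layers):
            if layer.has(index):
                return layer.read(index)
        return bytes(BLOCK_SIZE)  # blocks no layer touched read as zeros


fetch_log: List[Tuple[str, int]] = []
base = LazyLayer("base", {0: b"boot", 1: b"libs", 2: b"data"}, fetch_log)
top = LazyLayer("top", {1: b"LIBS"}, fetch_log)
device = MergedDevice([base, top])

print(device.read_block(1))  # b'LIBS' (the top layer overrides the base)
print(device.read_block(1))  # b'LIBS' again, served from the cache
print(device.read_block(0))  # b'boot' (only the base layer has block 0)
print(fetch_log)             # [('top', 1), ('base', 0)]
```

Block 2 is never read in the demo, so it is never fetched. That is the whole point of lazy loading: a machine can start before the rest of the image has been downloaded.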
To try out lazy-loaded images:
1. Make sure you are on flyctl >= v0.2.25:
   $ fly version
   fly v0.2.25 darwin/arm64 Commit: 358c5fcbcbf3b9edfab38ba8f5e305c8b786231e BuildDate: 2024-03-26T04:41:20Z
2. Delete your remote builder app, as we have released a new version:
   $ fly apps list | grep builder
   fly-builder-bitter-snow-4886 personal suspended
   $ fly apps destroy fly-builder-bitter-snow-4886
3. Enable lazy-loaded images in your fly.toml:
   [experimental]
   lazy_load_images = true
4. Run fly deploy.
5. Watch fly logs to see how long it takes to start the machine.
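If you want to script the version check from the first step, something like the following Python sketch could work. It assumes the output of fly version always looks like the sample line above (a "v" followed by major.minor.patch); the function names are made up for this example and are not part of flyctl.

```python
import re
from typing import Tuple

MIN_VERSION = (0, 2, 25)  # minimum flyctl version named in the steps above


def parse_fly_version(output: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from a `fly version` output line."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return tuple(int(part) for part in match.groups())


def supports_lazy_loading(output: str) -> bool:
    """True if the reported flyctl version is at least MIN_VERSION."""
    # Tuples compare element by element, so (0, 2, 30) > (0, 2, 25).
    return parse_fly_version(output) >= MIN_VERSION


sample = ("fly v0.2.25 darwin/arm64 "
          "Commit: 358c5fcbcbf3b9edfab38ba8f5e305c8b786231e "
          "BuildDate: 2024-03-26T04:41:20Z")
print(parse_fly_version(sample))      # (0, 2, 25)
print(supports_lazy_loading(sample))  # True
```

The regex ignores any suffix after the patch number, so a pre-release build would be treated the same as the matching release. Tighten that if it matters for you.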
This will probably work best for you if you have a large image (>1GB), and you care about machine creation time (e.g. if you maintain a warm pool of machines). The less data your app needs to read in order to start up, the faster it should be.
This is an experimental, alpha-level feature. Once you try it out, we’d really like to know if it works for you, if you find any bugs, and how it performs with your docker images.
Actually, it probably does already. What’s making your image so large? Note that the image size limitation comes from the size of the rootfs: normally 8GB, but if you are running a GPU machine we give you a 50GB rootfs instead.
@ben-io I’m trying to use this option for deploying an ollama image with llama3 preloaded (around 9.5GB) but I keep getting the following error:
time="2024-05-01T06:05:06Z" level=error msg="failed to build overlaybd: failed to build \"sha256:05a88d426c377f7de36ce9e541c7b1edb6899b9070dab84b6d5da514fd61fc24\": failed to build tmp_conv/1--05a88d426c377f7de36ce9e541c7b1edb6899b9070dab84b6d5da514fd61fc24: write tmp_conv/1--05a88d426c377f7de36ce9e541c7b1edb6899b9070dab84b6d5da514fd61fc24/0003_sha256:a4f2f307d81738dbd6dfeedfc150a3cb2235ed3f04823a6641a5089f44cf439c/layer.tar: no space left on device"
Looks like you ran out of space on the builder. Try expanding the volume:
Find the builder app
❯ fly apps list | grep builder
fly-builder-broken-haze-2872 personal suspended
Find the builder volume
❯ fly vol list -a fly-builder-broken-haze-2872
ID STATE NAME SIZE REGION ZONE ENCRYPTED ATTACHED VM CREATED AT
vol_v8mo70jjey6dndlr created machine_data 5GB iad de98 true 78165d2b544968 24 minutes ago
Expand it
❯ fly vol extend -a fly-builder-broken-haze-2872 vol_v8mo70jjey6dndlr -s 30
@ben-io I tried extending the volume to 80GB but it still doesn’t work, and the fly dashboard says only 12GB of the volume is in use. I tried deleting the builder and running it again but still the same issue.
I tried running the deploy locally but it looks like it doesn’t support creating overlaybd images locally.
Here are the last few log lines from my most recent attempt:
2024-05-01T22:36:37.706 app[17816475ce5078] syd [info] time="2024-05-01T22:36:37.706618135Z" level=info msg="fdaa:0:7e:a7b:9076:0:a:800 - - [01/May/2024:22:36:37 +0000] \"GET /flyio/v1/extendDeadline HTTP/1.1\" 202 0"
2024-05-01T22:36:38.348 app[17816475ce5078] syd [info] time="2024-05-01T22:36:38Z" level=error msg="failed to build overlaybd: failed to build \"sha256:4ae533fa5815b2b9c508ebaa1ef3212cba385fd30013183b687a70b9426f859b\": failed to build tmp_conv/1--4ae533fa5815b2b9c508ebaa1ef3212cba385fd30013183b687a70b9426f859b: write tmp_conv/1--4ae533fa5815b2b9c508ebaa1ef3212cba385fd30013183b687a70b9426f859b/0003_sha256:e4c82a20fcbf9d3a45f13f59c4ded14a4ca5e7b160a12f412fef1fa44164919f/layer.tar: no space left on device"
2024-05-01T22:36:38.804 app[17816475ce5078] syd [info] time="2024-05-01T22:36:38.804111875Z" level=info msg="fdaa:0:7e:a7b:9076:0:a:800 - - [01/May/2024:22:31:41 +0000] \"POST /flyio/v1/buildOverlaybdImage HTTP/1.1\" 500 2346"
2024-05-01T22:36:39.203 app[17816475ce5078] syd [info] WARN Reaped child process with pid: 914 and signal: SIGPIPE, core dumped? false
And here’s the Dockerfile
FROM ollama/ollama:0.1.33-rc5
RUN ollama serve & sleep 5 && ollama pull llama3:8b-instruct-q8_0 && kill $!