A new way to RootFS

We’re changing how we transmogrify your image into a rootfs for running Fly Machines. Over the next week, we’ll roll out the new rootfs style as machines are created or updated. The behavior is largely the same (for example, the amount of ephemeral disk space available is unchanged), so much so that you may not even notice, but we expect most customers to see the following benefits:

  • Better I/O performance for both reads and writes to the rootfs (useful, for example, when loading models into memory).
  • Resetting a machine’s state is faster. Today the logic to reset the rootfs is typically quick, but there’s still the possibility of a substantial delay before the next start can happen.
  • We’ve seen a number of users (understandably!) write data like SQLite databases to the rootfs (which you should definitely not be doing). The new-style rootfs helps us make disk behavior more predictable.

At this point, feel free to close the tab if you’re satisfied with what to expect.

So… what have we changed?

Currently, we take the image you configure for the machine and create a snapshot device to use as the rootfs; any time the machine is stopped, we reset it by redoing the snapshot. This has worked well, but we’ve always wanted to do better.

Moving forward, the rootfs will be constructed from overlayfs layers. The read-only layer is the same image snapshot device we create today, but now there’s a new writable layer where changes are tracked. When a machine is stopped, we no longer have to reset the entire rootfs: any writes that occurred while the machine was running are backed by an ephemeral volume, and resetting just that volume is a much faster and more predictable operation.
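For the curious, here’s a minimal sketch of that layering using plain overlayfs. The paths are illustrative, not Fly’s actual layout:

```shell
# "lower" stands in for the read-only image snapshot device; "upper" is
# the writable layer backed by the ephemeral volume.
mkdir -p /tmp/rootfs-demo/lower /tmp/rootfs-demo/upper \
         /tmp/rootfs-demo/work /tmp/rootfs-demo/merged
echo "from-image" > /tmp/rootfs-demo/lower/hello.txt

# The merged view is what the machine sees as its rootfs: reads fall
# through to lower, writes land in upper. Resetting only has to clear
# upper/ and work/. The mount itself needs root:
mount -t overlay overlay \
  -o lowerdir=/tmp/rootfs-demo/lower,upperdir=/tmp/rootfs-demo/upper,workdir=/tmp/rootfs-demo/work \
  /tmp/rootfs-demo/merged \
  || echo "overlay mount skipped (requires root)"
```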

Where we would occasionally see issues with the old approach was when the image needed to be re-pulled in order to recreate the snapshot. That can be time-consuming, and it blocks the machine from starting back up, which matters for those of you relying on our proxy to auto-start machines on incoming requests.


One thing I failed to highlight in the original post was the implications for anyone creating swapfiles and running swapon directly. Originally, machines didn’t have first-class support for swap, and we had to recommend that customers run certain commands as part of machine initialization. Now that swap is a first-class field in fly.toml, it will be the only way to configure swap moving forward. Ideally we wouldn’t have to force this change as part of the new rootfs, but with overlayfs the swapon command fails. If you do configure swap_size_mb, we allocate a device to be used for swap.
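Under the new scheme, that replaces any mkswap/swapon commands in your init scripts with a single setting. A minimal fly.toml fragment (the 512 here is just an illustrative size):

```toml
# fly.toml — swap is now configured declaratively;
# Fly allocates a dedicated swap device of this size.
swap_size_mb = 512
```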


Hey there!

I run some VMs that run Docker inside them (like rchab). It looks like these all default to Docker’s vfs storage driver instead of overlay2. As a result, the disk fills up (all 8 GB) when pulling images, even if the images are only around 200 MB.

These are ephemeral so I don’t REALLY want to spin up a volume for them, but if I need to then I can.

df -h reports the Filesystem as None and lsblk is lying to me about something :stuck_out_tongue:

I think my solution here is to mount a disk when running Docker in a VM? Or are there some tricks to get Docker using overlay2?

I often fly ssh console into machines to debug and then install tools (like less, or curl, or whatever) to help with the debugging. It’s been a nice feature knowing that I can restart the machine after I’m done and all the debugging tools go away. Will that still happen with the new rootfs?


I’m in the same boat as @fideloper
Previously we were spawning Docker containers (like postgres) to run integration tests, which worked really well. Since this update it’s no longer possible, as the disk gets full from running docker pull even though df -h reports that only 13% of the available disk space is being used.
Is this the expected usable size of the machines now? Is there a workaround that doesn’t involve managing volumes?

Yes. Now, when a machine stops, we reset only the writable layer, which maintains the same behavior as before, when we would reset the entire rootfs.

@fideloper and @Heb, you’ll need to attach a Volume to the machine and configure Docker to use the correct path for persistence.
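As a sketch of that setup, assuming the Volume is mounted at /data (the paths here are illustrative), you can point Docker’s data-root at the volume in /etc/docker/daemon.json so overlay2 has a regular filesystem to work on, then restart dockerd:

```json
{
  "data-root": "/data/docker",
  "storage-driver": "overlay2"
}
```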

That’s been working great for me, thanks!


I don’t think @Heb was asking about persistence; it was more about the size limits of the writable layer.


Because the question referenced running Docker in the machine, that seemed the likely cause. The writable layer has the same capacity constraints as the original rootfs implementation.

Are you saying that the issue is related to our Docker configuration, or that this is a new limitation of the rootfs changes when running Docker in the machine?

Could you confirm whether or not it’s possible to use most of the disk space reported by df -h (this is what I get when the machine starts: none 7.8G 6.5M 7.4G 1% /), and not just 13% when using docker pull?

@Heb, giving this a quick test using the Ubuntu docker image, I was able to create a machine, install Docker on it using the convenience script, pull the postgres image, and start a postgres container, all without encountering any disk space errors.

If you’re able to post your setup scripts and configuration, or just a minimum viable reproduction, that would help in diagnosing your issue.