Fly Volumes provide persistent storage to your app by exposing a slice of a local NVMe drive. This works well in many cases, but it requires a “provision-first” approach: you need to know roughly what you’re going to use in advance (yes, you can manually extend your disks, even without restarting your machine, but you cannot yet shrink them back). Also, there’s only so much space available to a single volume on a single server: right now, the cap is 500 GiB. If you have terabyte upon terabyte of data to store, especially if it’s not frequently accessed, then you’ll need to look for alternatives.
One option might be an object storage service like S3. They can store a virtually limitless amount of data, they have great durability, and they’re cost-effective. There’s a good chance that you’re already using one. And if your app already supports object storage directly, great!
But what if your app doesn’t — like, say, your Postgres database? Or perhaps it would just be easier to store your data as files on a disk, but with the capacity and durability you can get from object storage.
With this in mind, we wanted to know whether disk volumes backed by object storage were possible. So we read up on it and built an experimental version based on recent research into log-structured virtual disks (LSVD). (That’s why you’ll see a lot of lsvd commands below — we’re not committed to the name though.) As a proof-of-concept, we even got a 100 GiB (and growing) Postgres database running on it—check it out here.
Now it’s your turn to try it out—we want to know what you think!
NB: Just a quick reminder before you proceed—this is an experimental feature. It’s not ready for production use, and it’s not officially supported. If you run into problems, please get in touch with us here on the community forum.
How can I try it?
Before starting, make sure that you have the latest flyctl version (you’ll need features introduced in v0.1.103).
Create a bucket
First, you’ll need to create a bucket on S3 or a compatible object storage service. You’ll also need credentials that can read, write, and list objects in that bucket. You can do this however you’d like, but for S3 itself, you can check out the open-source s3-credentials tool. Specifically, you can use the --create-bucket and --bucket-region flags with s3-credentials create to create a new bucket along with credentials for a new AWS user that has access to it.
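For instance, here’s a sketch of that command; the bucket name and region are placeholders, so substitute your own:

# Create a bucket plus a dedicated AWS user whose credentials can access it.
# Keep the emitted access key and secret handy for the setup step below.
s3-credentials create my-lsvd-bucket --create-bucket --bucket-region us-east-1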
NB: Create your new bucket in a region close to the Fly.io region from which you plan to use it. This is important—it will keep the latency of I/O operations down! For S3, this live RTT table can help you choose.
Set up your Fly app
Since this is an experimental feature, we strongly recommend creating a fresh app with fly apps create to test this, rather than using an existing one.
Once you have your bucket and credentials, you can run
fly volumes lsvd setup -a <your-app-name>
It’ll prompt you for the relevant information and set secrets on your app to configure it.
Create an LSVD-enabled Fly Machine
Use the new --lsvd flag with fly machines run to create a new Machine in your app. This will inject and start a background daemon (called lsvd) in your Machine that actually provides the volume. It will be mounted at the path you specified in the previous (setup) step. (The raw volume will be exposed as /dev/nbd0 for those of you who want to get real fancy/crazy with the raw block device.)
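If you do want to poke at the raw device, a couple of stock Linux commands work from a shell inside the Machine (just a sketch; there’s nothing LSVD-specific here):

lsblk /dev/nbd0                  # show the device's size and mount point
blockdev --getsize64 /dev/nbd0   # print its size in bytes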
Make sure to give your Machine enough memory (--vm-memory), because the lsvd daemon will need some of it. 512 MiB is a good baseline. Larger disks will require more memory: the lsvd process currently needs 2 MiB of memory per GiB of disk, so a 100 GiB disk adds roughly 200 MiB on top of whatever your app itself uses. (We realize that this is a lot of overhead; we hope to reduce it in the future!)
NB: Before deploying your machine, you’ll need the CA certificates available in your Docker image, so that the lsvd daemon can connect to object storage over HTTPS. They’re usually easy to add. Here’s an example Dockerfile line for Debian-based images (with apt-get):
RUN apt-get -y update \
&& apt-get -y install ca-certificates \
&& apt-get -y clean \
&& rm -rf /var/lib/apt/lists/*
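If you’re building from an Alpine-based image instead, the equivalent is a single line (a sketch, using the standard apk package name):

RUN apk add --no-cache ca-certificates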
In summary, this is the full command to get you started with a local Dockerfile:
fly machines run -a <app-name> --lsvd --vm-memory 512 .
NB: For now, don’t run more than one LSVD-enabled machine per app! They’ll conflict with each other and corrupt your volume.
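Once the Machine is up, you can sanity-check that the volume is there. Here’s a rough sketch; the mount path is whatever you chose during setup, so substitute your own:

fly ssh console -a <app-name>   # open a shell inside the Machine
df -h /data                     # replace /data with your configured mount path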
Nerdy stuff below, for all of you nerds
Diving deeper: what are all these objects showing up in S3?
Once you’ve run your shiny new S3-backed volume for a while, you’ll notice objects with funny hexadecimal names appearing in your bucket. Each object contains logs of the writes made to the disk. They’re numbered sequentially.
Each log entry records both what part of the disk was written and the actual data that was written. To read back a sector of the disk, you can scan the logs to find the most recent write to that sector and pull the data from that entry.
This basic idea can ultimately be optimized enough to make it practical (for more on this, check out the paper!), and it has a really nice quality: snapshotting is “built-in” to the design. To restore the volume to a given point, all you need to do is ignore all the log entries that come after that point. You can try this yourself (a rough CLI sketch follows the steps):
- Write a bunch of data to the disk. Shut the machine down and write down the name of the most recent log object.
- Start the machine again, write even more data, and then shut it down again.
- Delete all the log objects that come after the one you found in step (1).
- Start the machine one last time. Observe that the disk state rolled back to how it was at the end of step (1).
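If you want to use the AWS CLI for this, here’s a rough sketch of steps (1) and (3); the bucket name is a placeholder, and it assumes the sequentially numbered log objects list in the order they were written:

aws s3 ls s3://my-lsvd-bucket/          # note the newest log object (step 1)
aws s3 rm s3://my-lsvd-bucket/<object>  # delete each object newer than it (step 3)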
But what’s the performance like?
There’s no getting around the fact that accessing data from S3 (or a compatible service) will have a much higher latency (tens to hundreds of milliseconds) than a locally attached SSD (tens to hundreds of microseconds). However, like an SSD (and unlike a hard drive), S3 can handle many requests simultaneously. Accordingly, we’ll use up to 100 connections at once to access the backend. We also tune the Linux kernel in your Machine to read ahead aggressively during sequential reads to provide reasonable throughput even with high latency. From our testing, up to a few thousand IOPS and transfer rates on the order of 10 to 100 MiB/s are achievable.
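If you’d like to see those numbers for yourself, fio is a handy way to measure them. A sketch, assuming fio is installed in your image and /data is your configured mount path:

# Random 4 KiB reads at a deep queue, bypassing the page cache,
# to exercise the backend's ability to handle many requests at once.
fio --name=randread --filename=/data/fio-test --size=1G --rw=randread \
    --bs=4k --direct=1 --ioengine=libaio --iodepth=100 \
    --runtime=60 --time_based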
The tradeoffs won’t make sense for every app! However, if your application issues many I/O operations concurrently, deals with a large amount of cold data, and can work with the latency, we think that this is an option worth exploring.
NB: To improve performance, here are some additional things to consider:
- If you’re running a database, then it may be possible to tune it. E.g., Postgres users might try setting effective_io_concurrency to 100 to match the number of concurrent I/O operations available (see the sketch after this list).
- If you’re feeling intrepid, then you can experiment with locally caching hot data via the Linux kernel’s dm-cache feature, which is built into our guest kernel. If interested, let us know below and we can share some additional information about this!
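For example, a minimal sketch of that Postgres tweak, assuming you can reach the database with psql from inside the Machine (connection details omitted):

psql -c "ALTER SYSTEM SET effective_io_concurrency = 100;"  # persist the setting
psql -c "SELECT pg_reload_conf();"                          # apply it without a restart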
Acknowledgments
This feature is based on research presented in the LSVD paper and earlier work on block devices in userspace. You can find software published with the papers on GitHub: asch/dis, asch/buse, and asch/bs3.
Furthermore, we built the lsvd program that gets added to your VM using a number of open-source libraries—you can see the full list by running /.fly/lsvd licenses in your LSVD-enabled Machine.