Fly Volumes provide persistent storage to your app by exposing a slice of a local NVMe drive. This works well in many cases, but it requires a “provision-first” approach: you need to know roughly what you’re going to use in advance (yes, you can manually extend your disks, even without restarting your machine, but you cannot yet shrink them back). Also, there’s only so much space available to a single volume on a single server: right now, the cap is 500 GiB. If you have terabyte upon terabyte of data to store, especially if it’s not frequently accessed, then you’ll need to look for alternatives.
One option might be an object storage service like S3. They can store a virtually limitless amount of data, they have great durability, and they’re cost-effective. There’s a good chance that you’re already using one. And if your app already supports object storage directly, great!
But what if your app doesn’t — like, say, your Postgres database? Or perhaps it would just be easier to store your data as files on a disk, but with the capacity and durability you can get from object storage.
With this in mind, we wanted to know whether disk volumes backed by object storage were possible. So we read up on it and built an experimental version based on recent research into log-structured virtual disks (LSVD). (That’s why you’ll see a lot of `lsvd` commands below; we’re not committed to the name, though.) As a proof of concept, we even got a 100 GiB (and growing) Postgres database running on it—check it out here.
Now it’s your turn to try it out—we want to know what you think!
NB: Just a quick reminder before you proceed—this is an experimental feature. It’s not ready for production use, and it’s not officially supported. If you run into problems, please get in touch with us here on the community forum.
Before starting, make sure that you have the latest `flyctl` version (you’ll need features introduced in v0.1.103).
First, you’ll need to create a bucket on S3 or a compatible object storage service. You’ll also need credentials that can read, write, and list objects in that bucket. You can do this however you’d like, but for S3 itself, you can check out the open-source `s3-credentials` tool. Specifically, you can use the `--bucket-region` flag with `s3-credentials create` to get a new bucket created along with credentials for a new AWS user that has access to it.
NB: Create your new bucket in a region close to the Fly.io region from which you plan to use it. This is important—it will keep the latency of I/O operations down! For S3, this live RTT table can help you choose.
Since this is an experimental feature, we strongly recommend creating a fresh app with `fly apps create` to test this, rather than using an existing one.
Once you have your bucket and credentials, you can run:

```
fly volumes lsvd setup -a <your-app-name>
```

It’ll prompt you for the relevant information and set secrets on your app to configure it.
Use the new `--lsvd` flag with `fly machines run` to create a new Machine in your app. This will inject and start a background daemon (called `lsvd`) in your Machine that actually provides the volume. It will be mounted at the path you specified in the previous (setup) step. (The raw volume will be exposed as `/dev/nbd0` for those of you who want to get real fancy/crazy with the raw block device.)
Make sure to give your Machine enough memory (`--vm-memory`), because the `lsvd` daemon will need some of it. 512 MiB is a good baseline. Larger disks will require more memory: the `lsvd` process currently needs 2 MiB of memory per GiB of disk. (We realize that this is a lot of overhead; we hope to reduce it in the future!)
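As an illustrative back-of-the-envelope check, you can turn that sizing rule into a quick calculation. (The 2 MiB-per-GiB figure is from above; the 512 MiB baseline for everything else is our assumption, not an official number.)

```python
def recommended_vm_memory_mib(disk_gib, baseline_mib=512):
    """Rough sizing sketch: the lsvd daemon currently needs about
    2 MiB of memory per GiB of disk, on top of whatever your app
    and the base system use (baseline_mib is an assumed figure)."""
    return baseline_mib + 2 * disk_gib

# A 100 GiB disk adds ~200 MiB for lsvd on top of the baseline:
print(recommended_vm_memory_mib(100))  # 712
```

So for a 500 GiB disk you'd budget on the order of 1.5 GiB of Machine memory under these assumptions.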
NB: Before deploying your machine, you’ll need the CA certificates available in your Docker image, so that the `lsvd` daemon can connect to object storage over HTTPS. They’re usually easy to add. Here’s an example `Dockerfile` line for Debian-based images:

```
RUN apt-get -y update \
 && apt-get -y install ca-certificates \
 && apt-get -y clean \
 && rm -rf /var/lib/apt/lists/*
```
In summary, this is the full command to get you started with a local S3-backed volume:

```
fly machines run -a <app-name> --lsvd --vm-memory 512
```
NB: For now, don’t run more than one LSVD-enabled machine per app! They’ll conflict with each other and corrupt your volume.
Once you’ve run your shiny new S3-backed volume for a while, you’ll notice objects with funny hexadecimal names appearing in your bucket. Each object contains logs of the writes made to the disk. They’re numbered sequentially.
Each log entry records both what part of the disk was written and the actual data that was written. To read back a sector of the disk, you can scan the logs to find the most recent write to that sector and pull the data from that entry.
This basic idea can ultimately be optimized enough to make it practical (for more on this, check out the paper!), and it has a really nice quality: snapshotting is “built-in” to the design. To restore the volume to a given point, all you need to do is ignore all the log entries that come after that point. You can try this yourself:
- Write a bunch of data to the disk. Shut the machine down and write down the name of the most recent log object.
- Start the machine again, write even more data, and then shut it down again.
- Delete all the log objects that come after the one you found in step (1).
- Start the machine one last time. Observe that the disk state rolled back to how it was at the end of step (1).
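The steps above can be sketched as a toy model in Python. (This is illustrative only and shares nothing with the real `lsvd` implementation.) Writes append numbered entries to a log; a read scans backwards for the most recent write to a sector; and rollback is just dropping everything after a chosen entry:

```python
class ToyLSVD:
    def __init__(self):
        self.log = []  # sequentially numbered entries: (sector, data)

    def write(self, sector, data):
        self.log.append((sector, data))
        return len(self.log) - 1  # entry number, like the object names

    def read(self, sector):
        # The most recent write to this sector wins.
        for s, data in reversed(self.log):
            if s == sector:
                return data
        return b"\x00"  # unwritten sectors read as zeroes

    def rollback(self, last_entry):
        # Restoring a snapshot = ignoring every entry after that point.
        self.log = self.log[: last_entry + 1]

disk = ToyLSVD()
disk.write(0, b"old")
mark = disk.write(1, b"keep")   # step (1): note the latest entry number
disk.write(0, b"new")           # step (2): overwrite sector 0
disk.rollback(mark)             # step (3): delete later log entries
print(disk.read(0))             # b'old' -- the overwrite was rolled back
```

The real system layers batching, indexing, and caching on top of this idea to make it fast, but the snapshot property falls out of the same log structure.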
There’s no getting around the fact that accessing data from S3 (or a compatible service) will have a much higher latency (tens to hundreds of milliseconds) than a locally attached SSD (tens to hundreds of microseconds). However, like an SSD (and unlike a hard drive), S3 can handle many requests simultaneously. Accordingly, we’ll use up to 100 connections at once to access the backend. We also tune the Linux kernel in your Machine to read ahead aggressively during sequential reads to provide reasonable throughput even with high latency. From our testing, up to a few thousand IOPS and transfer rates on the order of 10 to 100 MiB/s are achievable.
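To see why concurrency matters so much here, a rough back-of-the-envelope model helps (the numbers are illustrative assumptions, not measurements): by Little's law, sustained throughput is roughly the number of requests in flight divided by the per-request latency.

```python
def sustained_iops(in_flight, latency_s):
    # Little's law: ops/sec ~= requests in flight / per-request latency
    return in_flight / latency_s

# 100 concurrent requests at an assumed ~50 ms each:
print(round(sustained_iops(100, 0.050)))  # ~2000 ops/sec
# A single synchronous stream at the same latency crawls:
print(round(sustained_iops(1, 0.050)))    # ~20 ops/sec
```

This is why a backend with high per-request latency can still deliver thousands of IOPS, as long as the workload (or the kernel's readahead) keeps many requests in flight at once.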
The tradeoffs won’t make sense for every app! However, if your application issues many I/O operations concurrently, deals with a large amount of cold data, and can work with the latency, we think that this is an option worth exploring.
NB: For improving performance, here are some additional things to consider:
- If you’re running a database, then it may be possible to tune it. E.g., Postgres users might try setting `effective_io_concurrency` to 100 to match the number of concurrent I/O operations available.
- If you’re feeling intrepid, then you can experiment with locally caching hot data via the Linux kernel’s dm-cache feature, which is built into our guest kernel. If interested, let us know below and we can share some additional information about this!
This feature is based on research presented in the LSVD paper and earlier work on block devices in userspace. You can find software published with the papers on GitHub: asch/dis, asch/buse, and asch/bs3.
Furthermore, we built the `lsvd` program that gets added to your VM using a number of open-source libraries. You can see the full list by running `/.fly/lsvd licenses` in your LSVD-enabled Machine.