LiteVFS - SQLite VFS extension for on-demand paging from LiteFS Cloud

There is a new addition to LiteFS Cloud we’ve been working on - LiteVFS. LiteVFS is a dynamically loadable extension for SQLite that provides a SQLite Virtual File System (VFS) for on-demand paging from LiteFS Cloud.

https://github.com/superfly/litevfs

Intro

LiteFS is great, but it may not be suitable for dynamic or ephemeral environments, like Machines with auto-start/auto-stop, Machines without volumes, or AWS Lambda. LiteVFS is designed specifically with such use cases in mind: it uses LiteFS Cloud for persistence and locally available space for cache.

A database opened via LiteVFS will be fetched from LiteFS Cloud and cached locally. Subsequent reads that need the same data pages are served from local cache and don’t need any network access.

Only the pages requested by SQLite (with some additional prefetching) are fetched from LiteFS Cloud, so if a query needs only a small subset of the data, this should work well even on large databases.

LiteVFS periodically polls LiteFS Cloud to stay in sync with the database state there. Pages changed on the LiteFS Cloud side are dropped from the local cache and refetched when needed. On average, you can expect a delay of about 1 second between the LiteFS Cloud state and the local LiteVFS state.
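
To make the caching behaviour described above more concrete, here is a toy model in Python. It is not LiteVFS code and does not reflect its internals; `PageCache`, `fetch_page` and `invalidate` are hypothetical names used only to illustrate "fetch on miss, serve from cache, drop changed pages".

```python
from typing import Callable, Dict, Iterable


class PageCache:
    """Toy read-through page cache (illustration only, not LiteVFS code)."""

    def __init__(self, fetch_page: Callable[[int], bytes]) -> None:
        # fetch_page stands in for a network read from remote storage.
        self._fetch_page = fetch_page
        self._pages: Dict[int, bytes] = {}
        self.remote_fetches = 0

    def read(self, page_no: int) -> bytes:
        # Cache miss: fetch the page remotely and remember it.
        if page_no not in self._pages:
            self._pages[page_no] = self._fetch_page(page_no)
            self.remote_fetches += 1
        # Cache hit (or freshly filled entry): no network access needed.
        return self._pages[page_no]

    def invalidate(self, changed: Iterable[int]) -> None:
        # Drop pages reported as changed; they are refetched on next read.
        for page_no in changed:
            self._pages.pop(page_no, None)
```

In this model a periodic sync step would call `invalidate` with the page numbers that changed remotely, so only those pages cost a network round trip on the next read.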

How to use it

The extension can be loaded from any language with SQLite bindings or from the SQLite shell:

$ export LITEFS_CLOUD_TOKEN="<your token here>"
$ sqlite
sqlite> .load target/release/liblitevfs.so
sqlite> .open file:demo.db?vfs=litevfs
sqlite> .tables
data
sqlite>

I’ve made a couple of demo apps to show how to use it from Go and Node: https://github.com/fly-apps/litevfs-demo

Writing via LiteVFS

Each LiteVFS node can write data, but it needs to hold a write lease. Writing data via LiteVFS results in several network requests to LiteFS Cloud: acquiring a lease, sending changes, releasing the lease. A lease can be acquired by issuing a litevfs_acquire_lease pragma statement:

sqlite> pragma litevfs_acquire_lease;
sqlite> insert into data (data) values (123);
sqlite> pragma litevfs_release_lease;
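
In application code it is easy to forget the release step when an error occurs. One way to pair the two pragmas is a small context manager; this is a sketch in Python using the standard `sqlite3` module, and `write_lease` is a hypothetical helper name. It assumes the lease should be held until the transaction is committed, which matches the acquire, send changes, release order described above.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def write_lease(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Hold the LiteVFS write lease for the duration of the with-block."""
    conn.execute("pragma litevfs_acquire_lease")
    try:
        yield conn
        # Commit while the lease is still held (assumption, see above).
        conn.commit()
    except BaseException:
        # Discard partial changes if the block failed.
        conn.rollback()
        raise
    finally:
        # Always release, even on error, so other writers are not blocked.
        conn.execute("pragma litevfs_release_lease")
```

With a connection opened via LiteVFS, a write then looks like `with write_lease(conn) as c: c.execute("insert into data (data) values (123)")`.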

Using together with LiteFS

LiteVFS can be used to read data from a LiteFS cluster backed up to LiteFS Cloud. At the moment, it’s not safe to write data to the same cluster via LiteVFS, but it can be a good way to quickly bring up additional read-only replicas on machines without volumes.

Limitations

For now, databases in WAL journal mode can be opened only in read-only mode. The same applies to databases with auto_vacuum enabled.
Manually issued VACUUM statements may also fail if they end up reducing database file size.


I am trying to understand LiteFS better.

I must admit, from what I had read I’d thought it was already suitable for those use cases. I am looking for a solution to have a database-backed app that can “scale to zero” when not in use.

Have I understood correctly that LiteFS supports that case, e.g. you could have:

  • a fly machine configured to auto start and stop
  • a LiteFS db mounted to a volume

…?

And so this new LiteVFS is specifically for the case where you are unable to mount a volume, such as AWS Lambda as mentioned.

If I’m on track so far then my question is about the issue with “Machines with auto-start/auto-stop” and LiteFS: is it completely unworkable, or is it more that there’s an edge case around data integrity? (Presumably the auto-stop only occurs when the app is not handling a request, so I would not expect frequent problems in that regard?)


Hi @anentropic

Yeah, your understanding of LiteFS/LiteVFS is correct.

LiteFS with auto-stop/auto-start should work, but you need to be careful about which machines can become primaries (candidate=true). If your configuration allows for more than one primary candidate, one of them may be auto-stopped, fall significantly behind, and later be promoted to primary.

You can keep the primary machine always running to prevent this. Alternatively, you can configure LiteFS Cloud backup. It acts as the data authority, so a newly started machine can always pull the latest data from it.

Take a look at this post where something similar has been discussed - Understanding litefs for "rarely up" architecture