That’s great feedback. Thank you!
I’ll get some docs updated with that info but I’ll try to give a quick explanation to your questions here. Let me know if my answers spur more questions and we can update the docs to reflect those too.
Our recommendation looks like this:
- 2 candidate nodes in your primary region. This lets you hand off "primary" status between these nodes during a deploy or if one goes down. It also keeps your primary in the same physical location, so your write latency shouldn't change just because your primary does. (There's a rough litefs.yml sketch after this list showing this setup.)
- Use volumes with each of your instances. I'd recommend volumes at least 2x the size of your database, since LiteFS temporarily stores transaction data so it can replicate it. Volumes also let your instance start up faster because it doesn't need to copy a fresh snapshot from your primary node.
- Add nodes in additional regions to reduce latency for read requests. You probably don't need instances in every region. For example, if you only had instances in the US and you added a replica in Asia, you'd automatically cut the round-trip time for read requests from Asia users by ~250ms.
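To make that concrete, here's a rough sketch of what the litefs.yml on one of your candidate nodes could look like. It's just an illustration: it assumes a Consul lease (more on that below) and uses Fly-style environment variables (FLY_REGION, PRIMARY_REGION, FLY_CONSUL_URL, FLY_APP_NAME) and paths as placeholders, so adjust the details for your own deployment.

```yml
# Sketch only -- paths, ports, and env vars are placeholders for your setup.

# Directory your application uses to access the SQLite database (the FUSE mount).
fuse:
  dir: "/litefs"

# Internal LiteFS data directory -- put this on your volume, sized at roughly
# 2x your database so there's room for the temporary transaction data.
data:
  dir: "/var/lib/litefs"

lease:
  # Consul lease so "primary" status can move between candidates automatically.
  type: "consul"

  # Only nodes in the primary region are candidates; nodes elsewhere stay
  # read-only replicas. Assumes FLY_REGION and PRIMARY_REGION are set in the env.
  candidate: ${FLY_REGION == PRIMARY_REGION}

  # URL other nodes use to reach this node for replication.
  advertise-url: "http://${HOSTNAME}:20202"

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${FLY_APP_NAME}"
```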
Right now, LiteFS is optimized for read-heavy workloads, so if you have a lot of writes it might be worth benchmarking that. Most of the write overhead comes from FUSE. We'll be releasing a SQLite VFS implementation in the near future that will avoid that FUSE overhead entirely.
LiteFS also uses asynchronous replication, so there's a small window where SQLite confirms a write and then the primary goes down before it's able to replicate that change to the other candidate. Replication is really fast, so that window is typically only a few milliseconds (or less) with two candidates in the same region.
Consul provides a distributed lease for LiteFS. That’s what ensures that at any given time there’s only one primary node. It also allows us to change the primary node safely and automatically when the current primary goes down.
The alternative is to use a "static" lease type in LiteFS. In that case, only a single instance is ever the primary, so if it goes down you'll lose write availability until you reconfigure your cluster to point to a new primary node.
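If you do go the static route, the lease section is simpler since nothing changes at runtime. A minimal sketch (again, the values are placeholders; double-check field names against the LiteFS config reference) might look like:

```yml
lease:
  # Static lease: the primary never changes unless you redeploy with new config.
  type: "static"

  # true on the single node acting as primary, false on every replica.
  candidate: true

  # Address replicas use to reach the primary's LiteFS replication endpoint.
  advertise-url: "http://primary.internal:20202"
```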