LiteFS multiple databases

Is there a way to only sync an individual database to a fly machine from a cluster?

I have a use case where one app needs access to a single database, but I have a management app that needs access to many databases (potentially tens or hundreds).

Would each of the databases get synced to each machine even if I only open one?

I think there are 2 concerns here.

  1. Sharing data across apps. Can the LiteFS db sync across app boundaries?
  2. How does LiteFS handle multiple databases on Fly.io.

Item 1 - Sharing Data Across Apps

I know you can have multiple apps share the same database (backend / frontend).

I have a use case where I have a BaaS app that is the Primary.

The UI is then separate apps that can scale horizontally, and are read only.

In this case I share the LiteFS Cloud key and the Consul key, and I make sure the Consul setup uses the same app name for both.

Item 2 - Handling Multiple Databases

I asked a similar question here, and the answer made it sound like the proxy only knows about one database.

That being said, I have a static lease set up locally and use NGINX to control where POST requests go. The way I understand it, the Fly.io proxy only knows of one database when using Consul.

With a static lease, I am not sure you have the option to use the Fly proxy, as you are defining the endpoint statically. But your apps need to know where to write, and this goes for all databases. This model can involve downtime and manual intervention, and it won't handle scaling horizontally.
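
For reference, a static lease in litefs.yml looks something like this (the hostname here is a placeholder):

lease:
  type: "static"
  # every node must point at the one designated primary
  advertise-url: "http://primary-host.internal:20202"
  candidate: true  # true only on the node acting as primary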

BUT -

The LiteFS example repo uses NGINX to route all non-GET requests to the right place. I think you could use this on Fly.io to handle something similar, letting NGINX handle multiple databases for you instead of the Fly proxy.
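
As a minimal sketch of that idea (the upstream names and ports are hypothetical):

upstream local   { server 127.0.0.1:8080; }
upstream primary { server primary-host.internal:8080; }

server {
  listen 8081;
  location / {
    # reads can be served from the local replica
    if ($request_method = GET) {
      proxy_pass http://local;
    }
    # everything else (writes) goes to the primary
    proxy_pass http://primary;
  }
}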

Hope that helps; I'm really curious about the outcome of your work here.

One final thought - When I go to interact with LiteFS Cloud, I do have to select a database (not just the cloud instance), which makes the UI feel like it supports multiple. It's possible that it would back up all DBs, but I'm not 100% clear on this, and I have not found a direct question/answer about the cloud specifically.

Though it sounds like they may cap the Cloud instance backup at 10 GB. Not sure if your 100 DBs go beyond that. But you can always back up to S3 or B2 or some other S3-compatible storage pretty easily with Litestream (also made by @benbjohnson).
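
For example, a minimal litestream.yml sketch (bucket and database names are placeholders):

dbs:
  - path: /db/data.db
    replicas:
      - url: s3://my-backup-bucket/data.db
  - path: /db/logs.db
    replicas:
      - url: s3://my-backup-bucket/logs.db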

Thanks for the very thorough information; that gives a lot of food for thought. I think I understand the proxy stuff and its limitations, and I could work around those in the app that has to connect to more than one database.

My use case is having clients that are 99.9% reads, with occasional writes to their own database but potentially across a few instances, and I think using the proxy, or doing what the proxy does manually in the app, will be fine.

But the management app will update records in many databases, and potentially search across them.

I was more thinking about the actual FUSE mount and sync behaviour.

When you first stand up LiteFS, you create a database in the mount location with your app, and that then gets replicated. But as far as I can tell, you can also import more than one database into the share using litefs import. You then have one cluster with two databases.
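
Importing looks something like this, assuming you run it against the primary node's LiteFS API on the default port (database names here are placeholders):

# -name sets the name the database will have inside the cluster
litefs import -name data.db ./data.db
litefs import -name logs.db ./logs.db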

If I join a second instance to the cluster, will both databases get synced to the local machine before LiteFS runs the binary specified in exec? Or does the sync only happen when you open the database?

If it's the latter, then as I add more and more databases to the share, it will take longer and longer for new instances to spin up, as there is more data to sync to the machine. Does that make sense?


I am unclear whether the sync happens before the app starts. I think if you use exec.cmd and have LiteFS be the supervisor, it may do that, so the app isn't trying to connect before launching.
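
That supervisor setup looks something like this in litefs.yml (the commands here are hypothetical):

exec:
  # run migrations only on the candidate (primary) node
  - cmd: "my-app migrate"
    if-candidate: true
  # then start the app under LiteFS supervision
  - cmd: "my-app serve"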

For my local setup, I only have one FUSE directory that everything sits in, and the same goes for the LiteFS data location.

And to my surprise, both databases are synced. I think the limitation is only the proxy.

The WAL and SHM files are named the same as the DB, so they don’t conflict. Works really well =).

If I understood your questions correctly, I think you're fine.

That being said, I am not sure you can prevent the sync from sharing all DBs across all instances. I think it would be up to the individual app to only read/write its specific DB. And that might bloat your volumes…

The sync is for all databases in a cluster. We don't currently have a way to only replicate individual databases, but it's a feature that's been requested several times. Are you wanting to just specify a different set of databases on each replica? Do you need any kind of auth to restrict which databases each replica has access to?

If I could specify in the config a list of which databases to sync to the machine, then I think I could do without auth, as the local app wouldn't be able to see the database.

Then on the management interface I could just specify all the databases I wanted it to manage.

Do you think that is possible in the future?

As a workaround, could I run several LiteFS FUSE mounts on the management app machine?

@deano-fury It was a pretty straightforward implementation so I went ahead and put up a PR: Filter replicated databases by benbjohnson · Pull Request #407 · superfly/litefs · GitHub

You can give it a try by specifying the PR build in your Dockerfile:

COPY --from=flyio/litefs:pr-407 /usr/local/bin/litefs /usr/local/bin/litefs
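
For reference, the filter is a list under the lease section of litefs.yml (the database names here are placeholders):

lease:
  # only replicate these databases to this node
  databases: ["data.db", "logs.db"]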

The config usage is in the PR description. Let me know if something like that would work for your use case.

Very cool @benbjohnson

Out of curiosity, why can't we give an array of DBs to the proxy in the same way, so we can have horizontal scaling with multiple DBs and DR?

proxy:
  addr: ":8080"
  target: ":8055"
  db: 
    -  "data.db"
    -  "logs.db"
  passthrough:
    - "*.ico"
    - "*.png"

Is it because it requires work on the Fly proxy to support, and is not just specific to LiteFS?

Worked like a charm. That's exactly what I wanted.

A couple more questions: can a LiteFS cluster span apps? I want to run an instance per customer so I can have customers on different versions of my application. And I want a management app, accessible just to me, that has all the databases mounted into it.

Also, is there a way to configure the backup client to run on a machine that isn't a Fly machine?
I have managed to join the cluster from my local machine and become the primary. I can then create a database locally and import it.

Would I need to have the backup client running from here if I'm the primary for any length of time?

Thanks

Dean

🎉🎉🎉

Yes, as long as your apps are in the same organization they can talk to one another.

The backup client doesn't need to run on Fly. It just needs to have LITEFS_CLOUD_TOKEN set as an environment variable, and it needs to be primary. Those should be the only two conditions.
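
For example, running the primary outside Fly could look like this (the token value and config path are placeholders):

# any node that can become primary needs the LiteFS Cloud token
export LITEFS_CLOUD_TOKEN="<your token>"
litefs mount -config /etc/litefs.yml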

If you have your cluster configured to back up to LiteFS Cloud, then the cloud will be the data authority. So if you switch your local machine to be primary, you can update the database; however, when you switch back to your other machine that's connected to LiteFS Cloud, it won't see the updates and it'll revert state.

tl;dr: yes, you'll need to configure any node that can become primary to use LiteFS Cloud if you have it enabled.

We can add the ability to track consistency across multiple databases. We'd just need to update the cookie set by the proxy to list the TXIDs of multiple databases.

Where do you get the backup client from?

It just magically happened on Fly.

It's OK, I had exported the variable incorrectly; it's all sorted now. Thank you very much.

I think that would be helpful. I played around with multiple BaaS solutions that sit on top of SQLite, and many of them require more than one database, or optionally support multiple databases with SQLite.

Evaluated -

Pocketbase
NocoDB
Directus
AirByte (sync, with airtable)
Flow (sync, with airtable)
Strapi
Prisma Studio
Drizzle Studio

And probably some others. =)
It was an exhaustive 2 weeks, and I ended up using my first choice.

@benbjohnson The filtering has stopped working.

I get kt.db in the FUSE mount on the app box, and tlav.db on the management node.

In the /var/lib/litefs/dbs folder, both show up.

Any ideas?

I have this config on the management node (currently a replica):

fuse:
  dir: "/db"

data:
  dir: "/var/lib/litefs"

exit-on-error: false
lease:
  type: "consul"
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: true

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${FLY_APP_NAME}"

And this config on the app node:

fuse:
  dir: "/db"

data:
  dir: "/var/lib/litefs"

exit-on-error: false

exec:
  - cmd: "kt"

lease:
  type: "consul"
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"
  candidate: false
  databases: ["kt.db", "tlav.db"]

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${FLY_APP_NAME}"

@benbjohnson

I have fixed it by starting again. I think it had something to do with my local machine not being connected to the backup for a certain period of time. I got some strange errors, like the one below, once I had connected it to the backup.

http: POST /stream: error: stream error: db=\"kt.db\" err=stream ltx (0000000000000001): write ltx snapshot to chunked stream: canceled, http server closed"

I deleted the machines and volumes, created a new cluster, changed the Consul key, and re-imported the databases, and it all now seems to work as it should. I haven't tried to connect my local machine to it; I will save that for another day.

If I get any more issues I will let you know.

Thanks

Dean


I have some doubts about how this will work. @benbjohnson, could you go a bit deeper on it?

PS: I'm considering two apps using the same LiteFS cluster with a proxy. Only one of the apps will be a primary, so I guess it will mess everything up. 🤔

I have this working. All you need is to share the same Consul key. In my case I used my FrontEnd app's name as the shared key. But I have that app set up to not be promoted. Ever. It is a read-only instance.

The Backend app is set up to be the primary and can be promoted.

FrontEnd App - litefs.yml

lease:
  type: "consul"
  advertise-url: "http://${FLY_ALLOC_ID}.vm.${FLY_APP_NAME}.internal:20202"
  candidate: false
  promote: false

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${FLY_APP_NAME}"

Backend App - fly.toml

I set this to match the FLY_APP_NAME of the first app, which I consider my reader-only app.

[env]
  SHARED_APP_NAME = "PRIMARY_APP_NAME"

Backend App - litefs.yml

lease:
  type: "consul"
  advertise-url: "http://${FLY_ALLOC_ID}.vm.${FLY_APP_NAME}.internal:20202"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: true

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${SHARED_APP_NAME}"

Thank you to @deano-fury for reminding me.

Add a secret called FLY_CONSUL_URL that has the URL from the app you reference in SHARED_APP_NAME.

If you don't have this handy, you can do the following:

Use the fly ssh console CLI command in your primary app to open a console on one of the deployed machines. You can then echo $FLY_CONSUL_URL to get that secret.

You can then use the fly secrets set FLY_CONSUL_URL=<URL VALUE> CLI command in your secondary apps to set that secret in your other applications.

You will need to redeploy, as this only stages the secret; deploying will make it active.
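
Putting those steps together (the app names are placeholders):

# on the primary app, grab the Consul URL
fly ssh console -a my-primary-app
echo $FLY_CONSUL_URL
exit

# back on your local machine, set it on the secondary app and redeploy
fly secrets set FLY_CONSUL_URL=<URL VALUE> -a my-secondary-app
fly deploy -a my-secondary-app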


I did exactly that.

I removed the Consul subscription from the view-only app

and added a secret called FLY_CONSUL_URL with the URL from the primary app.

Works like a charm


Yep, I did that also, good call. Will add that to my answer above.