Playing with LiteFS, not sure if I understand how it works

I deployed a small app to test LiteFS functionality. I’ll share the Dockerfile, litefs.yml, and fly.toml files in my next comment.

The deployment was successful. I was able to scale the application by adding another machine in the same region and cloning the main machine to a different region - both worked properly.

I then began turning machines on and off to understand how LiteFS operates. My understanding is that LiteFS acts as a WAL journal mode for SQLite distributed across machines, where only the primary machine can perform writes. Communication between machines happens through fly-replay under the hood.
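
(You can see the write restriction directly: shelling into a machine that is currently a replica with fly ssh console and writing to the database fails at the SQLite level. A quick sketch, using the messages table from the app I share below, and assuming the sqlite3 CLI that my Dockerfile installs:)

# On a replica machine (fly ssh console), a direct write is rejected:
sqlite3 /litefs/app.db "INSERT INTO messages (content) VALUES ('hi');"
# => attempt to write a readonly database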

First, I tested the primary machine by turning it on and off to verify the volume was working correctly - it was.

Next, I scaled the application by cloning to another region with fly m clone ... and added another machine to the same region with fly scale count 2. Then I started testing:

  1. With only the primary machine running: I could perform both reads and writes without issues
  2. With the primary machine turned off: I could still perform read operations without problems
  3. After turning the primary machine back on: I could no longer write to the database and got the error attempt to write a readonly database
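
(For reference, the reads and writes in these tests were plain HTTP calls against the app’s endpoints, roughly like this, assuming curl:)

# Read - worked as long as any machine was up:
curl https://test-back-server.fly.dev/messages

# Write - only worked while the primary was healthy:
curl -X POST https://test-back-server.fly.dev/messages \
  -H "content-type: application/json" \
  -d '{"content":"hello"}'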

I then tried deleting the primary machine to see if the other machines (like the cloned one) would assume the primary role - they did not.
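
(A check that helps here: on a replica, LiteFS exposes the current primary’s hostname in a .primary file inside the FUSE mount; the file does not exist on the primary itself. Assuming the /litefs mount dir from my config:)

fly ssh console --select
cat /litefs/.primary
# prints the primary's hostname on replicas; fails on the primary itself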

Does this mean that my primary machine needs to run 24/7, and that once it’s turned off, database writes become impossible even after restarting it?

The tech stack:

  • elysia.js app
  • bun

// src/index.ts
import { Elysia } from "elysia";
import Database from "bun:sqlite";
import { logger } from "@bogeychan/elysia-logger";
import swagger from "@elysiajs/swagger";

// Initialize SQLite database (LiteFS will handle this file)
const db = new Database("/litefs/app.db");

// Create a simple table for demonstration
db.exec(`
  CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content TEXT NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
  )
`);

const app = new Elysia()
  .use(logger())
  .use(swagger())
  .get("/", () => "Hello Elysia with LiteFS!")
  .get("/messages", () => {
    const messages = db.query("SELECT * FROM messages ORDER BY created_at DESC").all();
    return { messages };
  })
  .post("/messages", ({ body }: { body: { content: string } }) => {
    const stmt = db.prepare("INSERT INTO messages (content) VALUES (?)");
    const result = stmt.run(body.content);
    return { 
      success: true, 
      id: result.lastInsertRowid,
      message: "Message created successfully" 
    };
  })
  .get("/health", () => {
    // Health check endpoint that also verifies database connectivity
    try {
      const count = db.query("SELECT COUNT(*) as count FROM messages").get() as { count: number };
      return { 
        status: "healthy", 
        database: "connected",
        message_count: count.count 
      };
    } catch (error) {
      return { 
        status: "unhealthy", 
        database: "disconnected",
        error: error instanceof Error ? error.message : "Unknown error"
      };
    }
  })
  .listen(3000);

console.log(
  `🦊 Elysia is running at ${app.server?.hostname}:${app.server?.port}`
);

// Graceful shutdown
process.on("SIGINT", () => {
  console.log("Shutting down gracefully...");
  db.close();
  process.exit(0);
});

process.on("SIGTERM", () => {
  console.log("Shutting down gracefully...");
  db.close();
  process.exit(0);
});
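
(Not something I used in this test, just a sketch: if writes ever land on a replica without the LiteFS proxy in front, one known pattern is to read LiteFS’s .primary file and ask Fly’s edge to replay the request on the primary via a fly-replay response header. The 409 status and the replayToPrimary helper name are mine; only the .primary file and the fly-replay header come from the LiteFS/Fly docs.)

// Hypothetical fallback when the LiteFS proxy is not fronting writes.
// On a replica, /litefs/.primary holds the primary's hostname (the
// machine ID on Fly); a fly-replay header tells Fly's edge to re-run
// the request there.
import { existsSync, readFileSync } from "node:fs";

const PRIMARY_FILE = "/litefs/.primary"; // only present on replicas

function replayToPrimary(): Response | null {
  if (!existsSync(PRIMARY_FILE)) return null; // we are the primary; write locally
  const primary = readFileSync(PRIMARY_FILE, "utf8").trim();
  return new Response(null, {
    status: 409, // status is mostly irrelevant; the edge acts on the header
    headers: { "fly-replay": `instance=${primary}` },
  });
}

// Usage inside the POST /messages handler, before touching the database:
//   const replay = replayToPrimary();
//   if (replay) return replay;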

# syntax = docker/dockerfile:1

# Adjust BUN_VERSION as desired
ARG BUN_VERSION=1.2.5
FROM oven/bun:${BUN_VERSION}-slim as base

LABEL fly_launch_runtime="Bun"

# Bun app lives here
WORKDIR /app

# Set production environment
ENV NODE_ENV="production"

# Install LiteFS dependencies
RUN apt-get update -y && apt-get install -y ca-certificates fuse3 sqlite3

# Throw-away build stage to reduce size of final image
FROM base as build

# Install packages needed to build node modules
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential pkg-config python-is-python3

# Install node modules
COPY --link bun.lock package.json ./
RUN bun install --ci

# Copy application code
COPY --link . .

# Final stage for app image
FROM base

# ca-certificates, fuse3, and sqlite3 are already installed in the base stage

# Copy LiteFS binary
COPY --from=flyio/litefs:0.5 /usr/local/bin/litefs /usr/local/bin/litefs

# Copy built application
COPY --from=build /app /app

# Copy LiteFS configuration
COPY litefs.yml /etc/litefs.yml

# Expose ports: 3000 for app, 8080 for LiteFS proxy
EXPOSE 3000 8080

# Use LiteFS as entrypoint
ENTRYPOINT ["litefs", "mount"]

# litefs.yml
fuse:
  dir: "/litefs"

data:
  dir: "/var/lib/litefs"

exit-on-error: false

proxy:
  addr: ":8080"
  target: "localhost:3000"
  db: "db"
  passthrough: 
    - "*.ico"
    - "*.png"

exec:
  - cmd: "bun run src/index.ts"


lease:
  type: "consul"
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: true

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${FLY_APP_NAME}"
# fly.toml app configuration file generated for test-back-server on 2025-06-20T15:08:22-06:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = 'test-back-server'
primary_region = 'lax'

[build]

[[mounts]]
  source = 'litefs'
  destination = '/var/lib/litefs'

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = 'stop'
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']

[[vm]]
  memory = '1gb'
  cpu_kind = 'shared'
  cpus = 1

Fly commands, in order of execution:

fly launch --no-deploy
fly consul attach
fly volumes create litefs --size 1 
fly deploy

# then scaling
fly scale count 2
fly m clone --select --region sjc 

# Note: After deployment it was possible to clone the original (first) machine,
# but every time I tried to "scale" to another region, or to clone any machine
# other than the first one, I got this error:
# Error: failed to launch VM: insufficient resources to create new machine with existing volume

Hi… LiteFS really is fun to play with, 🐬, but there are some details that the official documentation either doesn’t emphasize or leaves entirely to the reader to infer from their existing distributed-systems knowledge…

Starting with the easier one: the error you were seeing is probably because you had internal_port configured incorrectly, so traffic was bypassing the Fly-Replay mechanism completely (the LiteFS proxy was out of the loop).

The following older post has an explicit table showing how things are supposed to match:

https://community.fly.io/t/setting-up-litefs-with-the-proxy-and-docker/20927/2

Regarding how the machines communicate: replication actually happens mainly through the .internal network (a.k.a. 6PN); Fly-Replay only enters the picture occasionally, e.g., when the proxy redirects incoming POSTs to the primary.
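
(You can see this from the outside: with the proxy wired in correctly, a write sent to any machine still lands on the primary. Roughly, using the routes from the app above:)

curl -X POST https://test-back-server.fly.dev/messages \
  -H "content-type: application/json" \
  -d '{"content":"transparently replayed to the primary if this hit a replica"}'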

You do want something like that, but for different reasons. One of the things that was left to deduction is that you should always have two primary candidates running at (almost) all times, i.e., min_machines_running = 2; see the snippet below. This makes sense if you consider what happens when the existing primary fails but the replacement candidate has been asleep for the past 3 months…
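
(In fly.toml terms, something like the following; note that, as far as I know, min_machines_running counts machines in the primary region:)

[http_service]
  internal_port = 8080
  auto_stop_machines = 'stop'
  auto_start_machines = true
  min_machines_running = 2   # keep two primary candidates awake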


As a final tip, always look at the logs (fly logs) and event stream when doing experiments with LiteFS handovers; those really tend to clear the mists of who currently has what baton, etc.
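
(Something like this, to watch the lease change hands during a handover:)

fly logs -a test-back-server | grep -i -E "primary|lease"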

Hope this helps!

Okay, so there were two main issues I ran into. First, the fly launch command keeps reverting internal_port back to 3000, which is annoying. You have to manually set it to 8080 in fly.toml before you deploy.
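
The relevant bit of fly.toml ends up as:

[http_service]
  internal_port = 8080   # the LiteFS proxy port, not the app's 3000
  force_https = true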

The second mistake was the Consul lease candidates. I had it set so only the lax region could become primary, but you can just set candidate to true and let Consul pick automatically. If the machines in the primary region fail, a machine in another region can now be chosen as primary - no more attempt to write a readonly database errors.

Here’s what changed in the config:

# Before - only the lax region could become primary:
lease:
  type: "consul"
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: true

# After - any region can be primary:
lease:
  type: "consul"
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"
  candidate: true
  promote: true

Fly steps:

fly launch --no-deploy
fly consul attach
fly volumes create litefs --size 1 -r lax

# Important - fly launch resets internal_port to 3000,
# so you have to manually change it back to 8080 in fly.toml
# under [http_service] before the next step.

fly deploy --config [your_app].toml

And that’s it. LiteFS working now.

If you have any suggestions on how to improve the Dockerfile, let me know, but the one in the example works fine.

PS: this is the most up-to-date tutorial on how to start with LiteFS lol

Ah, I think flyctl launch is only intended to be used once, to get you started. Thereafter, you only need flyctl deploy.
