(SOS) LiteFS + Fly + Remix configuration

michaelg · October 12, 2023, 11:15pm

I am trying to deploy a Remix app to Fly with two machines and one Prisma SQLite database replicated across the machines using LiteFS. Each machine has its own volume, and the two machines are in different regions.

I am a complete novice when it comes to DevOps and I’ve been at this for days with no luck. Most of the code you’ll see below is bits and pieces I’ve copied from every example Dockerfile, fly.toml, and litefs.yml I could find.

I’ve been watching my flyctl logs and I can’t identify any specific errors. The deployment finishes, but the logs loop over the same few messages stating:

POST /stream: error: lease expired
disconnected from primary with error, retrying: connect to primary: invalid response: code=503 ('http://<my machine's id>.vm.myapp.internal:20202')
POST /stream: error: cannot connect to self
etc…

These come after an earlier sequence:
sea [info] INFO [fly api proxy] listening at /.fly/api
sea [info]config file read from /etc/litefs.yml
sea [info]LiteFS v0.5.7
sea [info]level=INFO msg="host environment detected" type=fly.io
sea [info]level=INFO msg="no backup client configured, skipping"
sea [info]level=INFO msg="Using Consul to determine primary"
sea [info]level=INFO msg="initializing consul: key=key url=url hostname=hostname advertise-url=advertise-url"
sea [info]level=INFO msg="using existing cluster id: id"
sea [info]level=INFO msg="LiteFS mounted to: /litefs"
sea [info]level=INFO msg="http server listening on: http://localhost:20202"
sea [info]level=INFO msg="waiting to connect to cluster"
sea [info]level=INFO msg="existing primary found (machine id), connecting as replica to \"http://machine id.vm.myapp.internal:20202\""
iad [info]level=INFO msg="stream connected"
sea [info]level=INFO msg="connected to cluster, ready"
sea [info]level=INFO msg="node is not a candidate, skipping automatic promotion"
sea [info]level=INFO msg="proxy server listening on: http://localhost:3000"
sea [info]waiting for signal or subprocess to exit
iad [info] Out of memory: Killed process 313 (litefs) …
iad [error]could not complete HTTP request to instance: connection closed before message completed
iad [error]could not complete HTTP request to instance: connection closed before message completed

After these messages, it restarts the process, this time without the “out of memory” or “could not complete HTTP request”, but followed by the initial looping logs I mentioned.

The actual web app doesn’t load at all.

I know this isn’t very specific, but something tells me the below code snippets will scream naivety and I might be able to glean some useful information from this post. As I mentioned above, I’m days down this rabbit hole and I’ve lost sight of any sort of baseline truth from which I could pose an intelligible question. I’m just desperate for some outside input. Thanks in advance, Michael.

Here is my Dockerfile:

FROM node:18-alpine

EXPOSE 3000
WORKDIR /app
COPY . .

ARG LITEFS_CONFIG=litefs.yml

COPY --from=flyio/litefs:0.5 /usr/local/bin/litefs /usr/local/bin/litefs

ADD etc/litefs.yml /tmp/litefs.yml

RUN cp /tmp/$LITEFS_CONFIG /etc/litefs.yml

RUN apk add bash fuse3 sqlite  ca-certificates curl

ENTRYPOINT litefs mount

RUN npm install
RUN npm run build

RUN rm -f prisma/dev.sqlite

CMD ["npm", "run", "docker-start"]

A snippet of package.json:

    "start": "remix-serve build",
    "docker-start": "npm run setup && npm run start",
    "setup": "prisma generate && prisma migrate deploy",

My fly.toml:

# fly.toml app configuration file generated for adelfi on 2023-10-10T17:40:15-04:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = "myapp"
primary_region = "iad"

[experimental]
  enable_consul = true
  cmd = ["start_with_migrations.sh"]
  entrypoint = ["sh"]

[build]

[env]
  DATABASE_URL = "file:/litefs/sqlite.db"
  INITIAL_DEPLOYMENT = "true"
  PORT = "3000"

[mounts]
  source = "litefs"
  destination = "/var/lib/litefs"

[[services]]
  internal_port = 3000
  processess = ["app"]
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0

A snippet of schema.prisma:

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "sqlite"
  url      = "file:/litefs/sqlite.db"
}

start_with_migrations.sh

#!/bin/sh

set -ex

npx prisma migrate deploy

# Check if it's the initial deployment
if [ "$INITIAL_DEPLOYMENT" = "true" ]; then
  # Call JavaScript seeder file using the node command
  node prisma/seedDatabase.js

  # Logging
  echo "Database seeding completed for the initial deployment"
else
  # Logging
  echo "Initial deployment flag not set; skipping seeder script."
fi

npm run start

My litefs.yml:

# The fuse section describes settings for the FUSE file system. This file system
# is used as a thin layer between the SQLite client in your application and the
# storage on disk. It intercepts disk writes to determine transaction boundaries
# so that those transactions can be saved and shipped to replicas.
fuse:
  dir: "/litefs"

# The data section describes settings for the internal LiteFS storage. We'll 
# mount a volume to the data directory so it can be persisted across restarts.
# However, this data should not be accessed directly by the user application.
data:
  dir: "/var/lib/litefs"

# This flag ensures that LiteFS continues to run if there is an issue on startup.
# It makes it easy to ssh in and debug any issues you might be having rather
# than continually restarting on initialization failure.
exit-on-error: false

# This section defines settings for the optional HTTP proxy.
# This proxy can handle primary forwarding & replica consistency
# for applications that use a single SQLite database.
proxy:
  addr: ":3000"
  target: "localhost:3000"
  db: "sqlite.db"
  passthrough: 
    - "*.ico"
    - "*.png"

# This section defines a list of commands to run after LiteFS has connected
# and sync'd with the cluster. You can run multiple commands but LiteFS expects
# the last command to be long-running (e.g. an application server). When the
# last command exits, LiteFS is shut down.
# exec:
  #- cmd:

# The lease section specifies how the cluster will be managed. We're using the
# "consul" lease type so that our application can dynamically change the primary.
#
# These environment variables will be available in your Fly.io application.
lease:
  type: "consul"
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: true

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${FLY_APP_NAME}"

pavel · October 13, 2023, 9:08am

Hi @michaelg

Are you developing on Windows? Looking at the logs, it seems start_with_migrations.sh has Windows line endings (\r\n) instead of UNIX ones (\n). Because of that, the script fails to execute on Linux.

Try adding the following line to your .gitattributes file:

*.sh text eol=lf

After that, commit/push the file and re-checkout the repo. The file should have UNIX line endings after that.
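To confirm the diagnosis before touching .gitattributes, a minimal sketch along these lines can check a script for carriage returns and strip them in place. The helper names has_crlf and strip_crlf are made up for illustration, and the sketch assumes a POSIX shell with grep, tr, and mktemp available.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of flyctl or LiteFS.

# Succeeds (exit 0) if the file contains at least one carriage return.
has_crlf() {
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Rewrites the file in place with every carriage return removed.
# Writing back with cat keeps the original file's permissions.
strip_crlf() {
  tmp=$(mktemp) || return 1
  tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Example usage:
#   if has_crlf start_with_migrations.sh; then
#     strip_crlf start_with_migrations.sh
#   fi
```

Once the .gitattributes rule is in place, running git add --renormalize . re-applies line-ending normalization to files that are already tracked, which may save a re-checkout.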

michaelg · October 13, 2023, 8:06pm

Thanks, that was a huge help. After a couple more edits I’ve got it all up and running. LiteFS seems to be functioning properly as well. I had a feeling it would be something silly like that. Thanks again.

trkfabi · February 16, 2024, 3:48pm

Hi @michaelg, I’m new to Shopify and Remix apps and far, far away from DevOps. I’ve deployed my app with the SQLite database to two VMs with two volumes, as Fly.io suggests, but I’m stuck on how to keep those two volumes in sync.
I also ended up reading about LiteFS, but I don’t understand how to replicate the volumes. It looks like you’ve figured it out already. Could you at least point me to the right documentation? I can’t find a good tutorial.
Thanks!

vritzka · March 4, 2024, 3:00am

When using LiteFS Cloud, the system takes care of replication.

Most web applications can take advantage of a thin, built-in proxy inside LiteFS that automatically handles these write redirection and replica consistency issues.

from Getting Started with LiteFS on Fly.io · Fly Docs

satoshibits · March 4, 2024, 12:36pm

Does this mean I can run fly m clone --select --region lhr on a machine that has an attached LiteFS volume and not worry about the data on those volumes getting out of sync? cc: @pavel

pavel · March 4, 2024, 2:33pm

This should work.
The newly created machine should find the existing LiteFS primary node and sync the data from it.
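One way to check that the clone joined the cluster is to look at the LiteFS mount on the new machine. The sketch below assumes the LiteFS behaviour in which replicas expose a .primary file (containing the primary's hostname) inside the FUSE mount directory, while the primary has no such file. The litefs_role helper is hypothetical, and /litefs is taken from the fuse.dir setting in the litefs.yml earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report this node's role from the LiteFS mount directory.
# Assumption: replicas have a ".primary" file in the mount; the primary does not.
litefs_role() {
  mount_dir=${1:-/litefs}
  if [ ! -d "$mount_dir" ]; then
    echo "not mounted: $mount_dir"
    return 1
  fi
  if [ -f "$mount_dir/.primary" ]; then
    # The file holds the hostname of the current primary.
    echo "replica of $(cat "$mount_dir/.primary")"
  else
    echo "primary"
  fi
}

# Example usage, run on the machine itself (e.g. over fly ssh console):
#   litefs_role /litefs
```

If the cloned machine reports itself as a replica of the original machine, it has found the primary; whether it has fully caught up is a separate question that this check does not answer.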

justinb · April 2, 2024, 3:18pm

I don’t know much about DevOps and I’m more of a frontend developer. I found this project from a knowledgeable member of the JS/TS community and it’s helping me get started, so I thought I’d drop it here for anyone looking for a reliable starting point: GitHub - kentcdodds/kentcdodds.com: My personal website
