LiteFS Cloud Backup database not created in cluster

I’m trying to set up LiteFS Cloud backups for an existing application, but streaming seems to hang at “begin streaming backup”.

Some notes on the setup:

  • There is only a single node in the cluster
  • The single machine is set to scale to 0 with no traffic
  • The LiteFS lease strategy is set to “static”
  • With “consul”, the machine kept getting stuck in a “replica” state, so I switched the lease strategy to “static” to avoid that
  • The LiteFS DB is working fine within the application, accepting writes and reads
  • The LiteFS Cloud cluster has no databases. From reading the setup docs, I’m led to believe I only needed to create the cluster, and a database would be created automatically. Maybe this is the root cause? If so, it would be helpful to have an error message in the logs indicating that the database is missing.
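To back up the “accepting writes and reads” point: this is roughly the kind of sanity check I ran against the SQLite file through the LiteFS mount (a sketch; `/litefs/db` is the DSN from the logs below, and the scratch table name here is made up):

```python
import os
import sqlite3
import tempfile

def sanity_check(db_path: str) -> int:
    """Write a row and read it back through the given SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical scratch table, just to prove writes go through.
        conn.execute("CREATE TABLE IF NOT EXISTS _litefs_check (ts TEXT)")
        conn.execute("INSERT INTO _litefs_check VALUES (datetime('now'))")
        conn.commit()
        # Read the row count back to prove reads work too.
        (count,) = conn.execute("SELECT count(*) FROM _litefs_check").fetchone()
        return count
    finally:
        conn.close()

# On the machine this would be sanity_check("/litefs/db");
# any writable path demonstrates the same thing:
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
print(sanity_check(demo_path))  # → 1
```

Reads and writes through the FUSE mount behave exactly like this, so the database itself seems healthy.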

Here are the application logs (from May 29, 2024):

16:44:45.910 [ 429.835022] reboot: Restarting system
16:44:45.909 WARN could not unmount /rootfs: EINVAL: Invalid argument
16:44:45.908 INFO Umounting /dev/vdc from /var/lib/litefs
16:44:45.907 INFO Starting clean up.
16:44:45.894 INFO Main child exited normally with code: 0
16:44:45.891 level=INFO msg="12A92A036F318BDA: exiting primary, destroying lease"
16:44:45.891 level=INFO msg="primary backup stream exiting"
16:44:45.891 level=INFO msg="exiting streaming backup"
16:44:45.891 litefs shut down complete
16:44:45.889 signal received, litefs shutting down
16:44:45.889 closing database
16:44:45.889 May 29 20:44:45.888 INF stopped server server.addr=:4444
16:44:45.889 waiting for exec process to close
16:44:45.889 sending signal to exec process
16:44:45.888 INFO Sending signal SIGINT to main child process w/ PID 322
16:44:45.884 Downscaling app apialum from 1 machines to 0 machines, stopping machine [REDACTED] (region=dfw, process group=app)
16:37:37.431 level=INFO msg="begin streaming backup" full-sync-interval=10s
16:37:36.552 May 29 20:37:36.552 INF access user.ip= request.method=GET request.url=/v1/grads request.proto=HTTP/1.1 response.status=200 response.size=64118
16:37:36.541 May 29 20:37:36.540 INF starting server server.addr=:4444
16:37:36.465 open connections: 0
16:37:36.465 DSN: /litefs/db
16:37:36.465 May 29 20:37:36.464 WRN failed to load .env file error="open .env: no such file or directory"
16:37:36.433 machine became reachable in 22.013745ms
16:37:36.432 waiting for signal or subprocess to exit
16:37:36.431 level=INFO msg="starting background subprocess: /usr/local/bin/apialum [-dsn /litefs/db]"
16:37:36.431 level=INFO msg="proxy server listening on: http://localhost:8080"
16:37:36.431 level=INFO msg="node is already primary, skipping promotion"
16:37:36.431 level=INFO msg="node is a candidate, automatically promoting to primary"
16:37:36.431 level=INFO msg="connected to cluster, ready"
16:37:36.431 level=INFO msg="waiting to connect to cluster"
16:37:36.431 level=INFO msg="http server listening on: http://localhost:20202"
16:37:36.431 level=INFO msg="LiteFS mounted to: /litefs"
16:37:36.430 level=INFO msg="begin primary backup stream: url="
16:37:36.429 level=INFO msg="set cluster id on \"static\" lease \"[REDACTED]\""
16:37:36.429 level=INFO msg="12A92A036F318BDA: primary lease acquired, advertising as http://[REDACTED]:20202"
16:37:36.427 level=INFO msg="using existing cluster id: \"[REDACTED]\""
16:37:36.425 level=INFO msg="wal-sync: short wal file exists on \"db\", skipping sync with ltx"
16:37:36.422 level=INFO msg="Using static primary: primary=true hostname= advertise-url=http://[REDACTED]:20202"
16:37:36.422 level=INFO msg="litefs cloud backup client configured:"
16:37:36.422 level=INFO msg="host environment detected"
16:37:36.421 LiteFS v0.5.11, commit=63eab529dc3353e8d159e097ffc4caa7badb8cb3
16:37:36.421 config file read from /etc/litefs.yml
16:37:36.411 machine started in 477.666491ms
16:37:36.409 Machine started in 474ms

Notice that “begin streaming backup” was logged at 16:37; in the minutes that followed, until the machine scaled down at 16:44, nothing else streaming-related happened: no errors, no success message. I’m not sure where to go next.

Since the app is in its infancy, I will try manually exporting the LiteFS database and using that to create a database in the LiteFS Cloud cluster. However, the wording of the “Using LiteFS Cloud For Backups” post makes it sound like you only need to create a cluster and restart your application, and you should be good. I’m not sure where I missed a step.
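For the manual export, my plan is to copy the live database with SQLite’s online backup API rather than `cp`-ing the file mid-write, since the backup API takes a consistent snapshot even while the app is writing. A sketch in Python (the paths are placeholders; `/litefs/db` is the DSN from the logs):

```python
import sqlite3

def export_db(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database using the online backup API.

    Unlike a plain file copy, this produces a consistent snapshot
    even if the source database is being written to concurrently.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # page-by-page, transaction-safe copy
    finally:
        dest.close()
        src.close()

# On the machine this would be export_db("/litefs/db", "/tmp/export.db"),
# and the exported file is what I'd use to seed the cloud database.
```

The resulting file is a normal standalone SQLite database, so it should be usable as-is for creating the database in the LiteFS Cloud cluster.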

Thanks in advance for any help : )


Here’s my LiteFS config as well

# This directory is where your application will access the database.
fuse:
  dir: "/litefs"

# This directory is where LiteFS will store internal data.
# You must place this directory on a persistent volume.
data:
  dir: "/var/lib/litefs"

# This flag ensures that LiteFS continues to run if there is an issue on
# start up. It makes it easy to ssh in and debug any issues you might be
# having rather than continually restarting on initialization failure.
exit-on-error: false

# This section defines settings for the optional HTTP proxy.
# This proxy can handle primary forwarding & replica consistency
# for applications that use a single SQLite database.
proxy:
  # Bind address for the proxy to listen on.
  addr: ":8080"
  # Hostport of your application - replace 4444 with whatever port
  # your application is listening on!
  target: "localhost:4444"
  # Filename of the SQLite database you want to use for TXID tracking.
  db: "db"
  passthrough:
    - "*.ico"
    - "*.png"

# The lease section defines how LiteFS creates a cluster and
# implements leader election. For dynamic clusters, use
# "consul", which allows the primary to change automatically when
# the current primary goes down. For a simpler setup, use
# "static", which assigns a single node to be the primary and does
# not fail over.
lease:
  # Using a static lease for a single-node cluster. In a single-node cluster,
  # the node often gets stuck in the "replica" role using consul.
  type: "static"

  # Specifies if this node can become primary. Nodes where this is false
  # act as non-candidate, read-only replicas.
  candidate: true

  # If true, then the node will automatically become primary after it has
  # connected with the cluster and synced up. This makes it easier to run
  # migrations on start up.
  promote: true

  # The API URL that other nodes will use to connect to this node.
  advertise-url: "http://$HOSTNAME:20202"

  consul:
    # The URL of the Consul cluster.
    url: "${FLY_CONSUL_URL}"

    # A unique key shared by all nodes in the LiteFS cluster.
    # Change this if you are running multiple clusters in a single app!
    key: "litefs/${FLY_APP_NAME}-v2"

exec:
  - cmd: "/usr/local/bin/apialum -dsn /litefs/db"
