I’m trying to set up LiteFS Cloud backups on an existing application, but the backup stream seems to hang right after logging “begin streaming backup”.
Some notes on the setup:
- There is only a single node in the cluster
- The single machine is set to scale to 0 with no traffic
- The LiteFS lease strategy is set to “static”
- I was running into an issue with the machine getting stuck in a “replica” state when using the “consul” lease strategy, so I set the lease strategy to “static” to avoid that
- The LiteFS DB is working fine within the application, accepting writes and reads
- The LiteFS Cloud cluster has no databases. From reading the setup docs, I’m led to believe I only needed to create the cluster, and a database would be created automatically. Maybe this is the root cause? If so, it would be helpful to have an error message in the logs indicating that the database is missing.
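For context, a single-node setup with a static lease like the one described above would look roughly like this in /etc/litefs.yml. This is a minimal sketch reconstructed from the paths, ports and command that show up in the logs, not a copy of the actual file: the advertise-url host is left redacted, and the proxy target and db values are assumptions. The LiteFS Cloud token is not part of this file; it is assumed to be supplied through the LITEFS_CLOUD_TOKEN secret.

```yaml
# /etc/litefs.yml (sketch; values inferred from the logs, not the actual file)

fuse:
  dir: "/litefs"              # matches "LiteFS mounted to: /litefs"

data:
  dir: "/var/lib/litefs"      # volume mount point seen during shutdown

lease:
  type: "static"              # switched from "consul"
  candidate: true             # only node in the cluster, so always primary
  advertise-url: "http://[REDACTED]:20202"

proxy:
  addr: ":8080"               # matches "proxy server listening on"
  target: "localhost:4444"    # assumption: the app server listening on :4444
  db: "db"                    # assumption: file name taken from the DSN /litefs/db

exec:
  - cmd: "/usr/local/bin/apialum -dsn /litefs/db"
```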
Here are the application logs (from May 29, 2024, newest entries first):
16:44:45.910 [ 429.835022] reboot: Restarting system
16:44:45.909 WARN could not unmount /rootfs: EINVAL: Invalid argument
16:44:45.908 INFO Umounting /dev/vdc from /var/lib/litefs
16:44:45.907 INFO Starting clean up.
16:44:45.894 INFO Main child exited normally with code: 0
16:44:45.891 level=INFO msg="12A92A036F318BDA: exiting primary, destroying lease"
16:44:45.891 level=INFO msg="primary backup stream exiting"
16:44:45.891 level=INFO msg="exiting streaming backup"
16:44:45.891 litefs shut down complete
16:44:45.889 signal received, litefs shutting down
16:44:45.889 closing database
16:44:45.889 May 29 20:44:45.888 INF stopped server server.addr=:4444
16:44:45.889 waiting for exec process to close
16:44:45.889 sending signal to exec process
16:44:45.888 INFO Sending signal SIGINT to main child process w/ PID 322
16:44:45.884 Downscaling app apialum from 1 machines to 0 machines, stopping machine [REDACTED] (region=dfw, process group=app)
16:37:37.431 level=INFO msg="begin streaming backup" full-sync-interval=10s
16:37:36.552 May 29 20:37:36.552 INF access user.ip=104.54.74.132 request.method=GET request.url=/v1/grads request.proto=HTTP/1.1 response.status=200 response.size=64118
16:37:36.541 May 29 20:37:36.540 INF starting server server.addr=:4444
16:37:36.465 open connections: 0
16:37:36.465 DSN: /litefs/db
16:37:36.465 May 29 20:37:36.464 WRN failed to load .env file error="open .env: no such file or directory"
16:37:36.433 machine became reachable in 22.013745ms
16:37:36.432 waiting for signal or subprocess to exit
16:37:36.431 level=INFO msg="starting background subprocess: /usr/local/bin/apialum [-dsn /litefs/db]"
16:37:36.431 level=INFO msg="proxy server listening on: http://localhost:8080"
16:37:36.431 level=INFO msg="node is already primary, skipping promotion"
16:37:36.431 level=INFO msg="node is a candidate, automatically promoting to primary"
16:37:36.431 level=INFO msg="connected to cluster, ready"
16:37:36.431 level=INFO msg="waiting to connect to cluster"
16:37:36.431 level=INFO msg="http server listening on: http://localhost:20202"
16:37:36.431 level=INFO msg="LiteFS mounted to: /litefs"
16:37:36.430 level=INFO msg="begin primary backup stream: url=https://litefs.fly.io"
16:37:36.429 level=INFO msg="set cluster id on \"static\" lease \"[REDACTED]\""
16:37:36.429 level=INFO msg="12A92A036F318BDA: primary lease acquired, advertising as http://[REDACTED]:20202"
16:37:36.427 level=INFO msg="using existing cluster id: \"[REDACTED]\""
16:37:36.425 level=INFO msg="wal-sync: short wal file exists on \"db\", skipping sync with ltx"
16:37:36.422 level=INFO msg="Using static primary: primary=true hostname= advertise-url=http://[REDACTED]:20202"
16:37:36.422 level=INFO msg="litefs cloud backup client configured: https://litefs.fly.io"
16:37:36.422 level=INFO msg="host environment detected" type=fly.io
16:37:36.421 LiteFS v0.5.11, commit=63eab529dc3353e8d159e097ffc4caa7badb8cb3
16:37:36.421 config file read from /etc/litefs.yml
16:37:36.411 machine started in 477.666491ms
16:37:36.409 Machine started in 474ms
As you can see, “begin streaming backup” is logged at 16:37:37, and then nothing else streaming-related happens until the machine is scaled down at 16:44:45, about 7 minutes later. No errors, no success message. I’m not sure where to go next.
Since the app is in its infancy, I will try to manually export the LiteFS database and use that to create a database in the LiteFS Cloud cluster. However, the wording of the “Using LiteFS Cloud For Backups” post makes it sound like you only need to create a cluster and restart your application, and you should be good. I’m not sure where I missed a step.
Thanks in advance for any help : )