I’ve managed to break database syncing in a production app. Database writes seem to be happening locally and are not synced to LiteFS: litefs export returns the database as it was before the data changes. I’m looking for advice on steps to troubleshoot.
After a few hours with no luck, I tried a clean-slate approach: I destroyed all volumes and machines and restored the db from a backup onto a single instance for simplicity. Unfortunately the problem persists. The app is running and usable, but data changes appear to happen only locally, as the web interface and litefs export both show a database state identical to the pre-change backup.
Background
Last night some of the app’s secrets were accidentally changed. This seems to be the start of the problem, but I was also having issues with the deployment freezing, which I resolved using fly deploy --local-only, and I can’t rule that out as a cause either. I think the secrets have been fixed, but given that it’s still not working right, I am not confident. One of the changed secrets was DATABASE_URL, which the app uses to open the database. Prior to this issue, the app had been running successfully with multiple instances.
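Since DATABASE_URL is one of the changed secrets, one thing worth ruling out is whether it still points inside the FUSE mount. LiteFS only sees writes that go through its mount directory (fuse.dir, /litefs in the config below), so a database opened anywhere else would be written locally and never replicated. The sketch below is a rough check, not a definitive diagnosis: the helper names are hypothetical, and it assumes DATABASE_URL is either a plain path or a file:/sqlite:-style URL, which the post does not confirm.

```typescript
// Hypothetical helpers for sanity-checking DATABASE_URL against the LiteFS mount.
// Assumption: the value is a plain path or a "file:" / "sqlite:" style URL.
const FUSE_DIR = "/litefs"; // fuse.dir from litefs.yaml

// Strip a leading scheme (e.g. "file:" or "sqlite://") and any "?query" suffix.
function databasePath(databaseUrl: string): string {
  const withoutQuery = databaseUrl.split("?")[0];
  return withoutQuery.replace(/^[a-z][a-z0-9+.-]*:(\/\/)?/i, "");
}

// True only when the path sits under the FUSE directory.
// Note: the path is not normalized, so ".." segments are not resolved.
function isOnLiteFS(databaseUrl: string, fuseDir: string = FUSE_DIR): boolean {
  return databasePath(databaseUrl).startsWith(fuseDir + "/");
}
```

To use it, pass in the value the running machine actually sees, for example isOnLiteFS(process.env.DATABASE_URL ?? "") at app startup. A result of false for a path such as /var/lib/litefs/db or a relative ./data/db would be consistent with the symptoms described here, though it would not prove that this is the cause.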
Relevant sections of config below.
litefs.yaml
fuse:
  dir: "/litefs"
data:
  dir: "/var/lib/litefs"
exit-on-error: false
proxy:
  addr: ":8080"
  target: "localhost:3000"
  db: "db"
exec:
  - cmd: "pnpm run start"
lease:
  type: "consul"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: true
  advertise-url: "http://${FLY_ALLOC_ID}.vm.${FLY_APP_NAME}.internal:20202"
  consul:
    url: "${FLY_CONSUL_URL}"
    key: "${FLY_APP_NAME}/primary"
fly.toml
[[mounts]]
source = "litefs"
destination = "/var/lib/litefs"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
processes = ["app"]
Dockerfile
ADD litefs.yml /etc/litefs.yml
RUN mkdir -p /litefs /var/lib/litefs
ENTRYPOINT ["litefs", "mount"]