LiteFS `promote: true` can cause data loss when scaling horizontally - seeking guidance

Thanks for the additional details… Yeah, manually intervening in /data/litefs comes up multiple times in the forum archives as a cause of serious (and confusing) problems. Combined with the “short WAL file” (i.e., premature EOF) message in the logs, I think that makes it the current prime suspect (as it were).

Ideally there would be better guardrails around things like this, and maybe the official docs should name the example low-level mountpoint /var/lib/never-touch-this-directly/ or similar.

I’ll try a “smoking gun” reproduction over the next few days, using this new information, but my guess is that it’s not a state that you can reach during normal operations…


Aside: for extra peace of mind, though, you might want to look into LiteFS Backup, a streaming backup system that is compatible with Consul leasing, generously made available under an open source license by a fellow user.

(It provides point-in-time restores, and not just daily snapshots.)