LiteFS backup stream error, can't restart app

After a recent deploy my LiteFS app started spamming logs like these:

2023-11-13T06:40:24Z app[e2865941ae65e8] sea [info]level=INFO msg="begin streaming backup" full-sync-interval=10s
2023-11-13T06:40:25Z app[e2865941ae65e8] sea [info]level=INFO msg="exiting streaming backup"
2023-11-13T06:40:25Z app[e2865941ae65e8] sea [info]level=INFO msg="backup stream failed, retrying: backup stream error (\"dino.db\"): write backup tx: backup client error (400): database extend without all pages, expected 232 new pages, found 230 pages"
2023-11-13T06:40:25Z app[e2865941ae65e8] sea [info]level=INFO msg="begin streaming backup" full-sync-interval=10s
2023-11-13T06:40:27Z app[e2865941ae65e8] sea [info]level=INFO msg="exiting streaming backup"
2023-11-13T06:40:27Z app[e2865941ae65e8] sea [info]level=INFO msg="backup stream failed, retrying: backup stream error (\"dino.db\"): write backup tx: backup client error (409): cannot write while transaction in progress"

My app appears to be working fine, but now I can’t restart any machines either, as they get this error upon LiteFS starting up

2023-11-13T06:35:49.085 app[148ed092b33ed8] syd [info] LiteFS v0.5.8, commit=fd5808d0250cd17ec420225a010ec273971a471b
2023-11-13T06:35:49.085 app[148ed092b33ed8] syd [info] level=INFO msg="host environment detected" type=fly.io
2023-11-13T06:35:49.085 app[148ed092b33ed8] syd [info] level=INFO msg="litefs cloud backup client configured: https://litefs.fly.io"
2023-11-13T06:35:49.086 app[148ed092b33ed8] syd [info] level=INFO msg="Using Consul to determine primary"
2023-11-13T06:35:49.572 app[148ed092b33ed8] syd [info] level=INFO msg="initializing consul: key=litefs/dinodino url=https://:4913d739-6273-7be6-9b0a-522c4a79f21d@consul-iad-4.fly-shared.net/dinodino-j3g8qvj02y8qdvxy/ hostname=148ed092b33ed8 advertise-url=http://148ed092b33ed8.vm.dinodino.internal:20202"
2023-11-13T06:35:49.577 app[148ed092b33ed8] syd [info] level=INFO msg="wal-sync: no wal file exists on \"dino.db\", skipping sync with ltx"
2023-11-13T06:35:49.700 app[148ed092b33ed8] syd [info] ERROR: cannot open store: open databases: open database("dino.db"): recover ltx: database checksum c9d56c921096e0ee on TXID 000000000000373a does not match LTX post-apply checksum e777fd8c0d4430ec
2023-11-13T06:35:49.700 app[148ed092b33ed8] syd [info] waiting for signal or subprocess to exit
2023-11-13T06:35:52.130 proxy[148ed092b33ed8] syd [error] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)
2023-11-13T06:35:52.131 proxy[148ed092b33ed8] syd [error] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)
2023-11-13T06:35:56.396 health[148ed092b33ed8] syd [error] Health check on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.
2023-11-13T06:35:57.647 proxy[148ed092b33ed8] syd [error] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)

This came right after implementing a full search feature in my app using the SQLite FTS5 extension, so I assume the issue has something to do with that, or my implementation of it, but I’m still stumped! To set this up, I changed litefs.yml to run a script on candidate nodes, which runs the following SQL commands:

CREATE VIRTUAL TABLE IF NOT EXISTS ImagePostFTS USING fts5(id UNINDEXED, prompt);

INSERT INTO ImagePostFTS(id, prompt)
SELECT id, prompt FROM ImagePost
WHERE id NOT IN (SELECT id FROM ImagePostFTS);

CREATE TRIGGER IF NOT EXISTS ImagePost_ai AFTER INSERT ON ImagePost BEGIN
INSERT INTO ImagePostFTS(id, prompt) VALUES (new.id, new.prompt);
END;

CREATE TRIGGER IF NOT EXISTS ImagePost_au AFTER UPDATE ON ImagePost BEGIN
UPDATE ImagePostFTS SET prompt = new.prompt WHERE id = old.id;
END;

CREATE TRIGGER IF NOT EXISTS ImagePost_ad AFTER DELETE ON ImagePost BEGIN
DELETE FROM ImagePostFTS WHERE id = old.id;
END;

Any insight would be much appreciated!

I’m not sure why FTS5 would be causing an issue. You may want to try clearing (or just moving) the data in /var/lib/litefs/dbs on one primary node and restarting that node. It should restore from LiteFS Cloud automatically.

This works to restore it, but it only seems to take a few minutes after restarting before the same issue starts again…

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.