Hi, I recently noticed that on my website ~5 posts (the 5 most recent, not random posts) seem to have disappeared (completely missing from the database). I have no idea how, and I can’t think of anything within my application that would’ve caused this. It really seems like something made my database roll back.
Do I need to have at least one machine running in my primary region? That’s what I’ve been doing recently, but initially I didn’t.
Early on I had some similar(?) data loss (not sure if the database just completely disappeared or if it had just rolled back to when the migrations happened). I think this was caused by the lack of a machine always running, but I’m not sure. My theory is that both machines were running and in sync, then machine A stopped and machine B’s db had new stuff added, and then machine B stopped. Then machine A started and became the primary with its outdated data. Is this theory correct/possible?
Is there anything I should be looking for that could be causing my most recent data loss (the ~5 posts disappearing)? With LiteFS Cloud sunsetting now I don’t have a good backup strategy anymore, and I’d prefer to not have random data loss like this…
I agree that it would be nice if this was stated explicitly in the docs. It’s been covered in the forum before, but it takes some digging. Maybe the new L3 docs supremo will rally the troops to move v0.5 → v1.
(Few things are more rewarding than helping those who not only wish to make things better but also to learn themselves while doing so, after all.)
I tried this myself, since I had a test cluster already provisioned, and it went as you said…
I was curious about what would happen when machine B woke back up, as well:
client transaction id (0000000000000005) exceeds primary transaction id (0000000000000003),
clearing client position
And then it kept chugging along (with the older database).
(I.e., no fireworks.)
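For anyone who wants to see the sequence spelled out, here is a toy Python sketch of it. This is not LiteFS code and does not reflect its real internals: the `Node` class, `sync_from`, and the one-write-per-transaction-ID counting are all assumptions made up for illustration, chosen so the numbers line up with the log above (client at 5, primary at 3).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one machine and its local database copy."""
    name: str
    txid: int = 0                      # position in the replication stream
    rows: list = field(default_factory=list)

    def write(self, row):
        # Assumption: every write advances the transaction ID by one.
        self.txid += 1
        self.rows.append(row)

    def sync_from(self, primary):
        """Catch up with the primary.

        Returns True if this node was *ahead* of the primary and had to
        throw away its own position (the "clearing client position" case).
        """
        cleared = self.txid > primary.txid
        # Either way, the replica ends up with a copy of the primary's state.
        self.txid = primary.txid
        self.rows = list(primary.rows)
        return cleared


def simulate():
    a, b = Node("A"), Node("B")

    # Both machines running and in sync; A is primary.
    for i in range(1, 4):
        a.write(f"post-{i}")
        b.sync_from(a)

    # A stops. B takes over as primary and accepts two more writes.
    for i in range(4, 6):
        b.write(f"post-{i}")

    # B stops too. A starts alone and becomes primary with its older copy.
    # When B wakes up it is ahead of the primary, so it resets to A's state.
    cleared = b.sync_from(a)
    return a, b, cleared


if __name__ == "__main__":
    a, b, cleared = simulate()
    print(f"primary txid={a.txid}, replica txid={b.txid}, cleared={cleared}")
    print("surviving rows:", b.rows)
```

In this model the two writes made while only B was up are gone after B rejoins, which is the same shape as the missing recent posts.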
Overall, you want at least two primary-candidate nodes running at all times, though; otherwise, if one goes down abruptly, the other won’t have the opportunity to get caught back up.
Thanks so much for looking into this and for the additional insights! Sucks that this isn’t documented very well, but hopefully it will be soon. I also hadn’t seen the thread you linked, despite having searched for info on this.