Hi… LiteFS really is fun to play with, but there are some details that the official documentation either doesn’t emphasize or leaves entirely to the reader to infer from their existing distributed systems knowledge…
Starting with the easier one, the error you were seeing is probably because you have internal_port configured incorrectly: traffic was bypassing the Fly-Replay mechanism completely (since the LiteFS proxy was out of the loop).
The following older post has an explicit table showing how things are supposed to match:
https://community.fly.io/t/setting-up-litefs-with-the-proxy-and-docker/20927/2
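As a rough sketch of the idea (not a copy of that table): internal_port in fly.toml should point at the port the LiteFS proxy listens on, and the proxy in turn forwards to your app. The port numbers 8080 and 8081 below are my own illustrative assumptions, so substitute whatever your litefs.yml actually uses.

```toml
# fly.toml (fragment). Port numbers are assumptions for illustration.
[http_service]
  # Point this at the LiteFS proxy, NOT at the app itself.
  # Assumes litefs.yml contains something like:
  #   proxy:
  #     addr: ":8080"             <- must match internal_port
  #     target: "localhost:8081"  <- the port your app listens on
  internal_port = 8080
```

If internal_port were 8081 here, requests would reach the app directly and the proxy would never get the chance to redirect writes to the primary.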
This is actually mainly through the .internal network (a.k.a. 6PN), but occasionally also via Fly-Replay, e.g. when redirecting incoming POSTs to the primary.
You do want something like that, but for different reasons. One of the things that was left to deduction is that you should always have two primary-candidates running at (almost) all times (i.e., min_machines_running = 2). This makes sense if you consider what happens if the existing primary fails but the replacement candidate has been asleep for the past 3 months…
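For reference, a minimal sketch of where that setting lives, assuming you use [http_service] with auto-stop enabled. The internal_port value and the auto-stop settings are assumptions on my part, so adjust them to your setup.

```toml
# fly.toml (fragment). Values other than min_machines_running are assumptions.
[http_service]
  internal_port = 8080          # assumed: the LiteFS proxy port
  auto_stop_machines = "stop"   # assumed: idle Machines get stopped
  auto_start_machines = true
  # Keep two Machines awake so a second candidate is always
  # up to date and ready to take over as primary.
  min_machines_running = 2
```

As far as I know, min_machines_running only counts Machines in the app's primary region, which fits nicely if that is also where your candidates live; do double-check that against your own litefs.yml.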
As a final tip, always look at the logs (fly logs) and event stream when doing experiments with LiteFS handovers; those really tend to clear the mists of who currently has what baton, etc.
Hope this helps!