I’d like to preface this by saying that I love the idea of distributed SQLite on the edge, although it’s been a bit tricky to get it going!

I am running LiteFS with a static lease, without the proxy, and without backups for now. I understand this means my data is wiped on each restart. The issue I’m facing is that the cluster ID on the primary is regenerated on each restart, but the replicas don’t seem to pick up the change.
If I destroy all my machines and then use `fly scale`, everything works fine. But when I run `fly app restart`, my replicas emit these logs:
```
2024-06-05T14:54:43.320 app[56830395bd748e] iad [info] level=INFO msg="335F1DB34CB40FA0: existing primary found (iad.fc-app.internal), connecting as replica to \"http://fra.fc-app.internal:20202\""
2024-06-05T14:54:43.407 app[56830395bd748e] iad [info] level=INFO msg="335F1DB34CB40FA0: disconnected from primary with error, retrying: cannot stream from primary with a different cluster id: LFSCCFFB7B74AFC8058C <> LFSC158C911F5D7A5AEA"
2024-06-05T14:54:44.407 app[56830395bd748e] iad [info] level=INFO msg="335F1DB34CB40FA0: existing primary found (iad.fc-app.internal), connecting as replica to \"http://fra.fc-app.internal:20202\""
2024-06-05T14:54:44.493 app[56830395bd748e] iad [info] level=INFO msg="335F1DB34CB40FA0: disconnected from primary with error, retrying: cannot stream from primary with a different cluster id: LFSCCFFB7B74AFC8058C <> LFSC158C911F5D7A5AEA"
```
`LFSCCFFB7B74AFC8058C` was the old cluster ID.
Here’s the relevant section of my `litefs.yml`, in case it matters:
```yaml
lease:
  type: "static"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: ${FLY_REGION == PRIMARY_REGION}
  hostname: "${FLY_REGION}.${FLY_APP_NAME}.internal"
  advertise-url: "http://${PRIMARY_REGION}.${FLY_APP_NAME}.internal:20202"
```
I understand this problem could be resolved with volumes, since the cluster ID would then persist across restarts, but there should also be a way to get this working without persistence, right?
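For reference, the volume approach I mention would look something like this in `fly.toml`. This is just a sketch: I’m assuming the LiteFS data directory (where the cluster ID and databases live) is at `/var/lib/litefs`, which matches my `litefs.yml` data dir, and the volume name `litefs_data` is made up.

```toml
# Hypothetical fly.toml fragment: mount a Fly volume over the LiteFS data
# directory so the cluster ID (and database state) survive machine restarts.
# Assumes the LiteFS data dir is /var/lib/litefs; adjust to match litefs.yml.
[mounts]
  source = "litefs_data"
  destination = "/var/lib/litefs"
```

That said, my question is specifically about whether the ephemeral setup can work without this.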