I’d like to preface this by saying that I love the idea of distributed SQLite on the edge, although it’s been a bit tricky to get it going!

I am running LiteFS with a static lease, without the proxy, and without backups for now. I understand this means my data is wiped on each restart. The issue I’m facing is that the cluster ID on the primary is regenerated on each restart, but the replicas don’t seem to pick up the change.
If I destroy all my machines and then use `fly scale`, everything works fine. But when I run `fly app restart`, my replicas emit these logs:
```
2024-06-05T14:54:43.320 app[56830395bd748e] iad [info] level=INFO msg="335F1DB34CB40FA0: existing primary found (iad.fc-app.internal), connecting as replica to \"http://fra.fc-app.internal:20202\""
2024-06-05T14:54:43.407 app[56830395bd748e] iad [info] level=INFO msg="335F1DB34CB40FA0: disconnected from primary with error, retrying: cannot stream from primary with a different cluster id: LFSCCFFB7B74AFC8058C <> LFSC158C911F5D7A5AEA"
2024-06-05T14:54:44.407 app[56830395bd748e] iad [info] level=INFO msg="335F1DB34CB40FA0: existing primary found (iad.fc-app.internal), connecting as replica to \"http://fra.fc-app.internal:20202\""
2024-06-05T14:54:44.493 app[56830395bd748e] iad [info] level=INFO msg="335F1DB34CB40FA0: disconnected from primary with error, retrying: cannot stream from primary with a different cluster id: LFSCCFFB7B74AFC8058C <> LFSC158C911F5D7A5AEA"
```
`LFSCCFFB7B74AFC8058C` was the old cluster ID.
Here’s the relevant section of my `litefs.yml`, in case it matters:
```yaml
lease:
  type: "static"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: ${FLY_REGION == PRIMARY_REGION}
  hostname: "${FLY_REGION}.${FLY_APP_NAME}.internal"
  advertise-url: "http://${PRIMARY_REGION}.${FLY_APP_NAME}.internal:20202"
```
I understand this problem could be resolved with volumes, since the cluster ID would then persist across restarts, but there should also be a way to get this working without persistence, right?
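For reference, the volume approach I mention would look something like this in `fly.toml`. This is just a sketch: I’m assuming the LiteFS data directory (where the cluster ID and databases live) is at `/var/lib/litefs`, which matches my `litefs.yml` data dir, and the volume name `litefs_data` is made up.

```toml
# Hypothetical fly.toml fragment: mount a Fly volume over the LiteFS data
# directory so the cluster ID (and database state) survive machine restarts.
# Assumes the LiteFS data dir is /var/lib/litefs; adjust to match litefs.yml.
[mounts]
  source = "litefs_data"
  destination = "/var/lib/litefs"
```

That said, my question is specifically about whether the ephemeral setup can work without this.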