Newly added machines take over as cluster primary?

I’m trying to set up an app with SQLite and LiteFS/FUSE on Fly.io. The app works with a single machine, but I’m having trouble with the Consul/LiteFS setup when adding a second machine.

Looking at the log output when adding a second machine, it seems that the newly added machine immediately takes over as primary. Is this expected behavior?

2023-09-22T05:55:10Z runner[32874ddec16748] nrt [info]Setting up volume 'litefs'
2023-09-22T05:55:10Z runner[32874ddec16748] nrt [info]Uninitialized volume 'litefs', initializing...
2023-09-22T05:55:10Z runner[32874ddec16748] nrt [info]Encrypting volume
2023-09-22T05:55:18Z runner[32874ddec16748] nrt [info]Opening encrypted volume
2023-09-22T05:55:20Z runner[32874ddec16748] nrt [info]Formatting volume
2023-09-22T05:55:23Z runner[32874ddec16748] nrt [info]Configuring firecracker
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info][    0.041518] PCI: Fatal: No config space access function found
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info] INFO Starting init (commit: 9fc6a62)...
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info] INFO Mounting /dev/vdb at /var/lib/litefs w/ uid: 0, gid: 0 and chmod 0755
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info] INFO Resized /var/lib/litefs to 1056964608 bytes
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info] INFO Preparing to run: `litefs mount` as root
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info] INFO [fly api proxy] listening at /.fly/api
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info]2023/09/22 05:55:24 listening on [fdaa:0:9ced:a7b:17d:14ad:99f5:2]:22 (DNS: [fdaa::3]:53)
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info]config file read from /app/litefs.yml
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info]LiteFS v0.5.6, commit=672c7eabb316e036bdf5f0bdbbcadbe953b00b4c
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info]level=INFO msg="host environment detected" type=fly.io
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info]level=INFO msg="no backup client configured, skipping"
2023-09-22T05:55:24Z app[32874ddec16748] nrt [info]level=INFO msg="Using Consul to determine primary"
2023-09-22T05:55:25Z app[32874ddec16748] nrt [info]level=INFO msg="initializing consul: key=litefs/te-litefs url=https://:9e0a4fa8-40a7-3515-3032-137e7b857954@consul-syd-5.fly-shared.net/te-litefs-p7vx1jwpz4r9k3z5/ hostname=32874ddec16748 advertise-url=http://32874ddec16748.vm.te-litefs.internal:20202"
2023-09-22T05:55:25Z app[32874ddec16748] nrt [info]level=INFO msg="LiteFS mounted to: /litefs"
2023-09-22T05:55:25Z app[32874ddec16748] nrt [info]level=INFO msg="http server listening on: http://localhost:20202"
2023-09-22T05:55:25Z app[32874ddec16748] nrt [info]level=INFO msg="waiting to connect to cluster"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="cannot become primary, local node has no cluster ID and \"consul\" lease already initialized with cluster ID LFSC40BE408D79ECF368"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="AD533B24FE28CE56: existing primary found (e2865111f69418), connecting as replica to \"http://e2865111f69418.vm.te-litefs.internal:20202\""
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="B458845051DAC717: stream connected ([fdaa:0:9ced:a7b:17d:14ad:99f5:2]:53766)"
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="starting from txid 0000000000000001, writing snapshot"
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="writing snapshot \"db.sqlite3\" @ 0000000000000012"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="database file is zero length on initialization: /var/lib/litefs/dbs/db.sqlite3/database"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="snapshot received for \"db.sqlite3\", removing other ltx files: 0000000000000001-0000000000000012.ltx"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="connected to cluster, ready"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="node is a candidate, automatically promoting to primary"
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="B458845051DAC717: exiting primary, preserving lease for handoff"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="AD533B24FE28CE56: disconnected from primary, retrying"
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="B458845051DAC717: stream disconnected ([fdaa:0:9ced:a7b:db52:9bc7:861a:2]:39042)"
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="B458845051DAC717: stream disconnected ([fdaa:0:9ced:a7b:17d:14ad:99f5:2]:53766)"
2023-09-22T05:55:26Z app[5683779b63678e] nrt [info]level=INFO msg="56186645FF60E88B: disconnected from primary, retrying"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="AD533B24FE28CE56: acquiring existing lease from handoff"
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="B458845051DAC717: existing primary found (e2865111f69418), connecting as replica to \"http://e2865111f69418.vm.te-litefs.internal:20202\""
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="http: POST /stream: error: cannot connect to self"
2023-09-22T05:55:26Z app[e2865111f69418] nrt [info]level=INFO msg="B458845051DAC717: disconnected from primary with error, retrying: connect to primary: invalid response: code=400 ('http://e2865111f69418.vm.te-litefs.internal:20202')"
2023-09-22T05:55:26Z app[32874ddec16748] nrt [info]level=INFO msg="AD533B24FE28CE56: primary lease acquired, advertising as http://32874ddec16748.vm.te-litefs.internal:20202"
2023-09-22T05:55:27Z app[32874ddec16748] nrt [info]level=INFO msg="proxy server listening on: http://localhost:8080"

Hi @tacomanator

Yeah, if lease.candidate: true and lease.promote: true are set in your litefs.yml, a newly started node will try to promote itself to become the primary as soon as it connects to the cluster. That matches your log: the existing primary exits and preserves its lease for handoff, and the new machine acquires it.
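
If you only want certain machines (for example, those in your primary region) to be eligible to take over, you can gate those settings in the lease section. This is just a rough sketch along the lines of the example config in the LiteFS docs; the PRIMARY_REGION env var is an assumption about your setup, so adapt the keys and values to your actual litefs.yml:

```yaml
# litefs.yml (lease section only) -- illustrative sketch, not your exact config
lease:
  type: "consul"
  # URL other nodes use to reach this node's LiteFS API (port 20202, as in your logs)
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"

  # Only machines in the primary region are candidates for the primary lease.
  # Assumes you set PRIMARY_REGION as an env var (e.g. in fly.toml).
  candidate: ${FLY_REGION == PRIMARY_REGION}

  # promote: true makes an eligible node request a lease handoff at startup
  # (the behavior you observed); set it to false to avoid that.
  promote: true

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${FLY_APP_NAME}"
```

As I understand it, candidate controls whether a node can ever hold the primary lease, while promote only controls whether it proactively asks the current primary for a handoff at startup; a candidate with promote: false can still become primary later if the current primary releases or loses the lease.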
