Rebooting LiteFS resulted in an error

Things were humming along fine in kcd when I suddenly decided to reboot the VM (I had a good-ish reason to want to do this, but it's not relevant here). I should not have done this right before going to sleep :man_facepalming: Here are the logs of what happened:

2022-11-23T07:25:05Z runner[48ff40af] den [info]Shutting down virtual machine
2022-11-23T07:25:05Z app[48ff40af] den [info]Sending signal SIGINT to main child process w/ PID 529
2022-11-23T07:25:05Z app[48ff40af] den [info]sending signal to exec process
2022-11-23T07:25:05Z app[48ff40af] den [info]waiting for exec process to close
2022-11-23T07:25:05Z app[48ff40af] den [info]signal received, litefs shutting down
2022-11-23T07:25:05Z app[48ff40af] den [info]exiting primary, destroying lease
2022-11-23T07:25:05Z app[48ff40af] den [info]exit status 1: fusermount: failed to unmount /litefs/data: Device or resource busy
2022-11-23T07:25:05Z app[48ff40af] den [info]Error watching sqlite.db-pos [Error: ENOTCONN: socket is not connected, stat '/litefs/data'] {
2022-11-23T07:25:05Z app[48ff40af] den [info]  errno: -107,
2022-11-23T07:25:05Z app[48ff40af] den [info]  code: 'ENOTCONN',
2022-11-23T07:25:05Z app[48ff40af] den [info]  syscall: 'stat',
2022-11-23T07:25:05Z app[48ff40af] den [info]  path: '/litefs/data'
2022-11-23T07:25:05Z app[48ff40af] den [info]}
2022-11-23T07:25:06Z app[48ff40af] den [info]Starting clean up.
2022-11-23T07:25:06Z app[48ff40af] den [info]Umounting /dev/vdc from /data
2022-11-23T07:25:06Z app[48ff40af] den [info]error umounting /data: EBUSY: Device or resource busy, retrying in a bit
2022-11-23T07:25:09Z runner[5616ea5f] den [info]Starting instance
2022-11-23T07:25:09Z runner[5616ea5f] den [info]Configuring virtual machine
2022-11-23T07:25:09Z runner[5616ea5f] den [info]Pulling container image
2022-11-23T07:25:09Z runner[5616ea5f] den [info]Unpacking image
2022-11-23T07:25:09Z runner[5616ea5f] den [info]Preparing kernel init
2022-11-23T07:25:09Z runner[5616ea5f] den [info]Setting up volume 'data'
2022-11-23T07:25:09Z runner[5616ea5f] den [info]Opening encrypted volume
2022-11-23T07:25:10Z runner[5616ea5f] den [info]Configuring firecracker
2022-11-23T07:25:10Z runner[5616ea5f] den [info]Starting virtual machine
2022-11-23T07:25:10Z app[5616ea5f] den [info]Starting init (commit: 81d5330)...
2022-11-23T07:25:10Z app[5616ea5f] den [info]Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-11-23T07:25:10Z app[5616ea5f] den [info]Preparing to run: ` litefs mount -- node ./other/start.js` as root
2022-11-23T07:25:10Z app[5616ea5f] den [info]2022/11/23 07:25:10 listening on [fdaa:0:23df:a7b:d827:2:6f13:2]:22 (DNS: [fdaa::3]:53)
2022-11-23T07:25:10Z app[5616ea5f] den [info]config file read from /etc/litefs.yml
2022-11-23T07:25:10Z app[5616ea5f] den [info]LiteFS commit=33a80a3c5d7c3109170e7cb9cdab6ebf40b2fc3b
2022-11-23T07:25:10Z app[5616ea5f] den [info]Using Consul to determine primary
2022-11-23T07:25:12Z app[5616ea5f] den [info]initializing consul: key= url= hostname=5616ea5f advertise-url=http://5616ea5f.vm.kcd.internal:20202
2022-11-23T07:25:12Z app[5616ea5f] den [info]ERROR: cannot open store: open databases: open database("sqlite.db"): verify database file: database checksum (8ae7b26f44b30773) does not match latest LTX checksum (93bba783d75d24f5)
2022-11-23T07:25:13Z app[5616ea5f] den [info]Starting clean up.
2022-11-23T07:25:13Z app[5616ea5f] den [info]Umounting /dev/vdc from /data
2022-11-23T07:25:19Z runner[5616ea5f] den [info]Starting instance
2022-11-23T07:25:19Z runner[5616ea5f] den [info]Configuring virtual machine
2022-11-23T07:25:19Z runner[5616ea5f] den [info]Pulling container image
2022-11-23T07:25:20Z runner[5616ea5f] den [info]Unpacking image


I managed to get it back up and running by setting exit-on-error: false in my litefs.yml and updating the mount-dir from /litefs/data to /litefs/data-2. Once it was up, I SSHed into the VM, dumped the old database into the new one, and everything's working now.
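For reference, the workaround looked roughly like this in litefs.yml (a sketch based on the two settings I changed; the rest of your config will differ):

```yaml
# Don't kill the process when LiteFS hits an error on startup,
# so the VM stays up long enough to inspect and recover the data.
exit-on-error: false

# Mount at a fresh location so the database with the mismatched
# LTX checksum under /litefs/data isn't picked up on startup.
mount-dir: "/litefs/data-2"
```

The dump step over SSH was along the lines of sqlite3 OLD.db .dump | sqlite3 /litefs/data-2/sqlite.db (paths here are illustrative, not my exact ones).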

Would be nice if I could reboot LiteFS without issue :slight_smile:


@kentcdodds I was able to reproduce the issue with the journal that we saw yesterday, and I'm going to get it fixed up today.