Site falls over every few hours before rebooting

@kentcdodds It looks like it’s pretty idle in that stack trace. Yeah, you’ll probably need to wait until it does it again.

Unfortunately, I was away from home when this happened earlier today, but here’s the current output of http://kcd.internal:20202/debug/pprof/goroutine?debug=2:

goroutine 371842 [running]:
runtime/pprof.writeGoroutineStacks({0xb6a360, 0xc00027e000})
	/usr/local/go/src/runtime/pprof/pprof.go:692 +0x70
runtime/pprof.writeGoroutine({0xb6a360?, 0xc00027e000?}, 0xc00032dad0?)
	/usr/local/go/src/runtime/pprof/pprof.go:681 +0x2b
runtime/pprof.(*Profile).WriteTo(0xa2dda0?, {0xb6a360?, 0xc00027e000?}, 0xc?)
	/usr/local/go/src/runtime/pprof/pprof.go:330 +0x14b
net/http/pprof.handler.ServeHTTP({0xc0002624f1, 0x9}, {0xb6e300, 0xc00027e000}, 0xc00041b1b1?)
	/usr/local/go/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0xb6e300?, 0xc00027e000}, 0xc000131500)
	/usr/local/go/src/net/http/pprof/pprof.go:371 +0x13e
github.com/superfly/litefs/http.(*Server).serveHTTP(0xc0001632c0?, {0xb6e300, 0xc00027e000}, 0xc000131500)
	/src/litefs/http/server.go:128 +0x186
net/http.HandlerFunc.ServeHTTP(0x40?, {0xb6e300?, 0xc00027e000?}, 0x0?)
	/usr/local/go/src/net/http/server.go:2109 +0x2f
golang.org/x/net/http2/h2c.h2cHandler.ServeHTTP({{0xb6aec0?, 0xc00014ea80?}, 0xc000089bc0?}, {0xb6e300, 0xc00027e000}, 0xc000131500)
	/go/pkg/mod/golang.org/x/net@v0.0.0-20220909164309-bea034e7d591/http2/h2c/h2c.go:113 +0x49f
net/http.serverHandler.ServeHTTP({0xc00043ee70?}, {0xb6e300, 0xc00027e000}, 0xc000131500)
	/usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0004181e0, {0xb6ea98, 0xc000352030})
	/usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3102 +0x4db

goroutine 1 [select, 277 minutes]:
main.runMount({0xb6ea28, 0xc00002c080}, {0xc0000220c0, 0x3, 0x3})
	/src/litefs/cmd/litefs/main.go:103 +0x558
main.run({0xb6ea28?, 0xc00002c080?}, {0xc0000220b0?, 0xc000119f70?, 0x406719?})
	/src/litefs/cmd/litefs/main.go:47 +0x78
main.main()
	/src/litefs/cmd/litefs/main.go:30 +0x71

goroutine 23 [syscall, 277 minutes]:
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x2f
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 40 [select]:
github.com/superfly/litefs.(*Store).monitorLeaseAsPrimary(0xc000130500, {0xb6e9f0, 0xc000089b00}, {0xb6ed00, 0xc0003524b0})
	/src/litefs/store.go:526 +0x1f0
github.com/superfly/litefs.(*Store).monitorLease(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:451 +0x1fd
github.com/superfly/litefs.(*Store).Open.func1()
	/src/litefs/store.go:134 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 38 [IO wait]:
internal/poll.runtime_pollWait(0x7fe75e9c8e08, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016ac80?, 0xc0000bb300?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00016ac80, {0xc0000bb300, 0x1300, 0x1300})
	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00016ac80, {0xc0000bb300?, 0xc00006b560?, 0x7fe75ee7ffff?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000127c0, {0xc0000bb300?, 0x7fe75eca0c00?, 0xc0001c3808?})
	/usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0004392d8, {0xc0000bb300?, 0x0?, 0xc0001c3860?})
	/usr/local/go/src/crypto/tls/conn.go:787 +0x3d
bytes.(*Buffer).ReadFrom(0xc00014de78, {0xb696c0, 0xc0004392d8})
	/usr/local/go/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc00014dc00, {0xb6a0a0?, 0xc0000127c0}, 0xf4c540?)
	/usr/local/go/src/crypto/tls/conn.go:809 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc00014dc00, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:616 +0x116
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc00014dc00, {0xc00019a000, 0x1000, 0x771860?})
	/usr/local/go/src/crypto/tls/conn.go:1287 +0x16f
bufio.(*Reader).Read(0xc0000d8780, {0xc00027f700, 0x9, 0x78e765?})
	/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0xb69540, 0xc0000d8780}, {0xc00027f700, 0x9, 0x9}, 0x9)
	/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
	/usr/local/go/src/io/io.go:351
net/http.http2readFrameHeader({0xc00027f700?, 0x9?, 0xc00043ed20?}, {0xb69540?, 0xc0000d8780?})
	/usr/local/go/src/net/http/h2_bundle.go:1565 +0x6e
net/http.(*http2Framer).ReadFrame(0xc00027f6c0)
	/usr/local/go/src/net/http/h2_bundle.go:1829 +0x95
net/http.(*http2clientConnReadLoop).run(0xc0001c3f98)
	/usr/local/go/src/net/http/h2_bundle.go:8875 +0x130
net/http.(*http2ClientConn).readLoop(0xc0000aa180)
	/usr/local/go/src/net/http/h2_bundle.go:8771 +0x6f
created by net/http.(*http2Transport).newClientConn
	/usr/local/go/src/net/http/h2_bundle.go:7478 +0xaaa

goroutine 41 [select]:
github.com/superfly/litefs.(*Store).monitorRetention(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:613 +0xe7
github.com/superfly/litefs.(*Store).Open.func2()
	/src/litefs/store.go:138 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 50 [select, 277 minutes]:
os/exec.(*Cmd).watchCtx.func1()
	/usr/local/go/src/os/exec/exec.go:648 +0x96
created by os/exec.(*Cmd).watchCtx
	/usr/local/go/src/os/exec/exec.go:647 +0x8e

goroutine 46 [syscall]:
syscall.Syscall(0x4780ee?, 0xf08980?, 0x40dc07?, 0x9f8a60?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.read(0xf08980?, {0xc00035c000?, 0x8?, 0x0?})
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:696 +0x45
syscall.Read(...)
	/usr/local/go/src/syscall/syscall_unix.go:183
bazil.org/fuse.(*Conn).ReadRequest(0xc0000d8ea0)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:580 +0xc6
bazil.org/fuse/fs.(*Server).Serve(0xc00027f7a0, {0xb699a0?, 0xc0000d8ba0})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:501 +0x3ef
github.com/superfly/litefs/fuse.(*FileSystem).Mount.func2()
	/src/litefs/fuse/file_system.go:93 +0x28
created by github.com/superfly/litefs/fuse.(*FileSystem).Mount
	/src/litefs/fuse/file_system.go:92 +0x31c

goroutine 47 [IO wait]:
internal/poll.runtime_pollWait(0x7fe75e9c8fe8, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016a600?, 0x6?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00016a600)
	/usr/local/go/src/internal/poll/fd_unix.go:614 +0x234
net.(*netFD).accept(0xc00016a600)
	/usr/local/go/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc000011608)
	/usr/local/go/src/net/tcpsock_posix.go:142 +0x28
net.(*TCPListener).Accept(0xc000011608)
	/usr/local/go/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0000fa1e0, {0xb6e0f0, 0xc000011608})
	/usr/local/go/src/net/http/server.go:3070 +0x385
github.com/superfly/litefs/http.(*Server).Serve.func1()
	/src/litefs/http/server.go:74 +0x2d
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 51 [syscall, 277 minutes]:
syscall.Syscall6(0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/syscall/syscall_linux.go:90 +0x36
os.(*Process).blockUntilWaitable(0xc000262180)
	/usr/local/go/src/os/wait_waitid.go:32 +0x87
os.(*Process).wait(0xc000262180)
	/usr/local/go/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/usr/local/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0xc0001bd340)
	/usr/local/go/src/os/exec/exec.go:599 +0x4b
main.(*MountCommand).execCmd.func1()
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x25
created by main.(*MountCommand).execCmd
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x311

goroutine 371843 [runnable]:
net/http.(*connReader).startBackgroundRead.func2()
	/usr/local/go/src/net/http/server.go:674
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1
created by net/http.(*connReader).startBackgroundRead
	/usr/local/go/src/net/http/server.go:674 +0xca

I doubt that’s helpful since it was so long ago.

Interestingly, the logs show the exact same last bit of output before the reboot:

Updated the cache value for blog:rankings. Getting a fresh value for this took 550ms. Caching for 3600000ms + 86400000ms stale in SQLite cache.

That’s three times in a row.

What’s interesting is that I’m actually not using LiteFS for my SQLite cache (non-primary regions can’t write to their local SQLite file, which kinda defeats the purpose of a cache).

It’s still very possible that:

  1. The cache log is just a coincidence.
  2. The cache log has nothing to do with it, and the real problem is something else that happens after that log.
  3. This has nothing to do with LiteFS.

I don’t think we can rule out any of those :grimacing: I’ll try deploying more logging again. Hopefully it doesn’t cause issues like it did last time.

That is an odd coincidence. Do you know where this log line comes from? I tried doing a search of the kentcdodds.com repo but I didn’t have any luck.

Here is where that log comes from: cachified/reporter.ts at main · Xiphe/cachified · GitHub

Specifically, here’s the bit of my code that calls into that: kentcdodds.com/blog.server.ts at main · kentcdodds/kentcdodds.com · GitHub

Ok, it happened again while I was sleeping about 40 minutes ago. Here’s the stack trace:

goroutine 117610 [running]:
runtime/pprof.writeGoroutineStacks({0xb6a360, 0xc00027e000})
	/usr/local/go/src/runtime/pprof/pprof.go:692 +0x70
runtime/pprof.writeGoroutine({0xb6a360?, 0xc00027e000?}, 0xc000268a90?)
	/usr/local/go/src/runtime/pprof/pprof.go:681 +0x2b
runtime/pprof.(*Profile).WriteTo(0xa2dda0?, {0xb6a360?, 0xc00027e000?}, 0xc?)
	/usr/local/go/src/runtime/pprof/pprof.go:330 +0x14b
net/http/pprof.handler.ServeHTTP({0xc000262341, 0x9}, {0xb6e300, 0xc00027e000}, 0xc00028d197?)
	/usr/local/go/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0xb6e300?, 0xc00027e000}, 0xc000130c00)
	/usr/local/go/src/net/http/pprof/pprof.go:371 +0x13e
github.com/superfly/litefs/http.(*Server).serveHTTP(0xc000161360?, {0xb6e300, 0xc00027e000}, 0xc000130c00)
	/src/litefs/http/server.go:128 +0x186
net/http.HandlerFunc.ServeHTTP(0x40?, {0xb6e300?, 0xc00027e000?}, 0x0?)
	/usr/local/go/src/net/http/server.go:2109 +0x2f
golang.org/x/net/http2/h2c.h2cHandler.ServeHTTP({{0xb6aec0?, 0xc00014ea90?}, 0xc000089bc0?}, {0xb6e300, 0xc00027e000}, 0xc000130c00)
	/go/pkg/mod/golang.org/x/net@v0.0.0-20220909164309-bea034e7d591/http2/h2c/h2c.go:113 +0x49f
net/http.serverHandler.ServeHTTP({0xc0004746f0?}, {0xb6e300, 0xc00027e000}, 0xc000130c00)
	/usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc000277ae0, {0xb6ea98, 0xc000354030})
	/usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3102 +0x4db

goroutine 1 [select, 48 minutes]:
main.runMount({0xb6ea28, 0xc00002c080}, {0xc0000220c0, 0x3, 0x3})
	/src/litefs/cmd/litefs/main.go:103 +0x558
main.run({0xb6ea28?, 0xc00002c080?}, {0xc0000220b0?, 0xc000119f70?, 0x406719?})
	/src/litefs/cmd/litefs/main.go:47 +0x78
main.main()
	/src/litefs/cmd/litefs/main.go:30 +0x71

goroutine 23 [syscall, 48 minutes]:
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x2f
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 38 [select]:
github.com/superfly/litefs.(*Store).monitorLeaseAsPrimary(0xc000130500, {0xb6e9f0, 0xc000089b00}, {0xb6ed00, 0xc0003544b0})
	/src/litefs/store.go:526 +0x1f0
github.com/superfly/litefs.(*Store).monitorLease(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:451 +0x1fd
github.com/superfly/litefs.(*Store).Open.func1()
	/src/litefs/store.go:134 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 39 [select]:
github.com/superfly/litefs.(*Store).monitorRetention(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:613 +0xe7
github.com/superfly/litefs.(*Store).Open.func2()
	/src/litefs/store.go:138 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 36 [IO wait]:
internal/poll.runtime_pollWait(0x7f2962cb8ef8, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc000168c00?, 0xc0000bb300?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000168c00, {0xc0000bb300, 0x1300, 0x1300})
	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc000168c00, {0xc0000bb300?, 0xc00006b560?, 0xc0000bb305?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000127c8, {0xc0000bb300?, 0x1ffffffffffffff?, 0x39?})
	/usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc000201f80, {0xc0000bb300?, 0x0?, 0xc0001c3860?})
	/usr/local/go/src/crypto/tls/conn.go:787 +0x3d
bytes.(*Buffer).ReadFrom(0xc00014de78, {0xb696c0, 0xc000201f80})
	/usr/local/go/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc00014dc00, {0xb6a0a0?, 0xc0000127c8}, 0x203000?)
	/usr/local/go/src/crypto/tls/conn.go:809 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc00014dc00, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:616 +0x116
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc00014dc00, {0xc0001a0000, 0x1000, 0x771860?})
	/usr/local/go/src/crypto/tls/conn.go:1287 +0x16f
bufio.(*Reader).Read(0xc0001966c0, {0xc00027f700, 0x9, 0x78e765?})
	/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0xb69540, 0xc0001966c0}, {0xc00027f700, 0x9, 0x9}, 0x9)
	/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
	/usr/local/go/src/io/io.go:351
net/http.http2readFrameHeader({0xc00027f700?, 0x9?, 0xc000474480?}, {0xb69540?, 0xc0001966c0?})
	/usr/local/go/src/net/http/h2_bundle.go:1565 +0x6e
net/http.(*http2Framer).ReadFrame(0xc00027f6c0)
	/usr/local/go/src/net/http/h2_bundle.go:1829 +0x95
net/http.(*http2clientConnReadLoop).run(0xc0001c3f98)
	/usr/local/go/src/net/http/h2_bundle.go:8875 +0x130
net/http.(*http2ClientConn).readLoop(0xc0000aa180)
	/usr/local/go/src/net/http/h2_bundle.go:8771 +0x6f
created by net/http.(*http2Transport).newClientConn
	/usr/local/go/src/net/http/h2_bundle.go:7478 +0xaaa

goroutine 48 [select, 48 minutes]:
os/exec.(*Cmd).watchCtx.func1()
	/usr/local/go/src/os/exec/exec.go:648 +0x96
created by os/exec.(*Cmd).watchCtx
	/usr/local/go/src/os/exec/exec.go:647 +0x8e

goroutine 44 [syscall]:
syscall.Syscall(0x4780ee?, 0xf08980?, 0x40dc07?, 0x9f8a60?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.read(0xf08980?, {0xc00050c000?, 0xd?, 0x0?})
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:696 +0x45
syscall.Read(...)
	/usr/local/go/src/syscall/syscall_unix.go:183
bazil.org/fuse.(*Conn).ReadRequest(0xc000196de0)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:580 +0xc6
bazil.org/fuse/fs.(*Server).Serve(0xc00027f7a0, {0xb699a0?, 0xc000196ae0})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:501 +0x3ef
github.com/superfly/litefs/fuse.(*FileSystem).Mount.func2()
	/src/litefs/fuse/file_system.go:93 +0x28
created by github.com/superfly/litefs/fuse.(*FileSystem).Mount
	/src/litefs/fuse/file_system.go:92 +0x31c

goroutine 45 [IO wait]:
internal/poll.runtime_pollWait(0x7f2962cb8fe8, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc000168600?, 0x6?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000168600)
	/usr/local/go/src/internal/poll/fd_unix.go:614 +0x234
net.(*netFD).accept(0xc000168600)
	/usr/local/go/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc000011620)
	/usr/local/go/src/net/tcpsock_posix.go:142 +0x28
net.(*TCPListener).Accept(0xc000011620)
	/usr/local/go/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0000fa1e0, {0xb6e0f0, 0xc000011620})
	/usr/local/go/src/net/http/server.go:3070 +0x385
github.com/superfly/litefs/http.(*Server).Serve.func1()
	/src/litefs/http/server.go:74 +0x2d
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 49 [syscall, 48 minutes]:
syscall.Syscall6(0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/syscall/syscall_linux.go:90 +0x36
os.(*Process).blockUntilWaitable(0xc000262180)
	/usr/local/go/src/os/wait_waitid.go:32 +0x87
os.(*Process).wait(0xc000262180)
	/usr/local/go/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/usr/local/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0xc0001911e0)
	/usr/local/go/src/os/exec/exec.go:599 +0x4b
main.(*MountCommand).execCmd.func1()
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x25
created by main.(*MountCommand).execCmd
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x311

goroutine 117611 [runnable]:
net/http.(*connReader).startBackgroundRead.func2()
	/usr/local/go/src/net/http/server.go:674
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1
created by net/http.(*connReader).startBackgroundRead
	/usr/local/go/src/net/http/server.go:674 +0xca

I’m guessing I missed it again.

This time, I changed the logs a bit. Normally, my request logs don’t actually log until we’ve sent the response. I changed it so they log as soon as we receive the request. So now I think I know what it’s doing when this happens:

2022-11-28T12:36:52Z app[ecf73ece] den [info] GET /blog/rss.xml - - - - ms
2022-11-28T12:36:52Z app[ecf73ece] den [info] POST /__metronome - - - - ms
2022-11-28T12:36:53Z app[ecf73ece] den [info] POST /__metronome - - - - ms
2022-11-28T12:36:58Z app[ecf73ece] den [info] GET /__metronome/metronome-5.6.1.js - - - - ms
2022-11-28T12:36:59Z app[ecf73ece] den [info] POST /__metronome - - - - ms
2022-11-28T12:36:59Z app[ecf73ece] den [info] POST /blog/useeffect-vs-uselayouteffect?_data=routes%2Fblog.%24slug - - - - ms
2022-11-28T12:36:59Z app[ecf73ece] den [info] POST /blog/get-a-catch-block-error-message-with-typescript?_data=routes%2Fblog.%24slug - - - - ms
2022-11-28T12:37:00Z app[ecf73ece] den [info] Updated the cache value for blog:useeffect-vs-uselayouteffect:rankings. Getting a fresh value for this took 502ms. Caching for 604800000ms + 86400000ms stale in SQLite cache.
2022-11-28T12:37:00Z app[ecf73ece] den [info] Updated the cache value for blog:rankings. Getting a fresh value for this took 546ms. Caching for 3600000ms + 86400000ms stale in SQLite cache.
2022-11-28T12:43:09Z runner[ecf73ece] den [info] Shutting down virtual machine
2022-11-28T12:43:09Z app[ecf73ece] den [info] Sending signal SIGINT to main child process w/ PID 529
2022-11-28T12:43:09Z app[ecf73ece] den [info] sending signal to exec process
2022-11-28T12:43:09Z app[ecf73ece] den [info] waiting for exec process to close
2022-11-28T12:43:09Z app[ecf73ece] den [info] signal received, litefs shutting down
2022-11-28T12:43:09Z app[ecf73ece] den [info] exiting primary, destroying lease
2022-11-28T12:43:09Z app[ecf73ece] den [info] exit status 1: fusermount: failed to unmount /litefs/data-2: Device or resource busy
2022-11-28T12:43:22Z runner[ecf73ece] den [info] Starting instance
2022-11-28T12:43:22Z runner[ecf73ece] den [info] Configuring virtual machine
2022-11-28T12:43:22Z runner[ecf73ece] den [info] Pulling container image
2022-11-28T12:43:22Z runner[ecf73ece] den [info] Unpacking image
2022-11-28T12:43:22Z runner[ecf73ece] den [info] Preparing kernel init
2022-11-28T12:43:22Z runner[ecf73ece] den [info] Setting up volume 'data'
2022-11-28T12:43:22Z runner[ecf73ece] den [info] Opening encrypted volume
2022-11-28T12:43:23Z runner[ecf73ece] den [info] Configuring firecracker
2022-11-28T12:43:23Z runner[ecf73ece] den [info] Starting virtual machine
2022-11-28T12:43:23Z app[ecf73ece] den [info] Starting init (commit: 81d5330)...
2022-11-28T12:43:23Z app[ecf73ece] den [info] Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-11-28T12:43:23Z app[ecf73ece] den [info] Preparing to run: `docker-entrypoint.sh litefs mount -- node ./other/start.js` as root
2022-11-28T12:43:23Z app[ecf73ece] den [info] 2022/11/28 12:43:23 listening on [fdaa:0:23df:a7b:d827:2:6f13:2]:22 (DNS: [fdaa::3]:53)
2022-11-28T12:43:23Z app[ecf73ece] den [info] config file read from /etc/litefs.yml
2022-11-28T12:43:23Z app[ecf73ece] den [info] LiteFS v0.3.0-beta5, commit=9bb4a2d406f69bdb513c4fd91a5986d62dc08612
2022-11-28T12:43:23Z app[ecf73ece] den [info] Using Consul to determine primary
2022-11-28T12:43:25Z app[ecf73ece] den [info] initializing consul: key= url=https://:b832a863-60c2-48d4-8289-bdc5d80fc444@consul-iad.fly-shared.net/kcd-g3zmqx5x3y49dlp4/ hostname=ecf73ece advertise-url=http://ecf73ece.vm.kcd.internal:20202
2022-11-28T12:43:26Z app[ecf73ece] den [info] LiteFS mounted to: /litefs/data-2
2022-11-28T12:43:26Z app[ecf73ece] den [info] http server listening on: http://localhost:20202
2022-11-28T12:43:26Z app[ecf73ece] den [info] waiting to connect to cluster
2022-11-28T12:43:26Z app[ecf73ece] den [info] primary lease acquired, advertising as http://ecf73ece.vm.kcd.internal:20202
2022-11-28T12:43:26Z app[ecf73ece] den [info] connected to cluster, ready
2022-11-28T12:43:26Z app[ecf73ece] den [info] starting subprocess: node [./other/start.js]
2022-11-28T12:43:26Z app[ecf73ece] den [info] waiting for signal or subprocess to exit
2022-11-28T12:43:26Z app[ecf73ece] den [info] No .primary file found.
2022-11-28T12:43:26Z app[ecf73ece] den [info] Using current instance (ecf73ece) as primary (in den)
2022-11-28T12:43:26Z app[ecf73ece] den [info] Instance (ecf73ece) in den is primary. Deploying migrations.
2022-11-28T12:43:28Z app[ecf73ece] den [info] Prisma schema loaded from prisma/schema.prisma
2022-11-28T12:43:28Z app[ecf73ece] den [info] Datasource "db": SQLite database "sqlite.db" at "file:/litefs/data-2/sqlite.db"
2022-11-28T12:43:28Z app[ecf73ece] den [info] 2 migrations found in prisma/migrations
2022-11-28T12:43:28Z app[ecf73ece] den [info] No pending migrations to apply.
2022-11-28T12:43:28Z app[ecf73ece] den [info] npm notice
2022-11-28T12:43:28Z app[ecf73ece] den [info] npm notice New major version of npm available! 8.19.2 -> 9.1.2
2022-11-28T12:43:28Z app[ecf73ece] den [info] npm notice Changelog: <https://github.com/npm/cli/releases/tag/v9.1.2>
2022-11-28T12:43:28Z app[ecf73ece] den [info] npm notice Run `npm install -g npm@9.1.2` to update!
2022-11-28T12:43:28Z app[ecf73ece] den [info] npm notice
2022-11-28T12:43:28Z app[ecf73ece] den [info] Starting app...
2022-11-28T12:43:28Z app[ecf73ece] den [info] > kentcdodds.com@1.0.0 start
2022-11-28T12:43:28Z app[ecf73ece] den [info] > cross-env NODE_ENV=production node --require ./node_modules/dotenv/config ./index.js
2022-11-28T12:43:31Z app[ecf73ece] den [info] Express server listening on port 8080
2022-11-28T12:43:31Z app[ecf73ece] den [info] prisma:info Starting a sqlite pool with 3 connections.
2022-11-28T12:43:34Z app[ecf73ece] den [info] GET /blog/rss.xml - - - - ms
2022-11-28T12:43:35Z app[ecf73ece] den [info] GET /calls/01/19/where-did-those-diagrams-come-from - - - - ms
2022-11-28T12:43:35Z app[ecf73ece] den [info] GET /blog/rss.xml - - - - ms

It looks like this is happening when a user finishes reading a blog post: we mark the post as read for them and then trigger an update of the total reads.

So now all we can be sure of is that the VM gets at least this far before things spin out of control: kentcdodds.com/blog.$slug.tsx at 5a35bf204e6597b34848b2cdb224c690e99615a7 · kentcdodds/kentcdodds.com · GitHub

I do know that we sometimes get past that code, because the next lines of code are responsible for the messages the Discord bot sends, and those are going through successfully most of the time.

So, I’m going to change some of that code a tiny bit (we can have those messages be a fire-and-forget so the request isn’t waiting) and I’m going to add a bunch of logging to that spot to see whether we make it past that code before the CPU freaks out.
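Roughly what I mean by fire-and-forget (just a sketch; markPostAsRead and notifyDiscord are stand-ins here, not the actual helper names in the repo):

// Sketch only: these two stubs stand in for the real helpers in the repo.
async function markPostAsRead(args: {slug: string; userId: string}) {
  /* write the PostRead row */
}
async function notifyDiscord(message: string) {
  /* send the message via the bot */
}

export async function handleMarkAsRead(slug: string, userId: string) {
  await markPostAsRead({slug, userId})

  // Before: the request waited on the Discord message before responding.
  // await notifyDiscord(`new read for ${slug}`)

  // After: fire-and-forget, so the response isn't blocked on Discord.
  // Failures get logged instead of failing the request.
  void notifyDiscord(`new read for ${slug}`).catch(error => {
    console.error('discord notification failed:', error)
  })

  return {status: 'success'}
}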

I just noticed that my site actually fell over twice last night. Here are the logs for that. Again, looks like it happens right after updating the read rankings after a user reads a post:

2022-11-28T11:09:00Z app[ecf73ece] den [info] GET /blog/rss.xml - - - - ms
2022-11-28T11:09:02Z app[ecf73ece] den [info] POST /blog/aha-programming?_data=routes%2Fblog.%24slug - - - - ms
2022-11-28T11:09:03Z app[ecf73ece] den [info] POST /blog/the-testing-trophy-and-testing-classifications?_data=routes%2Fblog.%24slug - - - - ms
2022-11-28T11:09:03Z app[ecf73ece] den [info] Updated the cache value for blog:aha-programming:rankings. Getting a fresh value for this took 480ms. Caching for 604800000ms + 86400000ms stale in SQLite cache.
2022-11-28T11:09:03Z app[ecf73ece] den [info] Updated the cache value for blog:rankings. Getting a fresh value for this took 528ms. Caching for 3600000ms + 86400000ms stale in SQLite cache.
2022-11-28T11:16:07Z runner[ecf73ece] den [info] Shutting down virtual machine
2022-11-28T11:16:07Z app[ecf73ece] den [info] Sending signal SIGINT to main child process w/ PID 529
2022-11-28T11:16:07Z app[ecf73ece] den [info] sending signal to exec process
2022-11-28T11:16:07Z app[ecf73ece] den [info] waiting for exec process to close
2022-11-28T11:16:07Z app[ecf73ece] den [info] signal received, litefs shutting down
2022-11-28T11:16:07Z app[ecf73ece] den [info] exiting primary, destroying lease
2022-11-28T11:16:07Z app[ecf73ece] den [info] exit status 1: fusermount: failed to unmount /litefs/data-2: Device or resource busy
2022-11-28T11:16:19Z runner[ecf73ece] den [info] Starting instance
2022-11-28T11:16:19Z runner[ecf73ece] den [info] Configuring virtual machine
2022-11-28T11:16:19Z runner[ecf73ece] den [info] Pulling container image
2022-11-28T11:16:20Z runner[ecf73ece] den [info] Unpacking image
2022-11-28T11:16:20Z runner[ecf73ece] den [info] Preparing kernel init
2022-11-28T11:16:20Z runner[ecf73ece] den [info] Setting up volume 'data'
2022-11-28T11:16:20Z runner[ecf73ece] den [info] Opening encrypted volume
2022-11-28T11:16:20Z runner[ecf73ece] den [info] Configuring firecracker
2022-11-28T11:16:20Z runner[ecf73ece] den [info] Starting virtual machine
2022-11-28T11:16:20Z app[ecf73ece] den [info] Starting init (commit: 81d5330)...
2022-11-28T11:16:20Z app[ecf73ece] den [info] Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-11-28T11:16:21Z app[ecf73ece] den [info] Preparing to run: `docker-entrypoint.sh litefs mount -- node ./other/start.js` as root
2022-11-28T11:16:21Z app[ecf73ece] den [info] 2022/11/28 11:16:21 listening on [fdaa:0:23df:a7b:d827:2:6f13:2]:22 (DNS: [fdaa::3]:53)
2022-11-28T11:16:21Z app[ecf73ece] den [info] config file read from /etc/litefs.yml
2022-11-28T11:16:21Z app[ecf73ece] den [info] LiteFS v0.3.0-beta5, commit=9bb4a2d406f69bdb513c4fd91a5986d62dc08612
2022-11-28T11:16:21Z app[ecf73ece] den [info] Using Consul to determine primary
2022-11-28T11:16:22Z app[ecf73ece] den [info] initializing consul: key= url=https://:b832a863-60c2-48d4-8289-bdc5d80fc444@consul-iad.fly-shared.net/kcd-g3zmqx5x3y49dlp4/ hostname=ecf73ece advertise-url=http://ecf73ece.vm.kcd.internal:20202
2022-11-28T11:16:22Z app[ecf73ece] den [info] LiteFS mounted to: /litefs/data-2
2022-11-28T11:16:22Z app[ecf73ece] den [info] http server listening on: http://localhost:20202
2022-11-28T11:16:22Z app[ecf73ece] den [info] waiting to connect to cluster
2022-11-28T11:16:23Z app[ecf73ece] den [info] primary lease acquired, advertising as http://ecf73ece.vm.kcd.internal:20202
2022-11-28T11:16:23Z app[ecf73ece] den [info] connected to cluster, ready
2022-11-28T11:16:23Z app[ecf73ece] den [info] starting subprocess: node [./other/start.js]
2022-11-28T11:16:23Z app[ecf73ece] den [info] waiting for signal or subprocess to exit
2022-11-28T11:16:23Z app[ecf73ece] den [info] No .primary file found.
2022-11-28T11:16:23Z app[ecf73ece] den [info] Using current instance (ecf73ece) as primary (in den)
2022-11-28T11:16:23Z app[ecf73ece] den [info] Instance (ecf73ece) in den is primary. Deploying migrations.
2022-11-28T11:16:24Z app[ecf73ece] den [info] Prisma schema loaded from prisma/schema.prisma
2022-11-28T11:16:24Z app[ecf73ece] den [info] Datasource "db": SQLite database "sqlite.db" at "file:/litefs/data-2/sqlite.db"
2022-11-28T11:16:25Z app[ecf73ece] den [info] 2 migrations found in prisma/migrations
2022-11-28T11:16:25Z app[ecf73ece] den [info] No pending migrations to apply.
2022-11-28T11:16:25Z app[ecf73ece] den [info] npm notice
2022-11-28T11:16:25Z app[ecf73ece] den [info] npm notice New major version of npm available! 8.19.2 -> 9.1.2
2022-11-28T11:16:25Z app[ecf73ece] den [info] npm notice Changelog: <https://github.com/npm/cli/releases/tag/v9.1.2>
2022-11-28T11:16:25Z app[ecf73ece] den [info] npm notice Run `npm install -g npm@9.1.2` to update!
2022-11-28T11:16:25Z app[ecf73ece] den [info] npm notice
2022-11-28T11:16:25Z app[ecf73ece] den [info] Starting app...
2022-11-28T11:16:25Z app[ecf73ece] den [info] > kentcdodds.com@1.0.0 start
2022-11-28T11:16:25Z app[ecf73ece] den [info] > cross-env NODE_ENV=production node --require ./node_modules/dotenv/config ./index.js
2022-11-28T11:16:28Z app[ecf73ece] den [info] Express server listening on port 8080
2022-11-28T11:16:28Z app[ecf73ece] den [info] prisma:info Starting a sqlite pool with 3 connections.
2022-11-28T11:16:29Z app[ecf73ece] den [info] GET /healthcheck - - - - ms
2022-11-28T11:16:29Z app[ecf73ece] den [info] HEAD / - - - - ms
2022-11-28T11:16:29Z app[ecf73ece] den [info] Updated the cache value for total-post-reads:__all-posts__. Getting a fresh value for this took 272ms. Caching for 60000ms + 86400000ms stale in LRU.
2022-11-28T11:16:30Z app[ecf73ece] den [info] Updated the cache value for total-reader-count. Getting a fresh value for this took 1048ms. Caching for 300000ms + 86400000ms stale in LRU.
2022-11-28T11:16:30Z app[ecf73ece] den [info] prisma:query - 765ms -
2022-11-28T11:16:30Z app[ecf73ece] den [info]       SELECT
2022-11-28T11:16:30Z app[ecf73ece] den [info]         (SELECT COUNT(DISTINCT "userId") FROM "PostRead" WHERE "userId" IS NOT NULL) +
2022-11-28T11:16:30Z app[ecf73ece] den [info]         (SELECT COUNT(DISTINCT "clientId") FROM "PostRead" WHERE "clientId" IS NOT NULL)
2022-11-28T11:16:31Z app[ecf73ece] den [info] prisma:query - 1324ms - SELECT COUNT(*), `main`.`PostRead`.`postSlug` FROM `main`.`PostRead` WHERE 1=1 GROUP BY `main`.`PostRead`.`postSlug` ORDER BY COUNT(`main`.`PostRead`.`postSlug`) DESC LIMIT ? OFFSET ?
2022-11-28T11:16:31Z app[ecf73ece] den [info] Updated the cache value for sorted-most-popular-post-slugs. Getting a fresh value for this took 1332ms. Caching for 1800000ms + 86400000ms stale in LRU.
2022-11-28T11:16:32Z app[ecf73ece] den [info] HEAD /blog/rss.xml - - - - ms
2022-11-28T11:16:32Z app[ecf73ece] den [info] HEAD /blog/rss.xml - - - - ms

What’s interesting is that after the Node server started back up, I started getting a bunch of logs like this:

2022-11-28T11:16:44Z app[ecf73ece] den [info] Background refresh for tito:scheduled-events successful. Getting a fresh value for this took 9265ms. Caching for 1440000ms + 2592000000ms stale in SQLite cache.

A ton of those, for some weird reason. Each one has a different total time, so it’s not the same call to getFreshValue being logged repeatedly; these were individual calls, which I believe cachified is supposed to make impossible.
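To be clear about what I mean: the de-duplication I expect is roughly this pattern, where concurrent callers for the same key share one in-flight getFreshValue call (a sketch of the idea only, not cachified’s actual internals):

// Rough sketch of the de-duplication I expect: concurrent callers for the
// same key should reuse a single in-flight getFreshValue promise.
const inFlight = new Map<string, Promise<unknown>>()

async function dedupedGetFreshValue<T>(
  key: string,
  getFreshValue: () => Promise<T>,
): Promise<T> {
  // If a fresh value for this key is already being fetched, reuse that promise.
  const pending = inFlight.get(key)
  if (pending) return pending as Promise<T>

  const promise = getFreshValue().finally(() => {
    // Clear the entry once it settles so a later call can refresh again.
    inFlight.delete(key)
  })
  inFlight.set(key, promise)
  return promise
}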

I also got a lot of these:

2022-11-28T11:16:44Z proxy[ecf73ece] den [warn] Instance reached connections hard limit of 200

And a bunch of these:

2022-11-28T11:16:44Z app[ecf73ece] den [info] Metronome: Metric data was not sent to metronome after 300 milliseconds timeout

Maybe these are all nothing, but they are pretty odd.

Oh, one other thing that may be interesting to note is that I’m using better-sqlite3 for my SQLite cache and it runs synchronously, so perhaps something in that is causing issues?
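For context, the cache reads and writes look roughly like this (a simplified sketch, not my actual cache code), and every one of these calls blocks the Node event loop until SQLite returns:

import Database from 'better-sqlite3'

// Simplified sketch (not the actual cache code): better-sqlite3 is synchronous,
// so each statement runs on the main thread and nothing else runs meanwhile.
const db = new Database('./cache.db')
db.exec('CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)')

export function getCachedValue(key: string): string | undefined {
  // .get() returns the row (or undefined) immediately; there is no await.
  const row = db
    .prepare('SELECT value FROM cache WHERE key = ?')
    .get(key) as {value: string} | undefined
  return row?.value
}

export function setCachedValue(key: string, value: string) {
  db.prepare(
    'INSERT INTO cache (key, value) VALUES (?, ?) ' +
      'ON CONFLICT(key) DO UPDATE SET value = excluded.value',
  ).run(key, value)
}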

Site fell over again just now. I just missed it. Here’s the stack trace:

goroutine 69626 [running]:
runtime/pprof.writeGoroutineStacks({0xb6a360, 0xc0002ba0e0})
	/usr/local/go/src/runtime/pprof/pprof.go:692 +0x70
runtime/pprof.writeGoroutine({0xb6a360?, 0xc0002ba0e0?}, 0xc0003692b0?)
	/usr/local/go/src/runtime/pprof/pprof.go:681 +0x2b
runtime/pprof.(*Profile).WriteTo(0xa2dda0?, {0xb6a360?, 0xc0002ba0e0?}, 0xc?)
	/usr/local/go/src/runtime/pprof/pprof.go:330 +0x14b
net/http/pprof.handler.ServeHTTP({0xc000264521, 0x9}, {0xb6e300, 0xc0002ba0e0}, 0xc0003c2197?)
	/usr/local/go/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0xb6e300?, 0xc0002ba0e0}, 0xc000131100)
	/usr/local/go/src/net/http/pprof/pprof.go:371 +0x13e
github.com/superfly/litefs/http.(*Server).serveHTTP(0xc000165360?, {0xb6e300, 0xc0002ba0e0}, 0xc000131100)
	/src/litefs/http/server.go:128 +0x186
net/http.HandlerFunc.ServeHTTP(0x40?, {0xb6e300?, 0xc0002ba0e0?}, 0x0?)
	/usr/local/go/src/net/http/server.go:2109 +0x2f
golang.org/x/net/http2/h2c.h2cHandler.ServeHTTP({{0xb6aec0?, 0xc000152a80?}, 0xc000089bc0?}, {0xb6e300, 0xc0002ba0e0}, 0xc000131100)
	/go/pkg/mod/golang.org/x/net@v0.0.0-20220909164309-bea034e7d591/http2/h2c/h2c.go:113 +0x49f
net/http.serverHandler.ServeHTTP({0xc00035d3e0?}, {0xb6e300, 0xc0002ba0e0}, 0xc000131100)
	/usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0003c06e0, {0xb6ea98, 0xc00035c150})
	/usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3102 +0x4db

goroutine 1 [select, 21 minutes]:
main.runMount({0xb6ea28, 0xc00002c080}, {0xc0000220c0, 0x3, 0x3})
	/src/litefs/cmd/litefs/main.go:103 +0x558
main.run({0xb6ea28?, 0xc00002c080?}, {0xc0000220b0?, 0xc000119f70?, 0x406719?})
	/src/litefs/cmd/litefs/main.go:47 +0x78
main.main()
	/src/litefs/cmd/litefs/main.go:30 +0x71

goroutine 23 [syscall, 21 minutes]:
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x2f
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 38 [select]:
github.com/superfly/litefs.(*Store).monitorLeaseAsPrimary(0xc000130500, {0xb6e9f0, 0xc000089b00}, {0xb6ed00, 0xc00035c5d0})
	/src/litefs/store.go:526 +0x1f0
github.com/superfly/litefs.(*Store).monitorLease(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:451 +0x1fd
github.com/superfly/litefs.(*Store).Open.func1()
	/src/litefs/store.go:134 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 39 [select]:
github.com/superfly/litefs.(*Store).monitorRetention(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:613 +0xe7
github.com/superfly/litefs.(*Store).Open.func2()
	/src/litefs/store.go:138 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 36 [IO wait]:
internal/poll.runtime_pollWait(0x7f5e7e4fbef8, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016cc00?, 0xc0000bb300?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00016cc00, {0xc0000bb300, 0x1300, 0x1300})
	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00016cc00, {0xc0000bb300?, 0xc00006b620?, 0xc0000bb305?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000127c0, {0xc0000bb300?, 0x1ffffffffffffff?, 0x39?})
	/usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0003a84c8, {0xc0000bb300?, 0x0?, 0xc0001c3860?})
	/usr/local/go/src/crypto/tls/conn.go:787 +0x3d
bytes.(*Buffer).ReadFrom(0xc00014fe78, {0xb696c0, 0xc0003a84c8})
	/usr/local/go/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc00014fc00, {0xb6a0a0?, 0xc0000127c0}, 0x203000?)
	/usr/local/go/src/crypto/tls/conn.go:809 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc00014fc00, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:616 +0x116
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc00014fc00, {0xc000100000, 0x1000, 0x771860?})
	/usr/local/go/src/crypto/tls/conn.go:1287 +0x16f
bufio.(*Reader).Read(0xc0000d8780, {0xc0002bb700, 0x9, 0x78e765?})
	/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0xb69540, 0xc0000d8780}, {0xc0002bb700, 0x9, 0x9}, 0x9)
	/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
	/usr/local/go/src/io/io.go:351
net/http.http2readFrameHeader({0xc0002bb700?, 0x9?, 0xc00035d290?}, {0xb69540?, 0xc0000d8780?})
	/usr/local/go/src/net/http/h2_bundle.go:1565 +0x6e
net/http.(*http2Framer).ReadFrame(0xc0002bb6c0)
	/usr/local/go/src/net/http/h2_bundle.go:1829 +0x95
net/http.(*http2clientConnReadLoop).run(0xc0001c3f98)
	/usr/local/go/src/net/http/h2_bundle.go:8875 +0x130
net/http.(*http2ClientConn).readLoop(0xc0000aa180)
	/usr/local/go/src/net/http/h2_bundle.go:8771 +0x6f
created by net/http.(*http2Transport).newClientConn
	/usr/local/go/src/net/http/h2_bundle.go:7478 +0xaaa

goroutine 48 [select, 21 minutes]:
os/exec.(*Cmd).watchCtx.func1()
	/usr/local/go/src/os/exec/exec.go:648 +0x96
created by os/exec.(*Cmd).watchCtx
	/usr/local/go/src/os/exec/exec.go:647 +0x8e

goroutine 44 [syscall]:
syscall.Syscall(0x4780ee?, 0xf08980?, 0x40dc07?, 0x9f8a60?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.read(0xf08980?, {0xc00040a000?, 0x8?, 0x0?})
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:696 +0x45
syscall.Read(...)
	/usr/local/go/src/syscall/syscall_unix.go:183
bazil.org/fuse.(*Conn).ReadRequest(0xc0000d8ea0)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:580 +0xc6
bazil.org/fuse/fs.(*Server).Serve(0xc0002bb7a0, {0xb699a0?, 0xc0000d8ba0})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:501 +0x3ef
github.com/superfly/litefs/fuse.(*FileSystem).Mount.func2()
	/src/litefs/fuse/file_system.go:93 +0x28
created by github.com/superfly/litefs/fuse.(*FileSystem).Mount
	/src/litefs/fuse/file_system.go:92 +0x31c

goroutine 45 [IO wait]:
internal/poll.runtime_pollWait(0x7f5e7e4fbfe8, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016c600?, 0x6?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00016c600)
	/usr/local/go/src/internal/poll/fd_unix.go:614 +0x234
net.(*netFD).accept(0xc00016c600)
	/usr/local/go/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc000011608)
	/usr/local/go/src/net/tcpsock_posix.go:142 +0x28
net.(*TCPListener).Accept(0xc000011608)
	/usr/local/go/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0000fa1e0, {0xb6e0f0, 0xc000011608})
	/usr/local/go/src/net/http/server.go:3070 +0x385
github.com/superfly/litefs/http.(*Server).Serve.func1()
	/src/litefs/http/server.go:74 +0x2d
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 49 [syscall, 21 minutes]:
syscall.Syscall6(0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/syscall/syscall_linux.go:90 +0x36
os.(*Process).blockUntilWaitable(0xc000264180)
	/usr/local/go/src/os/wait_waitid.go:32 +0x87
os.(*Process).wait(0xc000264180)
	/usr/local/go/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/usr/local/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0xc0001511e0)
	/usr/local/go/src/os/exec/exec.go:599 +0x4b
main.(*MountCommand).execCmd.func1()
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x25
created by main.(*MountCommand).execCmd
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x311

goroutine 69627 [runnable]:
net/http.(*connReader).startBackgroundRead.func2()
	/usr/local/go/src/net/http/server.go:674
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1
created by net/http.(*connReader).startBackgroundRead
	/usr/local/go/src/net/http/server.go:674 +0xca

It looks like it actually shut down twice. Here are the logs. Sadly, it does look like the mark-as-read logic finishes successfully, so that’s probably just a coincidence. Also, one of the shutdowns happened without those logs, so yeah, probably not the issue.

2022-11-28T13:59:35Z app[ecf73ece] den [info] Background refresh for total-post-reads:i-migrated-from-a-postgres-cluster-to-distributed-sqlite-with-litefs successful. Getting a fresh value for this took 142ms. Caching for 60000ms + 86400000ms stale in LRU.
2022-11-28T13:59:35Z app[ecf73ece] den [info] GET /__metronome/metronome-5.6.1.js - - - - ms
2022-11-28T13:59:35Z app[ecf73ece] den [info] POST /__metronome - - - - ms
2022-11-28T13:59:35Z app[ecf73ece] den [info] GET /__metronome/metronome-5.6.1.js - - - - ms
2022-11-28T13:59:36Z app[ecf73ece] den [info] GET /__metronome/metronome-5.6.1.js - - - - ms
2022-11-28T13:59:37Z app[ecf73ece] den [info] POST /__metronome - - - - ms
2022-11-28T13:59:42Z runner[ecf73ece] den [info] Shutting down virtual machine
2022-11-28T13:59:42Z app[ecf73ece] den [info] Sending signal SIGINT to main child process w/ PID 529
2022-11-28T13:59:42Z app[ecf73ece] den [info] sending signal to exec process
2022-11-28T13:59:42Z app[ecf73ece] den [info] waiting for exec process to close
2022-11-28T13:59:42Z app[ecf73ece] den [info] signal received, litefs shutting down
2022-11-28T13:59:42Z app[ecf73ece] den [info] exiting primary, destroying lease
2022-11-28T13:59:42Z app[ecf73ece] den [info] exit status 1: fusermount: failed to unmount /litefs/data-2: Device or resource busy
2022-11-28T13:59:42Z app[ecf73ece] den [info] Error watching sqlite.db-pos [Error: ENOTCONN: socket is not connected, stat '/litefs/data-2'] {
2022-11-28T13:59:42Z app[ecf73ece] den [info]   errno: -107,
2022-11-28T13:59:42Z app[ecf73ece] den [info]   code: 'ENOTCONN',
2022-11-28T13:59:42Z app[ecf73ece] den [info]   syscall: 'stat',
2022-11-28T13:59:42Z app[ecf73ece] den [info]   path: '/litefs/data-2'
2022-11-28T13:59:42Z app[ecf73ece] den [info] }
2022-11-28T13:59:42Z app[ecf73ece] den [info] Starting clean up.
2022-11-28T13:59:42Z app[ecf73ece] den [info] Umounting /dev/vdc from /data
2022-11-28T13:59:42Z app[ecf73ece] den [info] error umounting /data: EBUSY: Device or resource busy, retrying in a bit
2022-11-28T13:59:44Z runner[d4dcc8d9] den [info] Starting instance
2022-11-28T13:59:44Z runner[d4dcc8d9] den [info] Configuring virtual machine
2022-11-28T13:59:44Z runner[d4dcc8d9] den [info] Pulling container image
2022-11-28T13:59:46Z runner[d4dcc8d9] den [info] Unpacking image
2022-11-28T13:59:48Z runner[d4dcc8d9] den [info] Preparing kernel init
2022-11-28T14:09:18Z app[d4dcc8d9] den [info] GET /blog/rss.xml - - - - ms
2022-11-28T14:09:18Z app[d4dcc8d9] den [info] POST /blog/avoid-nesting-when-youre-testing?_data=routes%2Fblog.%24slug - - - - ms
2022-11-28T14:09:18Z app[d4dcc8d9] den [info] mark-as-read: getting current read count for avoid-nesting-when-youre-testing
2022-11-28T14:09:18Z app[d4dcc8d9] den [info] mark-as-read: adding read for avoid-nesting-when-youre-testing by 439f69c5-7424-4f7b-9a32-6d78f83d7c25
2022-11-28T14:09:18Z app[d4dcc8d9] den [info] Updated the cache value for blog:making-your-ui-tests-resilient-to-change:rankings. Getting a fresh value for this took 521ms. Caching for 604800000ms + 86400000ms stale in SQLite cache.
2022-11-28T14:09:18Z app[d4dcc8d9] den [info] Updated the cache value for blog:rankings. Getting a fresh value for this took 571ms. Caching for 3600000ms + 86400000ms stale in SQLite cache.
2022-11-28T14:09:18Z app[d4dcc8d9] den [info] mark-as-read: ranking update finished
2022-11-28T14:09:18Z app[d4dcc8d9] den [info] mark-as-read: successfully finished mark-as-read for making-your-ui-tests-resilient-to-change. Responding.
2022-11-28T14:13:07Z runner[d4dcc8d9] den [info] Shutting down virtual machine
2022-11-28T14:13:07Z app[d4dcc8d9] den [info] Sending signal SIGINT to main child process w/ PID 529
2022-11-28T14:13:07Z app[d4dcc8d9] den [info] sending signal to exec process
2022-11-28T14:13:07Z app[d4dcc8d9] den [info] waiting for exec process to close
2022-11-28T14:13:07Z app[d4dcc8d9] den [info] signal received, litefs shutting down
2022-11-28T14:13:07Z app[d4dcc8d9] den [info] exiting primary, destroying lease
2022-11-28T14:13:07Z app[d4dcc8d9] den [info] exit status 1: fusermount: failed to unmount /litefs/data-2: Device or resource busy
2022-11-28T14:13:16Z runner[ca1e06d2] den [info] Starting instance
2022-11-28T14:13:16Z runner[ca1e06d2] den [info] Configuring virtual machine
2022-11-28T14:13:16Z runner[ca1e06d2] den [info] Pulling container image
2022-11-28T14:13:18Z runner[ca1e06d2] den [info] Unpacking image
2022-11-28T14:13:20Z runner[ca1e06d2] den [info] Preparing kernel init
2022-11-28T14:13:20Z runner[ca1e06d2] den [info] Setting up volume 'data'
2022-11-28T14:13:20Z runner[ca1e06d2] den [info] Opening encrypted volume
2022-11-28T14:13:20Z runner[ca1e06d2] den [info] Configuring firecracker
2022-11-28T14:13:21Z runner[ca1e06d2] den [info] Starting virtual machine
2022-11-28T14:13:21Z app[ca1e06d2] den [info] Starting init (commit: 81d5330)...
2022-11-28T14:13:21Z app[ca1e06d2] den [info] Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-11-28T14:13:21Z app[ca1e06d2] den [info] Preparing to run: `docker-entrypoint.sh litefs mount -- node ./other/start.js` as root
2022-11-28T14:13:21Z app[ca1e06d2] den [info] 2022/11/28 14:13:21 listening on [fdaa:0:23df:a7b:d827:2:6f13:2]:22 (DNS: [fdaa::3]:53)
2022-11-28T14:13:21Z app[ca1e06d2] den [info] config file read from /etc/litefs.yml
2022-11-28T14:13:21Z app[ca1e06d2] den [info] LiteFS v0.3.0-beta5, commit=9bb4a2d406f69bdb513c4fd91a5986d62dc08612
2022-11-28T14:13:21Z app[ca1e06d2] den [info] Using Consul to determine primary
2022-11-28T14:13:23Z app[ca1e06d2] den [info] initializing consul: key= url=https://:b832a863-60c2-48d4-8289-bdc5d80fc444@consul-iad.fly-shared.net/kcd-g3zmqx5x3y49dlp4/ hostname=ca1e06d2 advertise-url=http://ca1e06d2.vm.kcd.internal:20202
2022-11-28T14:13:23Z app[ca1e06d2] den [info] LiteFS mounted to: /litefs/data-2
2022-11-28T14:13:23Z app[ca1e06d2] den [info] http server listening on: http://localhost:20202
2022-11-28T14:13:23Z app[ca1e06d2] den [info] waiting to connect to cluster
2022-11-28T14:13:24Z app[ca1e06d2] den [info] primary lease acquired, advertising as http://ca1e06d2.vm.kcd.internal:20202
2022-11-28T14:13:24Z app[ca1e06d2] den [info] connected to cluster, ready
2022-11-28T14:13:24Z app[ca1e06d2] den [info] starting subprocess: node [./other/start.js]
2022-11-28T14:13:24Z app[ca1e06d2] den [info] waiting for signal or subprocess to exit
2022-11-28T14:13:24Z app[ca1e06d2] den [info] No .primary file found.
2022-11-28T14:13:24Z app[ca1e06d2] den [info] Using current instance (ca1e06d2) as primary (in den)
2022-11-28T14:13:24Z app[ca1e06d2] den [info] Instance (ca1e06d2) in den is primary. Deploying migrations.
2022-11-28T14:13:26Z app[ca1e06d2] den [info] Prisma schema loaded from prisma/schema.prisma
2022-11-28T14:13:26Z app[ca1e06d2] den [info] Datasource "db": SQLite database "sqlite.db" at "file:/litefs/data-2/sqlite.db"
2022-11-28T14:13:26Z app[ca1e06d2] den [info] 2 migrations found in prisma/migrations
2022-11-28T14:13:26Z app[ca1e06d2] den [info] No pending migrations to apply.
2022-11-28T14:13:26Z app[ca1e06d2] den [info] npm notice
2022-11-28T14:13:26Z app[ca1e06d2] den [info] npm notice New major version of npm available! 8.19.2 -> 9.1.2
2022-11-28T14:13:26Z app[ca1e06d2] den [info] npm notice Changelog: <https://github.com/npm/cli/releases/tag/v9.1.2>
2022-11-28T14:13:26Z app[ca1e06d2] den [info] npm notice Run `npm install -g npm@9.1.2` to update!
2022-11-28T14:13:26Z app[ca1e06d2] den [info] npm notice
2022-11-28T14:13:26Z app[ca1e06d2] den [info] Starting app...
2022-11-28T14:13:26Z app[ca1e06d2] den [info] > kentcdodds.com@1.0.0 start
2022-11-28T14:13:26Z app[ca1e06d2] den [info] > cross-env NODE_ENV=production node --require ./node_modules/dotenv/config ./index.js
2022-11-28T14:13:29Z app[ca1e06d2] den [info] Express server listening on port 8080
2022-11-28T14:13:29Z app[ca1e06d2] den [info] prisma:info Starting a sqlite pool with 3 connections.
2022-11-28T14:13:32Z app[ca1e06d2] den [info] GET /blog/react-hooks-whats-going-to-happen-to-render-props?_data=root - - - - ms
2022-11-28T14:13:32Z app[ca1e06d2] den [info] GET /blog/rss.xml - - - - ms
2022-11-28T14:13:33Z app[ca1e06d2] den [info] GET /blog/stop-mocking-fetch - - - - ms

@kentcdodds have you seen/tried this?

I have not, but the deploy process looks pretty straightforward, and I dig NATS. That would let me query the logs for specific times.


@kentcdodds Have you tried running without LiteFS? That may at least help us isolate the issue. You can make a copy of your current database by running this:

$ sqlite3 /path/to/db ".dump" > dump.sql

and then you can import that dump on a new instance without LiteFS:

$ sqlite3 /path/to/db < dump.sql

You can run the same commands when switching back to LiteFS as well.

It’s down right now and I don’t know why I didn’t think of this before, but when I try to get the stack trace… it’s not up to respond :man_facepalming:
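If I want to actually catch it next time, I could run a little script that polls the same debug endpoint from above and saves each goroutine dump to disk (just a sketch, assuming Node 18+ for the global fetch):

import {writeFile} from 'node:fs/promises'

// Sketch: poll the LiteFS debug endpoint (same URL as earlier in this thread)
// and save each goroutine dump so there's a trace from right before a failure.
const url = 'http://kcd.internal:20202/debug/pprof/goroutine?debug=2'

async function captureOnce() {
  try {
    const response = await fetch(url)
    await writeFile(`goroutine-${Date.now()}.txt`, await response.text())
  } catch (error) {
    // If the instance is down (like it is right now), note it and keep polling.
    console.error('could not reach the debug endpoint:', error)
  }
}

setInterval(captureOnce, 60_000)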

It’s now come up and here’s the stack trace (probably not helpful, but here it is anyway):

goroutine 31008 [running]:
runtime/pprof.writeGoroutineStacks({0xb6a360, 0xc00027e000})
	/usr/local/go/src/runtime/pprof/pprof.go:692 +0x70
runtime/pprof.writeGoroutine({0xb6a360?, 0xc00027e000?}, 0xc0001db1e0?)
	/usr/local/go/src/runtime/pprof/pprof.go:681 +0x2b
runtime/pprof.(*Profile).WriteTo(0xa2dda0?, {0xb6a360?, 0xc00027e000?}, 0xc?)
	/usr/local/go/src/runtime/pprof/pprof.go:330 +0x14b
net/http/pprof.handler.ServeHTTP({0xc000262221, 0x9}, {0xb6e300, 0xc00027e000}, 0xc00021e1b1?)
	/usr/local/go/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0xb6e300?, 0xc00027e000}, 0xc000130800)
	/usr/local/go/src/net/http/pprof/pprof.go:371 +0x13e
github.com/superfly/litefs/http.(*Server).serveHTTP(0xc0001652c0?, {0xb6e300, 0xc00027e000}, 0xc000130800)
	/src/litefs/http/server.go:128 +0x186
net/http.HandlerFunc.ServeHTTP(0x40?, {0xb6e300?, 0xc00027e000?}, 0x0?)
	/usr/local/go/src/net/http/server.go:2109 +0x2f
golang.org/x/net/http2/h2c.h2cHandler.ServeHTTP({{0xb6aec0?, 0xc000150a80?}, 0xc000089bc0?}, {0xb6e300, 0xc00027e000}, 0xc000130800)
	/go/pkg/mod/golang.org/x/net@v0.0.0-20220909164309-bea034e7d591/http2/h2c/h2c.go:113 +0x49f
net/http.serverHandler.ServeHTTP({0xc000342ba0?}, {0xb6e300, 0xc00027e000}, 0xc000130800)
	/usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0001e59a0, {0xb6ea98, 0xc000342030})
	/usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3102 +0x4db

goroutine 1 [select, 1 minutes]:
main.runMount({0xb6ea28, 0xc00002c080}, {0xc0000220c0, 0x3, 0x3})
	/src/litefs/cmd/litefs/main.go:103 +0x558
main.run({0xb6ea28?, 0xc00002c080?}, {0xc0000220b0?, 0xc000119f70?, 0x406719?})
	/src/litefs/cmd/litefs/main.go:47 +0x78
main.main()
	/src/litefs/cmd/litefs/main.go:30 +0x71

goroutine 23 [syscall, 1 minutes]:
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x2f
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 38 [select]:
github.com/superfly/litefs.(*Store).monitorLeaseAsPrimary(0xc000130500, {0xb6e9f0, 0xc000089b00}, {0xb6ed00, 0xc0003424b0})
	/src/litefs/store.go:526 +0x1f0
github.com/superfly/litefs.(*Store).monitorLease(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:451 +0x1fd
github.com/superfly/litefs.(*Store).Open.func1()
	/src/litefs/store.go:134 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 39 [select, 1 minutes]:
github.com/superfly/litefs.(*Store).monitorRetention(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:613 +0xe7
github.com/superfly/litefs.(*Store).Open.func2()
	/src/litefs/store.go:138 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 36 [IO wait]:
internal/poll.runtime_pollWait(0x7fd5be2ae000, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016ac00?, 0xc0000bb300?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00016ac00, {0xc0000bb300, 0x1300, 0x1300})
	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00016ac00, {0xc0000bb300?, 0x46bb2e?, 0x40dfa5?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000127b8, {0xc0000bb300?, 0xc0001c37e0?, 0x4339f6?})
	/usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0002dbe48, {0xc0000bb300?, 0x0?, 0xc0001c3860?})
	/usr/local/go/src/crypto/tls/conn.go:787 +0x3d
bytes.(*Buffer).ReadFrom(0xc00014de78, {0xb696c0, 0xc0002dbe48})
	/usr/local/go/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc00014dc00, {0xb6a0a0?, 0xc0000127b8}, 0xf4c540?)
	/usr/local/go/src/crypto/tls/conn.go:809 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc00014dc00, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:616 +0x116
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc00014dc00, {0xc00019c000, 0x1000, 0x771860?})
	/usr/local/go/src/crypto/tls/conn.go:1287 +0x16f
bufio.(*Reader).Read(0xc0000d86c0, {0xc00027f700, 0x9, 0x78e765?})
	/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0xb69540, 0xc0000d86c0}, {0xc00027f700, 0x9, 0x9}, 0x9)
	/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
	/usr/local/go/src/io/io.go:351
net/http.http2readFrameHeader({0xc00027f700?, 0x9?, 0xc000342900?}, {0xb69540?, 0xc0000d86c0?})
	/usr/local/go/src/net/http/h2_bundle.go:1565 +0x6e
net/http.(*http2Framer).ReadFrame(0xc00027f6c0)
	/usr/local/go/src/net/http/h2_bundle.go:1829 +0x95
net/http.(*http2clientConnReadLoop).run(0xc0001c3f98)
	/usr/local/go/src/net/http/h2_bundle.go:8875 +0x130
net/http.(*http2ClientConn).readLoop(0xc0000aa180)
	/usr/local/go/src/net/http/h2_bundle.go:8771 +0x6f
created by net/http.(*http2Transport).newClientConn
	/usr/local/go/src/net/http/h2_bundle.go:7478 +0xaaa

goroutine 48 [select, 1 minutes]:
os/exec.(*Cmd).watchCtx.func1()
	/usr/local/go/src/os/exec/exec.go:648 +0x96
created by os/exec.(*Cmd).watchCtx
	/usr/local/go/src/os/exec/exec.go:647 +0x8e

goroutine 44 [syscall]:
syscall.Syscall(0x4780ee?, 0xf08980?, 0x40dc07?, 0x9f8a60?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.read(0xf08980?, {0xc000344000?, 0x8?, 0x0?})
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:696 +0x45
syscall.Read(...)
	/usr/local/go/src/syscall/syscall_unix.go:183
bazil.org/fuse.(*Conn).ReadRequest(0xc0000d8de0)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:580 +0xc6
bazil.org/fuse/fs.(*Server).Serve(0xc00027f7a0, {0xb699a0?, 0xc0000d8ae0})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:501 +0x3ef
github.com/superfly/litefs/fuse.(*FileSystem).Mount.func2()
	/src/litefs/fuse/file_system.go:93 +0x28
created by github.com/superfly/litefs/fuse.(*FileSystem).Mount
	/src/litefs/fuse/file_system.go:92 +0x31c

goroutine 45 [IO wait]:
internal/poll.runtime_pollWait(0x7fd5be2ae0f0, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016a600?, 0x6?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00016a600)
	/usr/local/go/src/internal/poll/fd_unix.go:614 +0x234
net.(*netFD).accept(0xc00016a600)
	/usr/local/go/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc0000115f0)
	/usr/local/go/src/net/tcpsock_posix.go:142 +0x28
net.(*TCPListener).Accept(0xc0000115f0)
	/usr/local/go/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0000fa1e0, {0xb6e0f0, 0xc0000115f0})
	/usr/local/go/src/net/http/server.go:3070 +0x385
github.com/superfly/litefs/http.(*Server).Serve.func1()
	/src/litefs/http/server.go:74 +0x2d
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 49 [syscall, 1 minutes]:
syscall.Syscall6(0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/syscall/syscall_linux.go:90 +0x36
os.(*Process).blockUntilWaitable(0xc000262180)
	/usr/local/go/src/os/wait_waitid.go:32 +0x87
os.(*Process).wait(0xc000262180)
	/usr/local/go/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/usr/local/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0xc0001914a0)
	/usr/local/go/src/os/exec/exec.go:599 +0x4b
main.(*MountCommand).execCmd.func1()
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x25
created by main.(*MountCommand).execCmd
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x311

goroutine 31009 [runnable]:
net/http.(*connReader).startBackgroundRead.func2()
	/usr/local/go/src/net/http/server.go:674
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1
created by net/http.(*connReader).startBackgroundRead
	/usr/local/go/src/net/http/server.go:674 +0xca

I’d really rather figure out what’s wrong here than do another data dump process. :frowning: But I suppose if nobody has any other ideas, maybe removing LiteFS from the mix is the next thing to try.

EDIT: Here’s the logs when it fell over again:

2022-11-29T13:42:12Z app[e9580cbd] den [info] GET /blog - - - - ms
2022-11-29T13:42:13Z app[e9580cbd] den [info] POST /blog/fix-the-not-wrapped-in-act-warning?_data=routes%2Fblog.%24slug - - - - ms
2022-11-29T13:42:13Z app[e9580cbd] den [info] mark-as-read: getting current read count for fix-the-not-wrapped-in-act-warning
2022-11-29T13:42:13Z app[e9580cbd] den [info] mark-as-read: adding read for fix-the-not-wrapped-in-act-warning by 85ec8cd2-a64b-4f3d-873f-c4aea0a2e190
2022-11-29T13:42:13Z app[e9580cbd] den [info] Updated the cache value for blog:aha-programming:rankings. Getting a fresh value for this took 515ms. Caching for 604800000ms + 86400000ms stale in SQLite cache.
2022-11-29T13:42:13Z app[e9580cbd] den [info] Updated the cache value for blog:rankings. Getting a fresh value for this took 566ms. Caching for 3600000ms + 86400000ms stale in SQLite cache.
2022-11-29T13:42:13Z app[e9580cbd] den [info] mark-as-read: ranking update finished
2022-11-29T13:42:13Z app[e9580cbd] den [info] mark-as-read: successfully finished mark-as-read for aha-programming. Responding.
2022-11-29T13:49:14Z runner[e9580cbd] den [info] Shutting down virtual machine
2022-11-29T13:49:14Z app[e9580cbd] den [info] Sending signal SIGINT to main child process w/ PID 529
2022-11-29T13:49:14Z app[e9580cbd] den [info] sending signal to exec process
2022-11-29T13:49:14Z app[e9580cbd] den [info] waiting for exec process to close
2022-11-29T13:49:14Z app[e9580cbd] den [info] signal received, litefs shutting down
2022-11-29T13:49:14Z app[e9580cbd] den [info] exiting primary, destroying lease
2022-11-29T13:49:14Z app[e9580cbd] den [info] exit status 1: fusermount: failed to unmount /litefs/data-2: Device or resource busy
2022-11-29T13:49:27Z runner[e9580cbd] den [info] Starting instance
2022-11-29T13:49:27Z runner[e9580cbd] den [info] Configuring virtual machine
2022-11-29T13:49:27Z runner[e9580cbd] den [info] Pulling container image
2022-11-29T13:49:28Z runner[e9580cbd] den [info] Unpacking image
2022-11-29T13:49:28Z runner[e9580cbd] den [info] Preparing kernel init
2022-11-29T13:49:28Z runner[e9580cbd] den [info] Setting up volume 'data'
2022-11-29T13:49:28Z runner[e9580cbd] den [info] Opening encrypted volume
2022-11-29T13:49:28Z runner[e9580cbd] den [info] Configuring firecracker
2022-11-29T13:49:28Z runner[e9580cbd] den [info] Starting virtual machine
2022-11-29T13:49:28Z app[e9580cbd] den [info] Starting init (commit: 81d5330)...
2022-11-29T13:49:28Z app[e9580cbd] den [info] Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2022-11-29T13:49:28Z app[e9580cbd] den [info] Preparing to run: `docker-entrypoint.sh litefs mount -- node ./other/start.js` as root
2022-11-29T13:49:28Z app[e9580cbd] den [info] 2022/11/29 13:49:28 listening on [fdaa:0:23df:a7b:d827:2:6f13:2]:22 (DNS: [fdaa::3]:53)
2022-11-29T13:49:28Z app[e9580cbd] den [info] config file read from /etc/litefs.yml
2022-11-29T13:49:28Z app[e9580cbd] den [info] LiteFS v0.3.0-beta5, commit=9bb4a2d406f69bdb513c4fd91a5986d62dc08612
2022-11-29T13:49:28Z app[e9580cbd] den [info] Using Consul to determine primary
2022-11-29T13:49:31Z app[e9580cbd] den [info] initializing consul: key= url=https://:b832a863-60c2-48d4-8289-bdc5d80fc444@consul-iad.fly-shared.net/kcd-g3zmqx5x3y49dlp4/ hostname=e9580cbd advertise-url=http://e9580cbd.vm.kcd.internal:20202
2022-11-29T13:49:31Z app[e9580cbd] den [info] LiteFS mounted to: /litefs/data-2
2022-11-29T13:49:31Z app[e9580cbd] den [info] http server listening on: http://localhost:20202
2022-11-29T13:49:31Z app[e9580cbd] den [info] waiting to connect to cluster
2022-11-29T13:49:31Z app[e9580cbd] den [info] primary lease acquired, advertising as http://e9580cbd.vm.kcd.internal:20202
2022-11-29T13:49:31Z app[e9580cbd] den [info] connected to cluster, ready
2022-11-29T13:49:31Z app[e9580cbd] den [info] starting subprocess: node [./other/start.js]
2022-11-29T13:49:31Z app[e9580cbd] den [info] waiting for signal or subprocess to exit
2022-11-29T13:49:32Z app[e9580cbd] den [info] No .primary file found.
2022-11-29T13:49:32Z app[e9580cbd] den [info] Using current instance (e9580cbd) as primary (in den)
2022-11-29T13:49:32Z app[e9580cbd] den [info] Instance (e9580cbd) in den is primary. Deploying migrations.
2022-11-29T13:49:33Z app[e9580cbd] den [info] Prisma schema loaded from prisma/schema.prisma
2022-11-29T13:49:33Z app[e9580cbd] den [info] Datasource "db": SQLite database "sqlite.db" at "file:/litefs/data-2/sqlite.db"
2022-11-29T13:49:33Z app[e9580cbd] den [info] 2 migrations found in prisma/migrations
2022-11-29T13:49:33Z app[e9580cbd] den [info] No pending migrations to apply.
2022-11-29T13:49:33Z app[e9580cbd] den [info] npm notice
2022-11-29T13:49:33Z app[e9580cbd] den [info] npm notice New major version of npm available! 8.19.2 -> 9.1.2
2022-11-29T13:49:33Z app[e9580cbd] den [info] npm notice Changelog: <https://github.com/npm/cli/releases/tag/v9.1.2>
2022-11-29T13:49:33Z app[e9580cbd] den [info] npm notice Run `npm install -g npm@9.1.2` to update!
2022-11-29T13:49:33Z app[e9580cbd] den [info] npm notice
2022-11-29T13:49:33Z app[e9580cbd] den [info] Starting app...
2022-11-29T13:49:34Z app[e9580cbd] den [info] > kentcdodds.com@1.0.0 start
2022-11-29T13:49:34Z app[e9580cbd] den [info] > cross-env NODE_ENV=production node --require ./node_modules/dotenv/config ./index.js
2022-11-29T13:49:36Z app[e9580cbd] den [info] Express server listening on port 8080
2022-11-29T13:49:37Z app[e9580cbd] den [info] prisma:info Starting a sqlite pool with 3 connections.
2022-11-29T13:49:40Z app[e9580cbd] den [info] GET /blog/the-testing-trophy-and-testing-classifications - - - - ms
2022-11-29T13:49:40Z app[e9580cbd] den [info] GET /uses - - - - ms
2022-11-29T13:49:40Z app[e9580cbd] den [info] GET /blog/aha-testing - - - - ms
2022-11-29T13:49:41Z app[e9580cbd] den [info] POST /blog/avoid-the-test-user?_data=routes%2Fblog.%24slug - - - - ms

Removing LiteFS might be the easier option, and at the very least it will narrow down where the issue is. If it’s LiteFS-related, we could try continuous profiling via profefe (although I haven’t personally used it before). Or, since it happens fairly regularly, you could sample the heap profile from your home computer/laptop every 30 seconds:

while true
do
  curl -s http://$HOSTNAME:20202/debug/pprof/heap > heap.`date -Iseconds`
  sleep 30
done

Are you still seeing memory spikes?

I was just about to look at this and then I noticed a CPU spike again.

I took a heap profile. I don’t know what to do with it, but I’ve got it.

I’m not seeing memory spikes (though the app does sit at ~1GB of memory, which is a surprise, as my old site sat at around 700MB).

I’m going to go ahead and try removing LiteFS for now. I don’t know whether to be hopeful that fixes it or not :sweat_smile:

Oh, I also grabbed a couple of stack traces while the CPU spike was happening:

goroutine 361738 [running]:
runtime/pprof.writeGoroutineStacks({0xb6a360, 0xc00027e000})
	/usr/local/go/src/runtime/pprof/pprof.go:692 +0x70
runtime/pprof.writeGoroutine({0xb6a360?, 0xc00027e000?}, 0x0?)
	/usr/local/go/src/runtime/pprof/pprof.go:681 +0x2b
runtime/pprof.(*Profile).WriteTo(0xa2dda0?, {0xb6a360?, 0xc00027e000?}, 0xc?)
	/usr/local/go/src/runtime/pprof/pprof.go:330 +0x14b
net/http/pprof.handler.ServeHTTP({0xc0002621f1, 0x9}, {0xb6e300, 0xc00027e000}, 0xc0002fc197?)
	/usr/local/go/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0xb6e300?, 0xc00027e000}, 0xc000130200)
	/usr/local/go/src/net/http/pprof/pprof.go:371 +0x13e
github.com/superfly/litefs/http.(*Server).serveHTTP(0xc0001652c0?, {0xb6e300, 0xc00027e000}, 0xc000130200)
	/src/litefs/http/server.go:128 +0x186
net/http.HandlerFunc.ServeHTTP(0x40?, {0xb6e300?, 0xc00027e000?}, 0x0?)
	/usr/local/go/src/net/http/server.go:2109 +0x2f
golang.org/x/net/http2/h2c.h2cHandler.ServeHTTP({{0xb6aec0?, 0xc000150a80?}, 0xc000089bc0?}, {0xb6e300, 0xc00027e000}, 0xc000130200)
	/go/pkg/mod/golang.org/x/net@v0.0.0-20220909164309-bea034e7d591/http2/h2c/h2c.go:113 +0x49f
net/http.serverHandler.ServeHTTP({0xc0001a7110?}, {0xb6e300, 0xc00027e000}, 0xc000130200)
	/usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0001643c0, {0xb6ea98, 0xc000342030})
	/usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3102 +0x4db

goroutine 1 [select, 181 minutes]:
main.runMount({0xb6ea28, 0xc00002c080}, {0xc0000220c0, 0x3, 0x3})
	/src/litefs/cmd/litefs/main.go:103 +0x558
main.run({0xb6ea28?, 0xc00002c080?}, {0xc0000220b0?, 0xc000119f70?, 0x406719?})
	/src/litefs/cmd/litefs/main.go:47 +0x78
main.main()
	/src/litefs/cmd/litefs/main.go:30 +0x71

goroutine 23 [syscall, 181 minutes]:
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x2f
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 38 [select]:
github.com/superfly/litefs.(*Store).monitorLeaseAsPrimary(0xc000130500, {0xb6e9f0, 0xc000089b00}, {0xb6ed00, 0xc0003424b0})
	/src/litefs/store.go:526 +0x1f0
github.com/superfly/litefs.(*Store).monitorLease(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:451 +0x1fd
github.com/superfly/litefs.(*Store).Open.func1()
	/src/litefs/store.go:134 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 39 [select]:
github.com/superfly/litefs.(*Store).monitorRetention(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:613 +0xe7
github.com/superfly/litefs.(*Store).Open.func2()
	/src/litefs/store.go:138 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 36 [IO wait]:
internal/poll.runtime_pollWait(0x7fd5be2ae000, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016ac00?, 0xc0000bb300?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00016ac00, {0xc0000bb300, 0x1300, 0x1300})
	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00016ac00, {0xc0000bb300?, 0xc00006b560?, 0xc0000bb305?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000127b8, {0xc0000bb300?, 0xc0001c3810?, 0x417ab3?})
	/usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc000200318, {0xc0000bb300?, 0x0?, 0xc0001c3860?})
	/usr/local/go/src/crypto/tls/conn.go:787 +0x3d
bytes.(*Buffer).ReadFrom(0xc00014de78, {0xb696c0, 0xc000200318})
	/usr/local/go/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc00014dc00, {0xb6a0a0?, 0xc0000127b8}, 0x203000?)
	/usr/local/go/src/crypto/tls/conn.go:809 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc00014dc00, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:616 +0x116
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc00014dc00, {0xc00019c000, 0x1000, 0x771860?})
	/usr/local/go/src/crypto/tls/conn.go:1287 +0x16f
bufio.(*Reader).Read(0xc0000d86c0, {0xc00027f700, 0x9, 0x78e765?})
	/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0xb69540, 0xc0000d86c0}, {0xc00027f700, 0x9, 0x9}, 0x9)
	/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
	/usr/local/go/src/io/io.go:351
net/http.http2readFrameHeader({0xc00027f700?, 0x9?, 0xc0001a6de0?}, {0xb69540?, 0xc0000d86c0?})
	/usr/local/go/src/net/http/h2_bundle.go:1565 +0x6e
net/http.(*http2Framer).ReadFrame(0xc00027f6c0)
	/usr/local/go/src/net/http/h2_bundle.go:1829 +0x95
net/http.(*http2clientConnReadLoop).run(0xc0001c3f98)
	/usr/local/go/src/net/http/h2_bundle.go:8875 +0x130
net/http.(*http2ClientConn).readLoop(0xc0000aa180)
	/usr/local/go/src/net/http/h2_bundle.go:8771 +0x6f
created by net/http.(*http2Transport).newClientConn
	/usr/local/go/src/net/http/h2_bundle.go:7478 +0xaaa

goroutine 48 [select, 181 minutes]:
os/exec.(*Cmd).watchCtx.func1()
	/usr/local/go/src/os/exec/exec.go:648 +0x96
created by os/exec.(*Cmd).watchCtx
	/usr/local/go/src/os/exec/exec.go:647 +0x8e

goroutine 44 [syscall, 2 minutes]:
syscall.Syscall(0x4780ee?, 0xf08980?, 0x40dc07?, 0x9f8a60?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.read(0xf08980?, {0xc000280000?, 0x11?, 0x0?})
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:696 +0x45
syscall.Read(...)
	/usr/local/go/src/syscall/syscall_unix.go:183
bazil.org/fuse.(*Conn).ReadRequest(0xc0000d8de0)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:580 +0xc6
bazil.org/fuse/fs.(*Server).Serve(0xc00027f7a0, {0xb699a0?, 0xc0000d8ae0})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:501 +0x3ef
github.com/superfly/litefs/fuse.(*FileSystem).Mount.func2()
	/src/litefs/fuse/file_system.go:93 +0x28
created by github.com/superfly/litefs/fuse.(*FileSystem).Mount
	/src/litefs/fuse/file_system.go:92 +0x31c

goroutine 45 [IO wait]:
internal/poll.runtime_pollWait(0x7fd5be2ae0f0, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016a600?, 0x6?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00016a600)
	/usr/local/go/src/internal/poll/fd_unix.go:614 +0x234
net.(*netFD).accept(0xc00016a600)
	/usr/local/go/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc0000115f0)
	/usr/local/go/src/net/tcpsock_posix.go:142 +0x28
net.(*TCPListener).Accept(0xc0000115f0)
	/usr/local/go/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0000fa1e0, {0xb6e0f0, 0xc0000115f0})
	/usr/local/go/src/net/http/server.go:3070 +0x385
github.com/superfly/litefs/http.(*Server).Serve.func1()
	/src/litefs/http/server.go:74 +0x2d
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 49 [syscall, 181 minutes]:
syscall.Syscall6(0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/syscall/syscall_linux.go:90 +0x36
os.(*Process).blockUntilWaitable(0xc000262180)
	/usr/local/go/src/os/wait_waitid.go:32 +0x87
os.(*Process).wait(0xc000262180)
	/usr/local/go/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/usr/local/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0xc0001914a0)
	/usr/local/go/src/os/exec/exec.go:599 +0x4b
main.(*MountCommand).execCmd.func1()
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x25
created by main.(*MountCommand).execCmd
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x311

goroutine 361712 [semacquire, 2 minutes]:
sync.runtime_SemacquireMutex(0xc00047bec0?, 0x80?, 0x78?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc0001bd040)
	/usr/local/go/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:90
github.com/superfly/litefs.(*DB).Pos(0x18?)
	/src/litefs/db.go:143 +0x51
github.com/superfly/litefs/fuse.(*PosNode).Read(0xc00027f7a0?, {0xa39ce0?, 0xc000332600?}, 0xc000128310, 0xc00025ce58)
	/src/litefs/fuse/pos_node.go:50 +0x2e
bazil.org/fuse/fs.(*Server).handleRequest(0xc00027f7a0, {0xb6e9f0, 0xc000249000}, {0xb699e0?, 0xc000332600}, 0xc000089440, {0xb6d7c0?, 0xc000128310?}, 0xc00019af08)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1418 +0x1f5d
bazil.org/fuse/fs.(*Server).serve(0xc00027f7a0, {0xb6d7c0, 0xc000128310})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1015 +0x625
bazil.org/fuse/fs.(*Server).Serve.func1()
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:512 +0x69
created by bazil.org/fuse/fs.(*Server).Serve
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:510 +0x3df

goroutine 361709 [syscall, 2 minutes]:
syscall.Syscall(0x46bb2e?, 0x41b505?, 0xc0001cd348?, 0x0?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.write(0x40d8bf?, {0xc000262240?, 0x0?, 0x8?})
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:924 +0x45
syscall.Write(...)
	/usr/local/go/src/syscall/syscall_unix.go:211
bazil.org/fuse.(*Conn).writeToKernel(0xc0001cd418?, {0xc000262240, 0x28, 0x28})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1165 +0xea
bazil.org/fuse.(*Conn).sendNotify(...)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1208
bazil.org/fuse.(*Conn).InvalidateNode(0x9cfd00?, 0x3, 0x0, 0xffffffffffffffff)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1233 +0x131
bazil.org/fuse/fs.(*Server).invalidateNode(0xc00027f7a0, {0xb699e0?, 0xc000332600?}, 0x0, 0xffffffffffffffff)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1729 +0x231
bazil.org/fuse/fs.(*Server).InvalidateNodeData(...)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1757
github.com/superfly/litefs/fuse.(*FileSystem).InvalidatePos(0xc0000d8ae0, 0xc000012320?)
	/src/litefs/fuse/file_system.go:198 +0x7c
github.com/superfly/litefs.(*DB).setPos(0xc0001bd040, {0xa85512?, 0xc000030980?})
	/src/litefs/db.go:154 +0x48
github.com/superfly/litefs.(*DB).CommitJournal(0xc0001bd040, {0xa85512, 0x6})
	/src/litefs/db.go:1027 +0x16c8
github.com/superfly/litefs/fuse.(*RootNode).Remove(0xc0002da3f0, {0xb69a20?, 0xc0002da3f0?}, 0xc0002d96c8?)
	/src/litefs/fuse/root_node.go:268 +0x193
bazil.org/fuse/fs.(*Server).handleRequest(0xc00027f7a0, {0xb6e9f0, 0xc000248e80}, {0xb69a20?, 0xc0002da3f0}, 0xc000088f40, {0xb6d850?, 0xc0001529b0?}, 0xc0001cdf08)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1161 +0xeab
bazil.org/fuse/fs.(*Server).serve(0xc00027f7a0, {0xb6d850, 0xc0001529b0})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1015 +0x625
bazil.org/fuse/fs.(*Server).Serve.func1()
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:512 +0x69
created by bazil.org/fuse/fs.(*Server).Serve
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:510 +0x3df

goroutine 361739 [runnable]:
net/http.(*connReader).startBackgroundRead.func2()
	/usr/local/go/src/net/http/server.go:674
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1
created by net/http.(*connReader).startBackgroundRead
	/usr/local/go/src/net/http/server.go:674 +0xca

goroutine 361738 [running]:
runtime/pprof.writeGoroutineStacks({0xb6a360, 0xc00027e000})
	/usr/local/go/src/runtime/pprof/pprof.go:692 +0x70
runtime/pprof.writeGoroutine({0xb6a360?, 0xc00027e000?}, 0x0?)
	/usr/local/go/src/runtime/pprof/pprof.go:681 +0x2b
runtime/pprof.(*Profile).WriteTo(0xa2dda0?, {0xb6a360?, 0xc00027e000?}, 0xc?)
	/usr/local/go/src/runtime/pprof/pprof.go:330 +0x14b
net/http/pprof.handler.ServeHTTP({0xc000262221, 0x9}, {0xb6e300, 0xc00027e000}, 0xc0002fc197?)
	/usr/local/go/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0xb6e300?, 0xc00027e000}, 0xc000130100)
	/usr/local/go/src/net/http/pprof/pprof.go:371 +0x13e
github.com/superfly/litefs/http.(*Server).serveHTTP(0xc0001652c0?, {0xb6e300, 0xc00027e000}, 0xc000130100)
	/src/litefs/http/server.go:128 +0x186
net/http.HandlerFunc.ServeHTTP(0xc0001199b8?, {0xb6e300?, 0xc00027e000?}, 0xc000119990?)
	/usr/local/go/src/net/http/server.go:2109 +0x2f
golang.org/x/net/http2/h2c.h2cHandler.ServeHTTP({{0xb6aec0?, 0xc000150a80?}, 0xc000089bc0?}, {0xb6e300, 0xc00027e000}, 0xc000130100)
	/go/pkg/mod/golang.org/x/net@v0.0.0-20220909164309-bea034e7d591/http2/h2c/h2c.go:113 +0x49f
net/http.serverHandler.ServeHTTP({0xc0001a7110?}, {0xb6e300, 0xc00027e000}, 0xc000130100)
	/usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0001643c0, {0xb6ea98, 0xc000342030})
	/usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3102 +0x4db

goroutine 1 [select, 182 minutes]:
main.runMount({0xb6ea28, 0xc00002c080}, {0xc0000220c0, 0x3, 0x3})
	/src/litefs/cmd/litefs/main.go:103 +0x558
main.run({0xb6ea28?, 0xc00002c080?}, {0xc0000220b0?, 0xc000119f70?, 0x406719?})
	/src/litefs/cmd/litefs/main.go:47 +0x78
main.main()
	/src/litefs/cmd/litefs/main.go:30 +0x71

goroutine 23 [syscall, 182 minutes]:
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x2f
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 38 [select]:
github.com/superfly/litefs.(*Store).monitorLeaseAsPrimary(0xc000130500, {0xb6e9f0, 0xc000089b00}, {0xb6ed00, 0xc0003424b0})
	/src/litefs/store.go:526 +0x1f0
github.com/superfly/litefs.(*Store).monitorLease(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:451 +0x1fd
github.com/superfly/litefs.(*Store).Open.func1()
	/src/litefs/store.go:134 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 39 [select, 1 minutes]:
github.com/superfly/litefs.(*Store).monitorRetention(0xc000130500, {0xb6e9f0, 0xc000089b00})
	/src/litefs/store.go:613 +0xe7
github.com/superfly/litefs.(*Store).Open.func2()
	/src/litefs/store.go:138 +0x25
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 36 [IO wait]:
internal/poll.runtime_pollWait(0x7fd5be2ae000, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016ac00?, 0xc0000bb300?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00016ac00, {0xc0000bb300, 0x1300, 0x1300})
	/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00016ac00, {0xc0000bb300?, 0xc00006b560?, 0xc0000bb305?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000127b8, {0xc0000bb300?, 0xc0001c3810?, 0x417ab3?})
	/usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc000200798, {0xc0000bb300?, 0x0?, 0xf0?})
	/usr/local/go/src/crypto/tls/conn.go:787 +0x3d
bytes.(*Buffer).ReadFrom(0xc00014de78, {0xb696c0, 0xc000200798})
	/usr/local/go/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc00014dc00, {0xb6a0a0?, 0xc0000127b8}, 0x203000?)
	/usr/local/go/src/crypto/tls/conn.go:809 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc00014dc00, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:616 +0x116
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc00014dc00, {0xc00019c000, 0x1000, 0x771860?})
	/usr/local/go/src/crypto/tls/conn.go:1287 +0x16f
bufio.(*Reader).Read(0xc0000d86c0, {0xc00027f700, 0x9, 0x78e765?})
	/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0xb69540, 0xc0000d86c0}, {0xc00027f700, 0x9, 0x9}, 0x9)
	/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
	/usr/local/go/src/io/io.go:351
net/http.http2readFrameHeader({0xc00027f700?, 0x9?, 0xc0003433e0?}, {0xb69540?, 0xc0000d86c0?})
	/usr/local/go/src/net/http/h2_bundle.go:1565 +0x6e
net/http.(*http2Framer).ReadFrame(0xc00027f6c0)
	/usr/local/go/src/net/http/h2_bundle.go:1829 +0x95
net/http.(*http2clientConnReadLoop).run(0xc0001c3f98)
	/usr/local/go/src/net/http/h2_bundle.go:8875 +0x130
net/http.(*http2ClientConn).readLoop(0xc0000aa180)
	/usr/local/go/src/net/http/h2_bundle.go:8771 +0x6f
created by net/http.(*http2Transport).newClientConn
	/usr/local/go/src/net/http/h2_bundle.go:7478 +0xaaa

goroutine 48 [select, 182 minutes]:
os/exec.(*Cmd).watchCtx.func1()
	/usr/local/go/src/os/exec/exec.go:648 +0x96
created by os/exec.(*Cmd).watchCtx
	/usr/local/go/src/os/exec/exec.go:647 +0x8e

goroutine 44 [syscall, 3 minutes]:
syscall.Syscall(0x4780ee?, 0xf08980?, 0x40dc07?, 0x9f8a60?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.read(0xf08980?, {0xc000280000?, 0x11?, 0x0?})
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:696 +0x45
syscall.Read(...)
	/usr/local/go/src/syscall/syscall_unix.go:183
bazil.org/fuse.(*Conn).ReadRequest(0xc0000d8de0)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:580 +0xc6
bazil.org/fuse/fs.(*Server).Serve(0xc00027f7a0, {0xb699a0?, 0xc0000d8ae0})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:501 +0x3ef
github.com/superfly/litefs/fuse.(*FileSystem).Mount.func2()
	/src/litefs/fuse/file_system.go:93 +0x28
created by github.com/superfly/litefs/fuse.(*FileSystem).Mount
	/src/litefs/fuse/file_system.go:92 +0x31c

goroutine 45 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7fd5be2ae0f0, 0x72)
	/usr/local/go/src/runtime/netpoll.go:305 +0x89
internal/poll.(*pollDesc).wait(0xc00016a600?, 0x6?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00016a600)
	/usr/local/go/src/internal/poll/fd_unix.go:614 +0x234
net.(*netFD).accept(0xc00016a600)
	/usr/local/go/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc0000115f0)
	/usr/local/go/src/net/tcpsock_posix.go:142 +0x28
net.(*TCPListener).Accept(0xc0000115f0)
	/usr/local/go/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0000fa1e0, {0xb6e0f0, 0xc0000115f0})
	/usr/local/go/src/net/http/server.go:3070 +0x385
github.com/superfly/litefs/http.(*Server).Serve.func1()
	/src/litefs/http/server.go:74 +0x2d
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20220601150217-0de741cfad7f/errgroup/errgroup.go:72 +0xa5

goroutine 49 [syscall, 182 minutes]:
syscall.Syscall6(0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/syscall/syscall_linux.go:90 +0x36
os.(*Process).blockUntilWaitable(0xc000262180)
	/usr/local/go/src/os/wait_waitid.go:32 +0x87
os.(*Process).wait(0xc000262180)
	/usr/local/go/src/os/exec_unix.go:22 +0x28
os.(*Process).Wait(...)
	/usr/local/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0xc0001914a0)
	/usr/local/go/src/os/exec/exec.go:599 +0x4b
main.(*MountCommand).execCmd.func1()
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x25
created by main.(*MountCommand).execCmd
	/src/litefs/cmd/litefs/mount_linux.go:332 +0x311

goroutine 361712 [semacquire, 3 minutes]:
sync.runtime_SemacquireMutex(0xc00047bec0?, 0x80?, 0x78?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc0001bd040)
	/usr/local/go/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:90
github.com/superfly/litefs.(*DB).Pos(0x18?)
	/src/litefs/db.go:143 +0x51
github.com/superfly/litefs/fuse.(*PosNode).Read(0xc00027f7a0?, {0xa39ce0?, 0xc000332600?}, 0xc000128310, 0xc00025ce58)
	/src/litefs/fuse/pos_node.go:50 +0x2e
bazil.org/fuse/fs.(*Server).handleRequest(0xc00027f7a0, {0xb6e9f0, 0xc000249000}, {0xb699e0?, 0xc000332600}, 0xc000089440, {0xb6d7c0?, 0xc000128310?}, 0xc00019af08)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1418 +0x1f5d
bazil.org/fuse/fs.(*Server).serve(0xc00027f7a0, {0xb6d7c0, 0xc000128310})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1015 +0x625
bazil.org/fuse/fs.(*Server).Serve.func1()
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:512 +0x69
created by bazil.org/fuse/fs.(*Server).Serve
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:510 +0x3df

goroutine 361709 [syscall, 3 minutes]:
syscall.Syscall(0x46bb2e?, 0x41b505?, 0xc0001cd348?, 0x0?)
	/usr/local/go/src/syscall/syscall_linux.go:68 +0x27
syscall.write(0x40d8bf?, {0xc000262240?, 0x0?, 0x8?})
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:924 +0x45
syscall.Write(...)
	/usr/local/go/src/syscall/syscall_unix.go:211
bazil.org/fuse.(*Conn).writeToKernel(0xc0001cd418?, {0xc000262240, 0x28, 0x28})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1165 +0xea
bazil.org/fuse.(*Conn).sendNotify(...)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1208
bazil.org/fuse.(*Conn).InvalidateNode(0x9cfd00?, 0x3, 0x0, 0xffffffffffffffff)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1233 +0x131
bazil.org/fuse/fs.(*Server).invalidateNode(0xc00027f7a0, {0xb699e0?, 0xc000332600?}, 0x0, 0xffffffffffffffff)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1729 +0x231
bazil.org/fuse/fs.(*Server).InvalidateNodeData(...)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1757
github.com/superfly/litefs/fuse.(*FileSystem).InvalidatePos(0xc0000d8ae0, 0xc000012320?)
	/src/litefs/fuse/file_system.go:198 +0x7c
github.com/superfly/litefs.(*DB).setPos(0xc0001bd040, {0xa85512?, 0xc000030980?})
	/src/litefs/db.go:154 +0x48
github.com/superfly/litefs.(*DB).CommitJournal(0xc0001bd040, {0xa85512, 0x6})
	/src/litefs/db.go:1027 +0x16c8
github.com/superfly/litefs/fuse.(*RootNode).Remove(0xc0002da3f0, {0xb69a20?, 0xc0002da3f0?}, 0xc0002d96c8?)
	/src/litefs/fuse/root_node.go:268 +0x193
bazil.org/fuse/fs.(*Server).handleRequest(0xc00027f7a0, {0xb6e9f0, 0xc000248e80}, {0xb69a20?, 0xc0002da3f0}, 0xc000088f40, {0xb6d850?, 0xc0001529b0?}, 0xc0001cdf08)
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1161 +0xeab
bazil.org/fuse/fs.(*Server).serve(0xc00027f7a0, {0xb6d850, 0xc0001529b0})
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:1015 +0x625
bazil.org/fuse/fs.(*Server).Serve.func1()
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:512 +0x69
created by bazil.org/fuse/fs.(*Server).Serve
	/go/pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fs/serve.go:510 +0x3df

goroutine 361762 [runnable]:
net/http.(*connReader).startBackgroundRead.func2()
	/usr/local/go/src/net/http/server.go:674
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1
created by net/http.(*connReader).startBackgroundRead
	/usr/local/go/src/net/http/server.go:674 +0xca

I don’t know how to read these, so I don’t know if they’re helpful.

I also managed to SSH into the VM while it was experiencing this spike and ran top. Here are a couple of snapshots of that output:

top - 16:51:03 up  3:01,  0 users,  load average: 0.95, 0.56, 0.31
Tasks:  53 total,   1 running,  52 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,  0.0 id, 99.7 wa,  0.0 hi,  0.0 si,  0.3 st
MiB Mem :   1982.5 total,   1095.4 free,    414.2 used,    472.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1430.5 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                            
    1 root      20   0    6496   3400   2964 S   0.0   0.2   0:03.48 init                                                                                                               
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd                                                                                                           
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                             
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                         
    5 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0-events                                                                                                 
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri                                                                                        
    7 root      20   0       0      0      0 I   0.0   0.0   0:01.70 kworker/0:1-events_power_efficient                                                                                 
    8 root      20   0       0      0      0 I   0.0   0.0   0:00.05 kworker/u2:0-flush-254:32                                                                                          
    9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                                                       
   10 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_rude_                                                                                                    
   11 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_trace                                                                                                    
   12 root      20   0       0      0      0 S   0.0   0.0   0:00.73 ksoftirqd/0                                                                                                        
   13 root      20   0       0      0      0 I   0.0   0.0   0:02.17 rcu_sched                                                                                                          
   14 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0                                                                                                        
   15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0                                                                                                            
   17 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kdevtmpfs                                                                                                          
   18 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns                                                                                                              
   19 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 inet_frag_wq                                                                                                       
   20 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kauditd                                                                                                            
   24 root      20   0       0      0      0 S   0.0   0.0   0:00.00 oom_reaper                                                                                                         
   25 root      20   0       0      0      0 I   0.0   0.0   0:00.05 kworker/u2:2-events_unbound                                                                                        
  211 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 writeback                                                                                                          
  212 root      20   0       0      0      0 S   0.0   0.0   0:00.32 kcompactd0                                                                                                         
  215 root      25   5       0      0      0 S   0.0   0.0   0:00.00 ksmd                                                                                                               
  216 root      39  19       0      0      0 S   0.0   0.0   0:00.02 khugepaged                  
top - 16:51:06 up  3:01,  0 users,  load average: 0.95, 0.56, 0.31
Tasks:  53 total,   1 running,  52 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy,  0.0 ni,  0.0 id, 99.3 wa,  0.0 hi,  0.0 si,  0.3 st
MiB Mem :   1982.5 total,   1095.4 free,    414.2 used,    472.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1430.5 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                            
    1 root      20   0    6496   3400   2964 S   0.3   0.2   0:03.49 init                                                                                                               
  530 root      20   0  705916   4956   3500 S   0.3   0.2   0:07.97 hallpass                                                                                                           
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd                                                                                                           
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                             
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                         
    5 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0-events                                                                                                 
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri                                                                                        
    7 root      20   0       0      0      0 I   0.0   0.0   0:01.70 kworker/0:1-events_power_efficient                                                                                 
    8 root      20   0       0      0      0 I   0.0   0.0   0:00.05 kworker/u2:0-events_unbound                                                                                        
    9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                                                       
   10 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_rude_                                                                                                    
   11 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_trace                                                                                                    
   12 root      20   0       0      0      0 S   0.0   0.0   0:00.73 ksoftirqd/0                                                                                                        
   13 root      20   0       0      0      0 I   0.0   0.0   0:02.17 rcu_sched                                                                                                          
   14 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0                                                                                                        
   15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0                                                                                                            
   17 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kdevtmpfs                                                                                                          
   18 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns                                                                                                              
   19 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 inet_frag_wq                                                                                                       
   20 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kauditd                                                                                                            
   24 root      20   0       0      0      0 S   0.0   0.0   0:00.00 oom_reaper                                                                                                         
   25 root      20   0       0      0      0 I   0.0   0.0   0:00.05 kworker/u2:2-events_unbound                                                                                        
  211 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 writeback                                                                                                          
  212 root      20   0       0      0      0 S   0.0   0.0   0:00.32 kcompactd0                                                                                                         
  215 root      25   5       0      0      0 S   0.0   0.0   0:00.00 ksmd                                                                                                               
  216 root      39  19       0      0      0 S   0.0   0.0   0:00.02 khugepaged                                                                                                         
  267 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kintegrityd                  
top - 16:53:20 up  3:03,  0 users,  load average: 1.00, 0.72, 0.41
Tasks:  55 total,   2 running,  53 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1982.5 total,   1092.5 free,    417.0 used,    473.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1427.6 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                            
  713 root      20   0    7056   3152   2664 R   0.3   0.2   0:00.02 top                                                                                                                
    1 root      20   0    6496   3400   2964 S   0.0   0.2   0:03.52 init                                                                                                               
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd                                                                                                           
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                             
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                         
    5 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0-events                                                                                                 
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri                                                                                        
    7 root      20   0       0      0      0 I   0.0   0.0   0:01.71 kworker/0:1-events_power_efficient                                                                                 
    8 root      20   0       0      0      0 I   0.0   0.0   0:00.06 kworker/u2:0-events_unbound                                                                                        
    9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                                                       
   10 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_rude_                                                                                                    
   11 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_trace                                                                                                    
   12 root      20   0       0      0      0 S   0.0   0.0   0:00.73 ksoftirqd/0                                                                                                        
   13 root      20   0       0      0      0 I   0.0   0.0   0:02.18 rcu_sched                                                                                                          
   14 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0                                                                                                        
   15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0                                                                                                            
   17 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kdevtmpfs             

I’m uncertain how to interpret that as well. It looks like none of the processes are using much memory or CPU, but the init command seems to have high VIRT, RES, and SHR values, whatever those mean :man_shrugging:

I’m going to remove LiteFS by pointing my database connection directly at the /data/dbs/sqlite.db/database file that LiteFS has been managing. This should mean minimal downtime for now. When I want to bring LiteFS back, I’ll have to do the migration stuff manually, but at least I don’t have to worry about SQLite dumps and a bunch of downtime.

Yeah, they’re pretty hard to read in general. On the plus side, I think they give me what I need. From the stack traces, it looks like there’s a deadlock between reading the -pos file and the journal commit notifying the FUSE file system that the position has changed.
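
(For anyone reading along, here’s a simplified Go sketch of that shape; it is not the actual LiteFS code, just an illustration of the lock ordering visible in the traces above: the commit path holds the database mutex while it synchronously notifies the FUSE layer, and the FUSE read handler for -pos needs that same mutex to answer the kernel.)

package main

import "sync"

type db struct {
	mu  sync.Mutex
	pos int64
}

// Pos mirrors the blocked goroutine in the trace: it takes the database
// mutex before returning the current position to a FUSE read request.
func (d *db) Pos() int64 {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.pos
}

// CommitJournal mirrors the other goroutine: it takes the same mutex, then
// calls into the FUSE layer while still holding it. If that call blocks
// until the kernel finishes the in-flight Pos read, neither goroutine can
// make progress.
func (d *db) CommitJournal(invalidate func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.pos++
	invalidate() // notification to the kernel while holding d.mu
}

func main() {
	d := &db{}
	// In the real scenario these run on separate goroutines, with the
	// invalidation blocked behind the pending Pos read; they are shown
	// here only to make the shared lock usage visible.
	go func() { _ = d.Pos() }()
	d.CommitJournal(func() {})
}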

I’ll get that fixed up. Thanks for your patience!

@kentcdodds I merged in a PR to fix the issue. There was an old lock on the litefs.DB that was redundant since SQLite uses its own set of locks. The position itself is now wrapped in a sync/atomic primitive so it can be accessed safely outside the SQLite locks too. I tagged you on the PR but I’m posting here for posterity.
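
(Here’s a minimal Go sketch of what that kind of change can look like; it is not the actual LiteFS diff, and whether the real fix uses atomic.Value, atomic.Pointer, or another sync/atomic primitive is an assumption on my part. The point is that readers such as the FUSE -pos handler can load the position without taking the database mutex.)

package main

import (
	"fmt"
	"sync/atomic"
)

// Pos is a stand-in for the replication position LiteFS tracks per database.
type Pos struct {
	TXID uint64
}

// DB keeps the current position in an atomic.Value so it can be read and
// updated without holding any other database lock.
type DB struct {
	pos atomic.Value // always holds a Pos
}

// SetPos stores a new position after a commit; no mutex is required.
func (db *DB) SetPos(p Pos) {
	db.pos.Store(p)
}

// Pos returns the last stored position, or the zero value if none is set.
func (db *DB) Pos() Pos {
	if v := db.pos.Load(); v != nil {
		return v.(Pos)
	}
	return Pos{}
}

func main() {
	var db DB
	db.SetPos(Pos{TXID: 42})
	fmt.Println(db.Pos().TXID) // prints 42
}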

Let me know if you continue having issues. Thanks again! :pray:

Sweet. I just upgraded and migrated my data. I think we’re in a good place. I’ll give it a day or two before I say we’re solid. At that point, I’ll try multi-region again :slight_smile:
