Election loop and RPC errors in Postgres HA

I’m seeing election loop and keeper errors in my HA, multi-region Postgres setup:

2022-12-01T13:23:07.657 app[8a195f07] lhr [info] sentinel | 2022-12-01T13:23:07.656Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-12-01T13:23:07.718 app[23ec7e6d] lhr [info] keeper | 2022-12-01T13:23:07.716Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-12-01T13:23:07.724 app[4708975a] lhr [info] sentinel | 2022-12-01T13:23:07.723Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-12-01T13:23:07.726 app[23ec7e6d] lhr [info] sentinel | 2022-12-01T13:23:07.726Z ERROR cmd/sentinel.go:1947 error saving clusterdata {"error": "Unexpected response code: 500"}

2022-12-01T13:23:07.765 app[8a195f07] lhr [info] keeper | 2022-12-01T13:23:07.765Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-12-01T13:23:07.909 app[a3c1b79a] ams [info] keeper | 2022-12-01T13:23:07.908Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

and

2022-12-01T13:39:15.432 app[a3c1b79a] ams [info] sentinel | 2022-12-01T13:39:15.432Z ERROR cmd/sentinel.go:102 election loop error {"error": "Unexpected response code: 500 (No cluster leader)"}
2022-12-01T13:39:15.981 app[8a195f07] lhr [info] sentinel | 2022-12-01T13:39:15.981Z ERROR cmd/sentinel.go:102 election loop error {"error": "failed to read lock: Unexpected response code: 500"}
2022-12-01T13:39:16.488 app[4708975a] lhr [info] keeper | 2022-12-01T13:39:16.487Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-01T13:39:16.691 app[23ec7e6d] lhr [info] sentinel | 2022-12-01T13:39:16.690Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-01T13:43:39.096 app[4708975a] lhr [info] sentinel | 2022-12-01T13:43:39.096Z WARN cmd/sentinel.go:276 no keeper info available {"db": "d71cefa2", "keeper": "23c5227db2"}
2022-12-01T13:48:21.055 app[4708975a] lhr [info] sentinel | 2022-12-01T13:48:21.054Z WARN cmd/sentinel.go:276 no keeper info available {"db": "03df4ba3", "keeper": "be6522f372"}
2022-12-01T13:48:21.055 app[4708975a] lhr [info] sentinel | 2022-12-01T13:48:21.054Z WARN cmd/sentinel.go:276 no keeper info available {"db": "54ecf943", "keeper": "23c222f532"}
2022-12-01T13:48:21.055 app[4708975a] lhr [info] sentinel | 2022-12-01T13:48:21.054Z WARN cmd/sentinel.go:276 no keeper info available {"db": "d71cefa2", "keeper": "23c5227db2"}
2022-12-01T14:00:38.255 app[4708975a] lhr [info] sentinel | 2022-12-01T14:00:38.253Z WARN cmd/sentinel.go:276 no keeper info available {"db": "03df4ba3", "keeper": "be6522f372"}
2022-12-01T14:01:40.735 app[4708975a] lhr [info] sentinel | 2022-12-01T14:01:40.734Z WARN cmd/sentinel.go:276 no keeper info available {"db": "4a112347", "keeper": "28df288fe2"}

I haven’t made any changes to my Postgres app for weeks. The disks are nowhere near full as far as I can tell (the DB is a little over 100 MB), and the hosts are available and passing health checks in the dashboard.

Is this still happening?

I’m currently only seeing the familiar “no keeper info available” warning every few minutes. No RPC or election loop errors in the last 90 minutes or so.

@jsierles Seeing some new errors now:

2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel | panic: close of closed channel
2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel |
2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel | goroutine 91559 [running]:
2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc0000b0000)
2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc0000b0000)
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | exit status 2
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | restarting in 3s [attempt 10]
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | panic: close of closed channel
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel |
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | goroutine 399818 [running]:
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc000138000)
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc000138000)
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5
2022-12-02T12:30:31.880 app[8a195f07] lhr [info] sentinel | exit status 2
2022-12-02T12:30:31.880 app[8a195f07] lhr [info] sentinel | restarting in 3s [attempt 5]
2022-12-02T12:30:33.106 app[a3c1b79a] ams [info] sentinel | Running...
2022-12-02T12:30:34.881 app[8a195f07] lhr [info] sentinel | Running...
2022-12-02T13:01:34.985 app[23ec7e6d] lhr [info] sentinel | 2022-12-02T13:01:34.984Z WARN cmd/sentinel.go:276 no keeper info available {"db": "54ecf943", "keeper": "23c222f532"}
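
For anyone reading the trace above: `panic: close of closed channel` is what the Go runtime raises when `close` is called a second time on the same channel. The sketch below is not the code from `github.com/superfly/leadership`, and I have not checked what `candidate.go:98` actually does. It is only a minimal illustration, with hypothetical names (`candidate`, `unsafeResign`, `safeResign`, `panics`), of how that panic arises and of one common guard, `sync.Once`.

```go
package main

import (
	"fmt"
	"sync"
)

// candidate is a hypothetical stand-in, NOT the type from superfly/leadership.
type candidate struct {
	stop     chan struct{}
	stopOnce sync.Once
}

func newCandidate() *candidate {
	return &candidate{stop: make(chan struct{})}
}

// unsafeResign closes the channel directly; calling it twice panics.
func (c *candidate) unsafeResign() {
	close(c.stop)
}

// safeResign closes the channel at most once, however often it is called.
func (c *candidate) safeResign() {
	c.stopOnce.Do(func() { close(c.stop) })
}

// panics runs f and returns the panic message, or "" if f did not panic.
func panics(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	a := newCandidate()
	a.unsafeResign()
	fmt.Printf("second unguarded close: %q\n", panics(a.unsafeResign))

	b := newCandidate()
	b.safeResign()
	fmt.Printf("second guarded close:   %q\n", panics(b.safeResign))
}
```

The first call reports a panic message and the second reports an empty string. Whether the library hits this through a retry path after the earlier 500s is a guess on my part, not something the logs confirm.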