Election loop and RPC errors in Postgres HA

shugel · December 1, 2022, 2:13pm

I’m seeing election loop and keeper errors in my HA, multi-region Postgres setup:

2022-12-01T13:23:07.657 app[8a195f07] lhr [info] sentinel | 2022-12-01T13:23:07.656Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-01T13:23:07.718 app[23ec7e6d] lhr [info] keeper | 2022-12-01T13:23:07.716Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-01T13:23:07.724 app[4708975a] lhr [info] sentinel | 2022-12-01T13:23:07.723Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-01T13:23:07.726 app[23ec7e6d] lhr [info] sentinel | 2022-12-01T13:23:07.726Z ERROR cmd/sentinel.go:1947 error saving clusterdata {"error": "Unexpected response code: 500"}
2022-12-01T13:23:07.765 app[8a195f07] lhr [info] keeper | 2022-12-01T13:23:07.765Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-01T13:23:07.909 app[a3c1b79a] ams [info] keeper | 2022-12-01T13:23:07.908Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

and

2022-12-01T13:39:15.432 app[a3c1b79a] ams [info] sentinel | 2022-12-01T13:39:15.432Z ERROR cmd/sentinel.go:102 election loop error {"error": "Unexpected response code: 500 (No cluster leader)"}
2022-12-01T13:39:15.981 app[8a195f07] lhr [info] sentinel | 2022-12-01T13:39:15.981Z ERROR cmd/sentinel.go:102 election loop error {"error": "failed to read lock: Unexpected response code: 500"}
2022-12-01T13:39:16.488 app[4708975a] lhr [info] keeper | 2022-12-01T13:39:16.487Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-01T13:39:16.691 app[23ec7e6d] lhr [info] sentinel | 2022-12-01T13:39:16.690Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-01T13:43:39.096 app[4708975a] lhr [info] sentinel | 2022-12-01T13:43:39.096Z WARN cmd/sentinel.go:276 no keeper info available {"db": "d71cefa2", "keeper": "23c5227db2"}
2022-12-01T13:48:21.055 app[4708975a] lhr [info] sentinel | 2022-12-01T13:48:21.054Z WARN cmd/sentinel.go:276 no keeper info available {"db": "03df4ba3", "keeper": "be6522f372"}
2022-12-01T13:48:21.055 app[4708975a] lhr [info] sentinel | 2022-12-01T13:48:21.054Z WARN cmd/sentinel.go:276 no keeper info available {"db": "54ecf943", "keeper": "23c222f532"}
2022-12-01T13:48:21.055 app[4708975a] lhr [info] sentinel | 2022-12-01T13:48:21.054Z WARN cmd/sentinel.go:276 no keeper info available {"db": "d71cefa2", "keeper": "23c5227db2"}
2022-12-01T14:00:38.255 app[4708975a] lhr [info] sentinel | 2022-12-01T14:00:38.253Z WARN cmd/sentinel.go:276 no keeper info available {"db": "03df4ba3", "keeper": "be6522f372"}
2022-12-01T14:01:40.735 app[4708975a] lhr [info] sentinel | 2022-12-01T14:01:40.734Z WARN cmd/sentinel.go:276 no keeper info available {"db": "4a112347", "keeper": "28df288fe2"}

I haven’t made any changes to my Postgres app for weeks. The disks are nowhere near full as far as I can tell (the DB is a little over 100 MB), and the hosts are available and passing health checks in the dashboard.
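To see at a glance which regions and processes are emitting these entries, the log lines can be tallied by region, process, and level. The following is only a rough sketch: the regular expression is an assumption based on the line format pasted in this thread, and `lineRE`, `key`, and `tally` are hypothetical names rather than part of any Fly.io or Stolon tooling.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRE is an assumed pattern for the log prefix seen in this thread:
// "<timestamp> app[<instance>] <region> [info] <process> | <timestamp> <LEVEL> ..."
var lineRE = regexp.MustCompile(
	`^\S+ app\[\w+\] (\w+) \[info\] (\w+) \| \S+ (ERROR|WARN) `)

// key groups log entries by region, process, and level.
type key struct{ Region, Process, Level string }

// tally counts ERROR and WARN lines per key. Lines that do not match
// the assumed format (for example panic traces) are skipped.
func tally(lines []string) map[key]int {
	counts := make(map[key]int)
	for _, l := range lines {
		m := lineRE.FindStringSubmatch(l)
		if m == nil {
			continue
		}
		counts[key{m[1], m[2], m[3]}]++
	}
	return counts
}

func main() {
	// Three lines copied from the logs above.
	lines := []string{
		`2022-12-01T13:23:07.657 app[8a195f07] lhr [info] sentinel | 2022-12-01T13:23:07.656Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}`,
		`2022-12-01T13:23:07.909 app[a3c1b79a] ams [info] keeper | 2022-12-01T13:23:07.908Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}`,
		`2022-12-01T13:39:15.432 app[a3c1b79a] ams [info] sentinel | 2022-12-01T13:39:15.432Z ERROR cmd/sentinel.go:102 election loop error {"error": "Unexpected response code: 500 (No cluster leader)"}`,
	}
	for k, n := range tally(lines) {
		fmt.Printf("%s %s %s: %d\n", k.Region, k.Process, k.Level, n)
	}
}
```

Run over the full excerpt, a tally like this would show that the errors come from both `lhr` and `ams` and from both the sentinel and keeper processes.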

jsierles · December 1, 2022, 3:52pm

Is this still happening?

shugel · December 1, 2022, 4:06pm

I’m currently only seeing the familiar “no keeper info available” warning every few minutes. No RPC or election loop errors in the last 90 minutes or so.

shugel · December 2, 2022, 1:20pm

@jsierles Seeing some new errors now:

2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel | panic: close of closed channel
2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel |
2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel | goroutine 91559 [running]:
2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc0000b0000)
2022-12-02T12:30:30.104 app[a3c1b79a] ams [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc0000b0000)
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | exit status 2
2022-12-02T12:30:30.105 app[a3c1b79a] ams [info] sentinel | restarting in 3s [attempt 10]
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | panic: close of closed channel
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel |
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | goroutine 399818 [running]:
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc000138000)
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc000138000)
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection
2022-12-02T12:30:31.878 app[8a195f07] lhr [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5
2022-12-02T12:30:31.880 app[8a195f07] lhr [info] sentinel | exit status 2
2022-12-02T12:30:31.880 app[8a195f07] lhr [info] sentinel | restarting in 3s [attempt 5]
2022-12-02T12:30:33.106 app[a3c1b79a] ams [info] sentinel | Running...
2022-12-02T12:30:34.881 app[8a195f07] lhr [info] sentinel | Running...
2022-12-02T13:01:34.985 app[23ec7e6d] lhr [info] sentinel | 2022-12-02T13:01:34.984Z WARN cmd/sentinel.go:276 no keeper info available {"db": "54ecf943", "keeper": "23c222f532"}
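For context on the panic in the trace above: Go raises `close of closed channel` whenever `close` is called a second time on the same channel. The sketch below reproduces that panic in isolation and shows one common guard, `sync.Once`. It is illustrative only. The `stopper` type and the `closeTwice` helper are hypothetical and are not taken from `github.com/superfly/leadership`, whose actual code at `candidate.go:98` is not shown in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is a hypothetical type that owns a "done" channel.
type stopper struct {
	once sync.Once
	done chan struct{}
}

func newStopper() *stopper {
	return &stopper{done: make(chan struct{})}
}

// Stop closes done at most once, so repeated calls are safe.
func (s *stopper) Stop() {
	s.once.Do(func() { close(s.done) })
}

// closeTwice closes a channel two times and returns the message of
// the panic it recovers from.
func closeTwice() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	ch := make(chan struct{})
	close(ch)
	close(ch) // the second close panics
	return "no panic"
}

func main() {
	fmt.Println(closeTwice())

	s := newStopper()
	s.Stop()
	s.Stop() // no panic: sync.Once runs the close only once
	<-s.done // a closed channel yields immediately
	fmt.Println("stopped cleanly")
}
```

Whether a guard like this is the right fix for the sentinel depends on why the lock channel is closed twice during campaigning, which the trace alone does not show.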
