Postgres database apps are crashing again

julianrubisch · October 25, 2022, 7:41am

Not sure if this is related at all, but I’m experiencing an outage today.

Here’s a log snippet from the primary (?):

[info] keeper | 2022-10-24T19:28:23.791Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}
[info] sentinel | 2022-10-24T19:28:23.891Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Unexpected response code: 500"}

I’ve also noticed a very large replication lag in the metrics. All other metrics in Grafana (memory, CPU load, volume) seem okay, though.
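
In case it helps anyone triaging similar output, here is a minimal sketch for splitting log lines like the two above into fields, so errors can be grouped by component. The line layout is inferred only from those two lines, not from any documented format, and the names `LINE_RE` and `parse_line` are made up for illustration.

```python
import json
import re

# Assumed layout, inferred from the two quoted lines only:
# "[tag] component | timestamp LEVEL file:line message {json payload}"
LINE_RE = re.compile(
    r"^\[(?P<tag>\w+)\]\s+(?P<component>\w+)\s+\|\s+"
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<source>\S+)\s+"
    r"(?P<message>.*?)\s+(?P<payload>\{.*\})\s*$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The trailing {...} part is JSON in both quoted lines; decode it.
    record["payload"] = json.loads(record["payload"])
    return record


if __name__ == "__main__":
    sample = (
        '[info] sentinel | 2022-10-24T19:28:23.891Z ERROR cmd/sentinel.go:1843 '
        'error retrieving cluster data {"error": "Unexpected response code: 500"}'
    )
    record = parse_line(sample)
    print(record["component"], record["level"], record["payload"]["error"])
```

Lines that do not fit the assumed layout come back as `None` rather than raising, so the function can be run over a mixed log without stopping at the first unfamiliar line.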