Postgres down. Again

I’m experiencing lots of db connection issues, and the app has been down for about 15 minutes, for the second time in the last few days. This is becoming very concerning. This is a 3-node cluster.

The db log is full of:

WARN	cmd/sentinel.go:276	no keeper info available

LOG:  terminating walsender process due to replication timeout

LOG:  unexpected EOF on client connection with an open transaction

error retrieving cluster data	{"error": "Get \"https://consul-fra.fly-shared.net..."}

backend 'bk_db' has no server available!
db/pg1 was DOWN and now enters maintenance (DNS timeout status).
Server bk_db/pg2 was DOWN and now enters maintenance (DNS timeout status).
Server bk_db/pg3 was DOWN and now enters maintenance (DNS timeout status).

error retrieving cluster data
failed to update keeper info
election loop error

FATAL:  could not connect to the primary server

Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match"

Having the same issue here

Db connection error stats: [image]
