Postgres leader check failure

enaia · November 9, 2021, 11:44pm

Is there an outage currently? My database cluster died.

$ flyctl --app production-db checks list
Health Checks for production-db
NAME  STATUS    ALLOCATION  REGION  TYPE  LAST UPDATED  OUTPUT
role  critical  e0f54567    iad     HTTP  8s ago        context deadline exceeded
pg    critical  e0f54567    iad     HTTP  14s ago       HTTP GET
                                                        http://172.19.4.170:5500/flycheck/pg:
                                                        500 Internal Server Error
                                                        Output: “[✗] leader check:
                                                        lookup production-db.internal on
                                                        [fdaa::3]:53: no such host”
vm    warning   e0f54567    iad     HTTP  18s ago
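
The pg check output above fails on a DNS lookup of production-db.internal ("no such host"). A minimal sketch for reproducing that lookup by hand, assuming it is run from a machine inside the app's private network, where .internal names are resolvable. The function name resolve_internal and the injectable resolver argument are hypothetical helpers for illustration, not part of flyctl or of the health check itself:

```python
import socket
from typing import Callable, List


def resolve_internal(hostname: str, resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated addresses for hostname.

    Returns an empty list when the lookup fails, which is the
    "no such host" situation reported by the leader check.
    The resolver argument is only there so the lookup can be
    swapped out (for example in tests); it defaults to the
    standard library's socket.getaddrinfo.
    """
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        # Name could not be resolved by the configured DNS server.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_internal("production-db.internal") from inside the private network should return one or more addresses when the name resolves, and an empty list when it does not. An empty list only confirms the DNS symptom; it does not say why the name is missing.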

Logs here

Update: it’s back up now

kurt · November 10, 2021, 12:11am

There isn’t an outage that we are aware of. If you run fly status --all, do you see any failed instances? And how long have the running ones been up?

kurt · November 10, 2021, 12:12am

Also, will you run fly image version and see if it recommends an upgrade?

shaun · November 10, 2021, 12:13am

Hey @enaia

I was able to verify that the Etcd backend store encountered some issues around that time. I’m still investigating the root cause, but things seem to be stable as of right now.

enaia · November 10, 2021, 12:14am

$ fly status --app production-db --all
App
Name = production-db
Owner = enaia
Version = 4
Status = running
Hostname = production-db.fly.dev

Instances
ID        PROCESS  VERSION  REGION  DESIRED  STATUS             HEALTH CHECKS        RESTARTS  CREATED
fbcf2c92  app      4        iad     run      running (replica)  3 total, 3 passing   0         29m48s ago
e0f54567  app      4        iad     run      running (leader)   3 total, 3 passing   0         33m44s ago
c8d852bd  app      4        iad     stop     failed             3 total              2         2021-11-07T19:56:01Z
657286da  app      4        iad     stop     failed             3 total, 2 critical  2         2021-11-07T19:54:02Z

Not sure how to run fly image version?

shaun · November 10, 2021, 1:35am

@enaia I think he meant fly image show.
