Postgres cluster is unstable

dedsec · October 30, 2021, 2:26pm

So according to the fly checks our pg cluster app is phasing int and out of a critical status by failing the vm check

In normal circumstances this would be just fine but every time it becomes critical we loose all our LISTEN/NOTIFY connections and they cannot be renewed automatically unless we restart our connected applications.

This always happens on weekends and we suspect consul is at fault cause that’s the explanation we got last time.

For example there is not a 1 minute difference between these 3 notifications.

Can you check it please ?

kurt · October 30, 2021, 2:42pm

Those check failures aren’t related to consul, those are checks indicating that the VM is CPU constrained. The VMs are probably restarting when that check fails multiple times in a row.

You can change this behavior.

fly config save -a <database-app-name>
For health checks you want to just let fail repeatedly, set this: restart_limit = 0
fly deploy -i flyio/postgres:12.8

This should prevent VMs from restarting when checks fail.

Topic		Replies	Views
PosgreSQL on Fly: 1 critical health check	10	616	December 20, 2021
Postgres clusters periodically down across many of our organizations Questions / Help postgres	7	1641	October 13, 2022
Successful deploy VM keeps restarting Questions / Help troubleshooting	4	427	March 23, 2023
Long Postgres outage (consul i/o timeout) Questions / Help postgres , consul	3	839	February 11, 2023
Postgres health checks perpetually failing Questions / Help postgres	3	1059	March 2, 2023

Postgres cluster is unstable

Related topics