Hi @nouzlin,
A single application-host server in ams
did experience a hardware failure, entering a degraded state (from 10:54-11:20 and 12:17-13:03- roughly matching the timestamps you mentioned), eventually requiring a manual hard-reboot of the server to fully recover. This was not a broader issue with the ams
region as a whole, just a server hosting one of the two instances comprising your postgres cluster.
I can’t speak to why the Postgres cluster-management setup didn’t trigger an automatic failover to the replica in your case. I do know that we’ve been actively working on a new iteration of Postgres clustering in an attempt to address some known reliability/stability issues with the current clustering setup.
@Elder, a separate host in ams
experienced a much shorter service interruption from 2023-02-10T11:07:00Z→2023-02-10T11:20:00Z, that probably affected a single instance in your cluster.