OK, it does look like one of the machines wasn’t replicating properly for quite a while. The most recent updated at in the good snapshot is 2021-12-01 21:23:34, and the most recent updated at in the other is 2021-11-30 17:32:57. So it seems like the previous primary failed sometime today, and the out-of-date replica took over.
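If it helps to put a number on the difference, here is a minimal Python sketch that subtracts the two timestamps quoted above. The function name `replication_gap` is just illustrative, and the result only tells you how far apart the newest rows in the two snapshots are, not exactly when replication stopped.

```python
from datetime import datetime, timedelta

# Timestamp format used in the two values quoted above.
FMT = "%Y-%m-%d %H:%M:%S"

def replication_gap(good_latest: str, stale_latest: str) -> timedelta:
    """Return how far the stale snapshot's newest row trails the good one's.

    Hypothetical helper for illustration; both arguments are the most
    recent "updated at" values read from each snapshot.
    """
    return datetime.strptime(good_latest, FMT) - datetime.strptime(stale_latest, FMT)

gap = replication_gap("2021-12-01 21:23:34", "2021-11-30 17:32:57")
print(gap)  # 1 day, 3:50:37
```

So the replica that took over is missing a bit more than a day of the most recent writes, assuming the good snapshot reflects the latest state of the old primary.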
We’ve changed your database to not restart when health checks fail. This might mean it stops responding, but it should make it obvious what’s wrecking the VMs.