@elliotdickison Were your health checks failing? When your cluster enters read-only mode, the health checks should provide some context on why.
Update: I somehow missed one of your responses.
In the same timeframe we disabled barman and enabled the new PITR. Something went wrong with this, causing disk space on our primary to steadily fill up over a period of a couple days.
The separate Barman machine does use a replication slot, so if that slot was not cleaned up properly when the machine was removed, it could lead to issues: an inactive slot makes the primary retain WAL, which would match disk usage climbing steadily over a few days. You should see warnings about inactive replication slots in your logs. Inactive replication slots should also be removed automatically after 12 hours.
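If it helps to check the slot theory directly, here is a rough sketch of how you could look for a leftover slot. The query uses the standard `pg_replication_slots` view and `pg_wal_lsn_diff` on the primary; the Python helper names (`inactive_slots`, `drop_statement`) and the sample rows are made up for illustration, and you would feed the function whatever rows your own client returns.

```python
# Query to run on the primary (for example from psql) to list replication slots
# and how much WAL each one is forcing Postgres to keep.
SLOT_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
"""


def inactive_slots(rows):
    """Return (slot_name, retained_bytes) for slots with no connected consumer.

    `rows` is an iterable of (slot_name, active, retained_bytes) tuples, i.e. the
    result of SLOT_QUERY. Slots are sorted by retained WAL, largest first.
    """
    flagged = [(name, retained or 0) for name, active, retained in rows if not active]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def drop_statement(slot_name):
    """Render the SQL that drops one slot (hypothetical helper; review before running)."""
    escaped = slot_name.replace("'", "''")  # escape single quotes in the literal
    return f"SELECT pg_drop_replication_slot('{escaped}');"


if __name__ == "__main__":
    # Hypothetical rows, standing in for real query output.
    sample = [("barman", False, 8_000_000_000), ("replica_1", True, 1024)]
    for name, retained in inactive_slots(sample):
        print(f"{name}: {retained} bytes retained -> {drop_statement(name)}")
```

Only drop a slot once you are sure nothing is supposed to be consuming from it. Space is normally reclaimed after the next checkpoint rather than immediately.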
Setting aside the root cause (how did reindexing a small, super low traffic database get us from ~15% utilization of a 10GB volume to > 90%!?), we’re still not sure how to get the db back into a writeable state. Extending volumes per the log message didn’t help.
How did you run the reindex command? Was it run with the CONCURRENTLY option?
If not, the exclusive lock taken out on the table could certainly impact replication and delay WAL generation.