Got a report from a user that they couldn’t access my site. I checked the database and found that the last log line was “Server bk_db/pg1 is going DOWN for maintenance (unspecified DNS error). 0 active and 1 backup servers left. Running on backup. 1 sessions active, 0 requeued, 0 remaining in queue.”
Some googling seems to indicate that this was a hardware failure of the underlying machine?
Fly documentation is not great. I tried fly pg restart several times and got nowhere; it just kept saying “no active leader found”. After some more googling I found fly machines restart, and that did the trick. Thanks for your help!
On the off chance you or someone reading this works for Fly, the full sequence was:
fly restart -a fencing-database-db
fly restart fencing-database-db
fly apps restart fencing-database-db
fly pg restart -a fencing-database-db
I ran each of those in order. I found fly restart in the documentation somewhere, and each one in turn told me to run the next, instead of sending me directly to the last. Very very bad UX. And, of course, nowhere was fly machines restart mentioned.