Unannounced Maintenance?

sdubinsky · February 19, 2023, 1:57pm

A user reported that they couldn’t access my site. When I checked the database, the last log line was: “Server bk_db/pg1 is going DOWN for maintenance (unspecified DNS error). 0 active and 1 backup servers left. Running on backup. 1 sessions active, 0 requeued, 0 remaining in queue.”

Some googling suggests this was a hardware failure on the underlying machine. Is that right?

jerome · February 19, 2023, 2:53pm

It wasn’t maintenance, but one of our hosts in CDG was down for ~15 minutes and had to be rebooted. Might be related!

sdubinsky · February 19, 2023, 3:19pm

How did you manage to restart it? My pg app is stuck. It’s been several hours, too.

jerome · February 19, 2023, 3:23pm

We run bare-metal hosts, so we had to restart it via our provider’s console. We couldn’t reach the host any other way.

For your app, can you try restarting the affected instance?

sdubinsky · February 19, 2023, 3:36pm

Fly’s documentation is not great. I tried fly pg restart several times and got nowhere; it just kept saying “no active leader found”. After some more googling I found fly machines restart, and that did the trick. Thanks for your help!

jerome · February 19, 2023, 3:44pm

Ah yes. That pg command requires a working cluster (apparently). The fly machine commands don’t care whether the cluster is healthy.

We are working on a better pg solution that’s not so adversely affected by down nodes.

sdubinsky · February 22, 2023, 12:39pm

On the off chance you or someone reading this works for Fly, the full sequence of commands I tried was:
fly restart -a fencing-database-db
fly restart fencing-database-db
fly apps restart fencing-database-db
fly pg restart -a fencing-database-db

I ran each of those in order. I found fly restart somewhere in the documentation, and each command told me to run the next one instead of sending me directly to the last. Very bad UX. And, of course, fly machines restart wasn’t mentioned anywhere.

brian · February 28, 2023, 12:11pm

For future reference, the fly machine restart command is documented here!
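
For anyone landing here later, the workaround from this thread can be sketched as a small shell helper. This is only a sketch under assumptions: restart_pg_app is a hypothetical function name (it is not part of flyctl), and the machine IDs are placeholders you would supply yourself. It tries the cluster-aware fly pg restart first and, if that fails (as it did above with “no active leader found”), falls back to restarting each machine directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of flyctl.
# Usage: restart_pg_app <app> <machine-id>...
restart_pg_app() {
  if [ "$#" -lt 2 ]; then
    echo "usage: restart_pg_app <app> <machine-id>..." >&2
    return 2
  fi
  local app="$1"
  shift

  # Cluster-aware restart: per this thread, it needs a working cluster
  # and fails when no leader is reachable.
  if fly pg restart -a "$app"; then
    return 0
  fi

  echo "fly pg restart failed; restarting machines directly" >&2

  # Fallback: restart each machine on its own, regardless of cluster state.
  local id
  for id in "$@"; do
    fly machine restart "$id" -a "$app" || return 1
  done
}
```

The machine IDs can be looked up with fly machine list -a fencing-database-db and then passed in, for example restart_pg_app fencing-database-db <machine-id>. Passing the IDs explicitly keeps the helper from having to parse CLI output, which may change between flyctl versions.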
