pg-failover command not found

We’ve got a pg cluster on flyio/postgres-flex:15.2. Our leader crashed and we’re trying to fail over to a new region, following the instructions in “What is the correct process to change the postgres leader region?” (post #2 by shaun).

We’ve updated the PRIMARY_REGION env variable and redeployed, but when we ssh into a host the pg-failover command does not exist. Any help would be appreciated!

Hi Elliot,

With Postgres 15.2 you should use the procedure described here; the one you linked to is for Stolon-based Fly Postgres, which is not what you’re using (and no longer what we provision for new clusters):

- Daniel

This is helpful. Any idea what to do in the scenario where a failover fails?

Performing a failover
Connecting to fdaa:0:22f0:a7b:106:38aa:7298:2... complete
Stopping current leader...  9080e693b0d9e8
Starting new leader
Promoting new leader...  e2865642ae6d78
Connecting to fdaa:0:22f0:a7b:106:38aa:7298:2... complete
NOTICE: promoting standby to primary
DETAIL: promoting server "fdaa:0:22f0:a7b:106:38aa:7298:2" (ID: 1100988338) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "fdaa:0:22f0:a7b:106:38aa:7298:2" (ID: 1100988338) was successfully promoted to primary
NOTICE: executing STANDBY FOLLOW on 7 of 7 siblings
Waiting 30 seconds for the old leader to stop...
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
Error promoting new leader, restarting existing leader
Waiting for old leader to finish stopping

Update: Despite the log message “Error promoting new leader, restarting existing leader”, the new leader took. The old leader never came back, so I ended up having to force-destroy the machine, but all seems stable again. We are very much looking forward to a managed postgres option from Fly :sweat_smile:
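
In case it helps anyone who hits the same misleading message: below is a rough Python sketch that scans a saved copy of the failover output for the lines that mattered in my case. The marker strings are copied from the log above; the function names (summarize_failover, verdict) are made up for illustration, and matching on this text is only an assumption about what the tooling prints, not a documented interface. Treat the result as a hint and confirm the actual cluster state before destroying anything.

```python
import re

# Marker strings copied from the failover log in this thread. Matching on them
# is an assumption about the tool output, not a documented interface.
PROMOTED = re.compile(r"STANDBY PROMOTE successful")
FOLLOWED = re.compile(r"STANDBY FOLLOW successfully executed")
CLI_ERROR = re.compile(r"Error promoting new leader")

# Excerpt of the log posted above, used here as sample input.
SAMPLE_LOG = """\
NOTICE: STANDBY PROMOTE successful
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
Error promoting new leader, restarting existing leader
"""


def summarize_failover(log_text: str) -> dict:
    """Report which of the known markers appear in a failover log."""
    return {
        "promoted": bool(PROMOTED.search(log_text)),
        "standbys_following": bool(FOLLOWED.search(log_text)),
        "cli_reported_error": bool(CLI_ERROR.search(log_text)),
    }


def verdict(summary: dict) -> str:
    """Turn the marker summary into a short human-readable hint."""
    if summary["promoted"] and summary["cli_reported_error"]:
        return ("promotion reported successful despite CLI error; "
                "check cluster state before retrying")
    if summary["promoted"]:
        return "promotion reported successful"
    return "no promotion success marker found"


if __name__ == "__main__":
    print(verdict(summarize_failover(SAMPLE_LOG)))
```

The point is just that the promotion lines and the final error line can disagree, so it is worth reading both before assuming the failover was rolled back.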
