We’ve updated the PRIMARY_REGION env variable and redeployed, but when we ssh into a host the pg-failover command does not exist. Any help would be appreciated!
With Postgres 15.2 you should use the procedure described here; the one you linked to is for Stolon-based Fly Postgres which is not what you’re using (and no longer what we provision for new clusters):
This is helpful. Any idea what to do in the scenario where a failover fails?
Performing a failover
Connecting to fdaa:0:22f0:a7b:106:38aa:7298:2... complete
Stopping current leader... 9080e693b0d9e8
Starting new leader
Promoting new leader... e2865642ae6d78
Connecting to fdaa:0:22f0:a7b:106:38aa:7298:2... complete
NOTICE: promoting standby to primary
DETAIL: promoting server "fdaa:0:22f0:a7b:106:38aa:7298:2" (ID: 1100988338) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "fdaa:0:22f0:a7b:106:38aa:7298:2" (ID: 1100988338) was successfully promoted to primary
NOTICE: executing STANDBY FOLLOW on 7 of 7 siblings
Waiting 30 seconds for the old leader to stop...
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
Error promoting new leader, restarting existing leader
Waiting for old leader to finish stopping
Update: Despite the log message “Error promoting new leader, restarting existing leader”, the new leader took over. The old leader never came back, so I ended up having to force-destroy the machine, but all seems stable again. We are very much looking forward to a managed Postgres option from Fly.
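For anyone who lands here with the same contradictory output: a rough way to tell the outcomes apart is to check the captured log for the "STANDBY PROMOTE successful" notice alongside the CLI's error line. Below is a minimal Python sketch, assuming the failover output was saved to a string. The function name and the result labels are made up for illustration and are not part of flyctl or repmgr, and the check only reflects what the log says, not the actual cluster state.

```python
# Hypothetical helper: classify a saved failover log by the two
# messages seen in the output above. Not part of flyctl or repmgr.

PROMOTE_OK = "STANDBY PROMOTE successful"   # notice seen in the log above
CLI_ERROR = "Error promoting new leader"    # CLI error line seen in the log above


def summarize_failover(log_text: str) -> str:
    """Return a coarse label describing what the log claims happened."""
    promoted = PROMOTE_OK in log_text
    cli_error = CLI_ERROR in log_text
    if promoted and cli_error:
        # The case in this thread: promotion went through, but the CLI
        # still reported an error afterwards.
        return "promoted-but-cli-reported-error"
    if promoted:
        return "promoted"
    if cli_error:
        return "failed"
    return "unknown"


if __name__ == "__main__":
    sample = (
        "NOTICE: STANDBY PROMOTE successful\n"
        "Error promoting new leader, restarting existing leader\n"
    )
    print(summarize_failover(sample))
```

If the label is the mixed one, it is worth confirming which machine is actually primary before restarting or destroying anything.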