What is the correct process to change the postgres leader region?

I wanted to move my postgres leader region from SYD to LAX.
I created a replica in LAX, but couldn’t find a way to trigger the failover.

I tried restarting the SYD postgres VM but it just restarted as leader.
I tried deleting the SYD pg_data volume, which removed the leader, but now
the replica seems stuck as a replica with no leader:

2022-04-22T05:55:59Z app[ba4a2446] lax [info]sentinel | 2022-04-22T05:55:59.401Z	INFO	cmd/sentinel.go:1006	trying to find a new master to replace failed master
2022-04-22T05:55:59Z app[ba4a2446] lax [info]sentinel | 2022-04-22T05:55:59.401Z	INFO	cmd/sentinel.go:741	ignoring keeper since it cannot be master (--can-be-master=false)	{"db": "4bfd05f3", "keeper": "7d180d2f82"}
2022-04-22T05:55:59Z app[ba4a2446] lax [info]sentinel | 2022-04-22T05:55:59.401Z	ERROR	cmd/sentinel.go:1009	no eligible masters
2022-04-22T05:56:01Z app[ba4a2446] lax [info]keeper   | 2022-04-22T05:56:01.738Z	INFO	cmd/keeper.go:1556	our db requested role is standby	{"followedDB": "ebc4d408"}
2022-04-22T05:56:06Z app[ba4a2446] lax [info]sentinel | 2022-04-22T05:56:06.146Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "ebc4d408", "keeper": "2983094dd2"}
2022-04-22T05:56:06Z app[ba4a2446] lax [info]sentinel | 2022-04-22T05:56:06.150Z	INFO	cmd/sentinel.go:995	master db is failed	{"db": "ebc4d408", "keeper": "2983094dd2"}
2022-04-22T05:56:06Z app[ba4a2446] lax [info]sentinel | 2022-04-22T05:56:06.151Z	INFO	cmd/sentinel.go:1006	trying to find a new master to replace failed master
2022-04-22T05:56:06Z app[ba4a2446] lax [info]sentinel | 2022-04-22T05:56:06.151Z	INFO	cmd/sentinel.go:741	ignoring keeper since it cannot be master (--can-be-master=false)	{"db": "4bfd05f3", "keeper": "7d180d2f82"}

So I have two questions:

  1. What is the recommended process to change the postgres leader region?
  2. Why didn’t failover work in this instance? (I expected the LAX replica would be promoted to leader)

Hey there,

So it’s a bit of a process, but I’ll walk you through it.

Step 1: Adjust your app’s primary region.

Pull down your fly.toml file if you haven’t already.

fly config save --app <app-name>

Modify the PRIMARY_REGION value inside your fly.toml file:

[env]
 PRIMARY_REGION = "lax"
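If you'd rather script the edit than change it by hand, here is a minimal sketch. The `set_primary_region` helper is hypothetical (not part of flyctl), and it assumes fly.toml already contains a `PRIMARY_REGION = "..."` line and that the region is a plain code like `lax`.

```shell
# Hypothetical helper (not a flyctl command): rewrite the PRIMARY_REGION
# value in a fly.toml-style file. Assumes the key already exists on its
# own line and that the region contains no characters special to sed.
set_primary_region() {
  file="$1"
  region="$2"
  tmp="$(mktemp)"
  # Keep everything up to and including the "=", replace the quoted value.
  sed -E "s/^([[:space:]]*PRIMARY_REGION[[:space:]]*=[[:space:]]*).*/\1\"${region}\"/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage: set_primary_region fly.toml lax
```

Check the result with `grep PRIMARY_REGION fly.toml` before deploying.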

Deploy your app.

WARNING: Your app will not accept writes until you issue the failover in Step 2. HAProxy routes connections to the primary and relies on the PRIMARY_REGION env var to do so. This step also requires an immediate deploy, which means the new image is deployed to all members at the same time. This will result in a brief period of downtime. I would recommend testing this process on a staging environment if this app is critical.

# Run this inside the same directory as your fly.toml
fly deploy . --image flyio/postgres:<major-version> --strategy=immediate

If you don’t know which image you’re running, you can view it by running:
fly image show

So, for example, if your Tag indicates 14.2, the image reference should look like flyio/postgres:14.
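As a rough sketch of that mapping, the snippet below strips everything after the first dot of the Tag to build the image reference. The `image_ref` function name is made up for illustration, and you pass the Tag in by hand, since this does not parse `fly image show` output.

```shell
# Hypothetical helper: turn a Tag such as "14.2" into the image reference
# used for the deploy, by keeping only the major version.
image_ref() {
  tag="$1"
  major="${tag%%.*}"   # drop the first "." and everything after it
  printf 'flyio/postgres:%s\n' "$major"
}

# Usage: fly deploy . --image "$(image_ref 14.2)" --strategy=immediate
```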

Step 2: Orchestrating a failover

Verify your version:

fly image show

Registry   = registry-1.docker.io
Repository = flyio/postgres
Tag        = 14.2
Version    = v0.0.21
Digest     = sha256:4e4a7bfef439b5e02fa3803c4b8225b57c297fa114f995855d5d7807828d9008

If you’re running PG13/14 with Version v0.0.13+:

fly ssh console --app <app-name>

pg-failover

If you’re running PG12 or an earlier Version:

fly ssh console --app <app-name>

bash
# Export Stolon specific env vars.
export $(cat /data/.env | xargs) 

# Identify the master keeper id
stolonctl status  

# Fail the master keeper to trigger the failover.
stolonctl failkeeper <master-keeper-id>

# Verify the state of the world.
stolonctl status  # Verify that the master has indeed changed.
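If it helps to decide which of the two paths above applies, here is a small sketch that takes the Tag and Version values from `fly image show` and prints the matching command. The function names are hypothetical. Treating any major version of 13 or higher as the pg-failover path is an assumption on my part, since only PG13/14 are mentioned above.

```shell
# Hypothetical helper: succeeds if version $1 (e.g. v0.0.21) is greater than
# or equal to version $2 (e.g. v0.0.13). Assumes three numeric parts.
version_ge() {
  a="${1#v}"
  b="${2#v}"
  old_ifs="$IFS"
  IFS=.
  set -- $a $b          # $1..$3 = parts of a, $4..$6 = parts of b
  IFS="$old_ifs"
  [ "$1" -gt "$4" ] && return 0
  [ "$1" -lt "$4" ] && return 1
  [ "$2" -gt "$5" ] && return 0
  [ "$2" -lt "$5" ] && return 1
  [ "$3" -ge "$6" ]
}

# Hypothetical helper: print which failover command applies for a given
# Tag (e.g. 14.2) and Version (e.g. v0.0.21).
# Assumption: majors above 14 behave like PG13/14.
failover_command() {
  pg_major="${1%%.*}"
  if [ "$pg_major" -ge 13 ] && version_ge "$2" v0.0.13; then
    echo "pg-failover"
  else
    echo "stolonctl failkeeper"
  fi
}

# Usage: failover_command 14.2 v0.0.21
```

This only tells you which command to run. You still need to run it from inside `fly ssh console` as described above.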

Let me know if you have any questions on anything.


Exactly what I needed. Thank you.
