I have been trying to migrate a Postgres cluster to the new apps v2, but the manual migrate-to-v2 did not work because I have volumes in the fra region and I am not subscribed to any plan.
I could have paid for support for this organization, but I expected I could learn a few things from migrating to a new region before moving to apps v2.
Unfortunately I wasn’t able to do so.
I first created a new volume in the mad region, waited for the new replica to catch up, and then tried to force a failover.
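For context, this is roughly how I added the mad replica; the volume name, size, and app name below are placeholders rather than my exact values:

fly volumes create pg_data --region mad --size 10 -a <pg-app>
fly scale count 2 -a <pg-app>
# watch the new instance come up and catch up with the leader
fly status -a <pg-app>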
I tried to do this directly using stolonctl failkeeper, and later via fly vm stop VM_ID, but unfortunately the leader always stayed stuck in fra and mad was never promoted.
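From memory, the failkeeper attempt looked roughly like this, run from a shell inside the leader VM (I assumed the cluster name and store settings stolonctl needs were already present in the keeper's environment on Fly's stolon image):

fly ssh console -a <pg-app>
# inside the VM: check the cluster, then fail the keeper currently holding the master role
# (the UID is the one reported as master in the stolonctl status output further down)
stolonctl status
stolonctl failkeeper 23c3110402

And separately, stopping the leader VM from the outside:

fly vm stop 6a935ca2 -a <pg-app>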
After a lot of trying I ended up breaking my cluster: the instances report as healthy, but Postgres is in read-only mode.
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
6a935ca2 app 27 fra run running (leader) 3 total, 3 passing 1 45m23s ago
ab3ad2a4 app 27 mad run running (replica) 3 total, 3 passing 1 4h5m ago
2023-07-04T15:10:20Z app[6a935ca2] fra [info]keeper | 2023-07-04 15:10:20.043 UTC [379] ERROR: cannot execute UPDATE in a read-only transaction
2023-07-04T15:10:20Z app[6a935ca2] fra [info]keeper | 2023-07-04 15:10:20.043 UTC [379] STATEMENT: update "my_table" set "sample" = $1 where "id" = $2
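For what it's worth, I can confirm the read-only state directly with psql; these are standard Postgres checks, and the connection details (user, socket) are placeholders:

psql -U postgres -c "SELECT pg_is_in_recovery();"
psql -U postgres -c "SHOW default_transaction_read_only;"
psql -U postgres -c "SELECT client_addr, state, replay_lsn FROM pg_stat_replication;"

On a primary, the "cannot execute UPDATE in a read-only transaction" error means either the node is actually in recovery or default_transaction_read_only has been turned on.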
=== Active sentinels ===
ID LEADER
212c4b66 true
49559e44 false
8e6fc6d8 false
ed019368 false
=== Active proxies ===
No active proxies
=== Keepers ===
UID HEALTHY PG LISTENADDRESS PG HEALTHY PG WANTEDGENERATION PG CURRENTGENERATION
23c3110402 true fdaa:0:6b42:a7b:23c3:1:1040:2:5433 true 11 11
25db9ede22 true fdaa:0:6b42:a7b:25db:9:ede2:2:5433 true 2 2
=== Cluster Info ===
Master Keeper: 23c3110402
===== Keepers/DB tree =====
23c3110402 (master)
└─25db9ede22
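If the raw stolon state would help with debugging, I can also dump it and post it here; depending on the stolon version the command is one of these (again run from inside a VM where the store environment is set):

stolonctl clusterdata read   # newer stolon releases
stolonctl clusterdata        # older releases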
How would you debug this problem? Why was I never able to perform a failover, even when there was no replication lag?
App internal id: ejpon17mppl1dgr4
Related topic: What is the correct process to change the postgres leader region? - #2 by shaun