I have been trying to migrate a Postgres cluster to the new apps v2, but the manual `migrate-to-v2` did not work because I have volumes in the `fra` region and I'm not subscribed to any plan.
I could have paid for support for this organization, but I expected to learn a few things from migrating to a new region before moving to apps v2. Unfortunately, I wasn't able to do so.
I first created a new volume in a new region (`mad`), waited for the new replica to catch up, and then tried to force a failover.
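For reference, the setup steps were roughly the following (a sketch; the app name `my-pg-app`, the volume name `pg_data`, and the volume size are placeholders):

```shell
# Sketch: add a volume in the target region so a replica can be placed there.
fly volumes create pg_data --region mad --size 10 -a my-pg-app

# Scale up so a new instance (the future mad replica) gets scheduled onto it.
fly scale count 2 -a my-pg-app
```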
I tried to do this directly using `stolonctl failkeeper` and later via `fly vm stop VM_ID`, but unfortunately the leader was always stuck in `fra` and `mad` was never promoted.
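Concretely, the failover attempts looked roughly like this (a sketch; the keeper UID is the master keeper from the status output below, `my-pg-app` and `VM_ID` are placeholders, and on Fly `stolonctl` has to be run from inside a VM, where the cluster name and store settings come from the image's environment):

```shell
# Sketch: force a failover, run inside the cluster via `fly ssh console -a my-pg-app`.
# Mark the current master keeper as failed so a sentinel should elect a new one:
stolonctl failkeeper 23c3110402

# Alternatively, stop the leader VM from the outside and let the sentinels react:
fly vm stop VM_ID
```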
After lots of trying I ended up breaking my cluster: the instances report as healthy, but Postgres is in read-only mode.
```
Instances
ID       PROCESS VERSION REGION DESIRED STATUS            HEALTH CHECKS      RESTARTS CREATED
6a935ca2 app     27      fra    run     running (leader)  3 total, 3 passing 1        45m23s ago
ab3ad2a4 app     27      mad    run     running (replica) 3 total, 3 passing 1        4h5m ago
```
```
2023-07-04T15:10:20Z app[6a935ca2] fra [info]keeper | 2023-07-04 15:10:20.043 UTC  ERROR: cannot execute UPDATE in a read-only transaction
2023-07-04T15:10:20Z app[6a935ca2] fra [info]keeper | 2023-07-04 15:10:20.043 UTC  STATEMENT: update "my_table" set "sample" = $1 where "id" = $2
```
```
=== Active sentinels ===
ID       LEADER
212c4b66 true
49559e44 false
8e6fc6d8 false
ed019368 false

=== Active proxies ===
No active proxies

=== Keepers ===
UID        HEALTHY PG LISTENADDRESS                   PG HEALTHY PG WANTEDGENERATION PG CURRENTGENERATION
23c3110402 true    fdaa:0:6b42:a7b:23c3:1:1040:2:5433 true       11                  11
25db9ede22 true    fdaa:0:6b42:a7b:25db:9:ede2:2:5433 true       2                   2

=== Cluster Info ===
Master Keeper: 23c3110402

===== Keepers/DB tree =====
23c3110402 (master)
└─25db9ede22
```
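One check I can still run from here, to confirm whether the "leader" is really writable (a sketch; `my-pg-app` is a placeholder, and the SQL may need to be typed interactively rather than piped):

```shell
# Sketch: inspect the leader's actual read/write state.
# pg_is_in_recovery() should be false on a real primary; stolon also manages
# default_transaction_read_only, which would explain the UPDATE errors above.
fly postgres connect -a my-pg-app <<'SQL'
SELECT pg_is_in_recovery();
SHOW default_transaction_read_only;
SQL
```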
How would you debug this problem? Why couldn't I ever perform a failover, even when there was no replication lag?
App internal id:
Related topic: What is the correct process to change the postgres leader region? - #2 by shaun