I have been trying to migrate a Postgres cluster to the new apps v2, but the manual migrate-to-v2 did not work because I have volumes in the fra region and I am not subscribed to any plan.
I could have paid for support for this organization, but I expected I could learn a few things from migrating to a new region before moving to apps v2.
Unfortunately I wasn’t able to do so.
I first created a new volume in the mad region, waited for the new replica to catch up, and then tried to force a failover.
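For context, this is roughly how I added the mad replica; the volume name, size, and app name below are placeholders rather than my exact values:

fly volumes create pg_data --region mad --size 10 -a <pg-app>
fly scale count 2 -a <pg-app>
# watch the new instance come up and catch up with the leader
fly status -a <pg-app>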
I tried to do this directly using stolonctl failkeeper, and later via fly vm stop VM_ID, but unfortunately the leader always stayed stuck in fra and mad was never promoted.
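From memory, the failkeeper attempt looked roughly like this, run from a shell inside the leader VM (I assumed the cluster name and store settings stolonctl needs were already present in the keeper's environment on Fly's stolon image):

fly ssh console -a <pg-app>
# inside the VM: check the cluster, then fail the keeper currently holding the master role
# (the UID is the one reported as master in the stolonctl status output further down)
stolonctl status
stolonctl failkeeper 23c3110402

And separately, stopping the leader VM from the outside:

fly vm stop 6a935ca2 -a <pg-app>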
After a lot of trying I ended up breaking my cluster: the instances report as healthy, but Postgres is in read-only mode.
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
6a935ca2 app 27 fra run running (leader) 3 total, 3 passing 1 45m23s ago
ab3ad2a4 app 27 mad run running (replica) 3 total, 3 passing 1 4h5m ago
2023-07-04T15:10:20Z app[6a935ca2] fra [info]keeper | 2023-07-04 15:10:20.043 UTC [379] ERROR: cannot execute UPDATE in a read-only transaction
2023-07-04T15:10:20Z app[6a935ca2] fra [info]keeper | 2023-07-04 15:10:20.043 UTC [379] STATEMENT: update "my_table" set "sample" = $1 where "id" = $2
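For what it's worth, I can confirm the read-only state directly with psql; these are standard Postgres checks, and the connection details (user, socket) are placeholders:

psql -U postgres -c "SELECT pg_is_in_recovery();"
psql -U postgres -c "SHOW default_transaction_read_only;"
psql -U postgres -c "SELECT client_addr, state, replay_lsn FROM pg_stat_replication;"

On a primary, the "cannot execute UPDATE in a read-only transaction" error means either the node is actually in recovery or default_transaction_read_only has been turned on.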
=== Active sentinels ===
ID LEADER
212c4b66 true
49559e44 false
8e6fc6d8 false
ed019368 false
=== Active proxies ===
No active proxies
=== Keepers ===
UID HEALTHY PG LISTENADDRESS PG HEALTHY PG WANTEDGENERATION PG CURRENTGENERATION
23c3110402 true fdaa:0:6b42:a7b:23c3:1:1040:2:5433 true 11 11
25db9ede22 true fdaa:0:6b42:a7b:25db:9:ede2:2:5433 true 2 2
=== Cluster Info ===
Master Keeper: 23c3110402
===== Keepers/DB tree =====
23c3110402 (master)
└─25db9ede22
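If the raw stolon state would help with debugging, I can also dump it and post it here; depending on the stolon version the command is one of these (again run from inside a VM where the store environment is set):

stolonctl clusterdata read   # newer stolon releases
stolonctl clusterdata        # older releases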
How would you debug this problem? Why was I never able to perform a failover, even when there was no replication lag?
App internal id: ejpon17mppl1dgr4
Related topic: What is the correct process to change the postgres leader region? - #2 by shaun