Multi-region postgres + elixir using fly_postgres and fly_rpc

After watching @brainlid’s presentation I got inspired to try out this multiregion setup on Fly (for those who didn’t see it: ElixirConf 2021 - Mark Ericksen - Globally Distributed Elixir Apps on Fly.io - YouTube). However, I’m not able to set it up properly by just following the guides / readme’s - it keeps on failing to connect to the database. This is what I did:

I used the getting started guide, added libcluster and got the app running on in two regions:

This all seems to work fine. It also had setup a database cluster in a single region with that. After that I started with changing to code to support this multiregion db setup using the fly_postgres readme and implemented every step. Than I looked at the scaling the postgres cluster to multiple regions with this guide. Once the app was deployed with new code and fly_postgres I added a new volume (so that I have 1 in ams and 1 in sea) and scaled the instances of the db cluster to multiple regions (but with max 1 for every region). This works and the PRIMARY_REGION is set to ams for both the app cluster as the db cluster.

However, the sea app cluster can’t connect to the database:

Did anyone created a similar setup like this successfully? I’m not sure what I did wrong or what I can do to get it to work.

Does fly status -a <db cluster app name> say that sea is a replica region? The “status” column should look something like this:

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS           	HEALTH CHECKS     	RESTARTS	CREATED              
6dad6fd9	app    	2      	ord   	run    	running (replica)	3 total, 3 passing	0       	2022-01-04T09:55:39Z	
016d6f8f	app    	2      	fra   	run    	running (leader) 	3 total, 3 passing	0       	2022-01-04T09:53:17Z	
8b93c566	app    	2      	fra   	run    	running (replica)	3 total, 3 passing	0       	2022-01-04T09:53:17Z	

I’m getting the following, without leader/replica in its status…

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS 	HEALTH CHECKS     	RESTARTS	CREATED              
ec7c8d09	app    	4      	sea   	run    	running	2 total, 2 passing	0       	14h44m ago          	
9a05dd5b	app    	4      	ams   	run    	running	2 total, 2 passing	0       	2022-01-06T10:23:38Z	

Not sure how I can define which one is leader/replica?

I also have no idea if there’s a way to transform such a deployment into a cluster - ran into a similar issue recently and my workaround was to create a new postgres app, clustered from the get-go.
When I originally ran fly launch for my Phoenix project, there was no option to choose which kind of postgres deployment you want (clustered or not), so I also initially had a single-instance postgres. So when I added a new region to it and configured my app with fly_postgres_elixir lib, I got the exact same error - the app instance in my replica region attempted to connect to a postgres instance in the right region, but that instance listened on 5432, not 5433 (which is the port fly_postgres_elixir tries to connect to in replica regions, and which postgres replica instances listen on in a clustered deployment).
So I ran fly pg create --name <new pg cluster app name> --region <my primary region> and chose a production configuration, which created an app with two instances in the primary region: one leader, one replica; then added a new region and scaled the pg cluster app to 3 instances (2 replicas, 1 leader).
Then I removed DATABASE_URL secret from my app* and ran fly pg attach --postgres-app <new pg cluster app name> to attach this new pg cluster to the app (and later simply deleted the old postgres app w/o any data migrations because nothing of value was lost in my case, but YMMV), and after redeployment it worked - app instances in both regions connected successfully.

* actually, I tried fly pg detach first, but it wouldn’t let me, probably because I had some data in the db, didn’t investigate any further.

1 Like

Woa yes! You keep on helping me out @kkonstant :raised_hands:

You’re right that you need the production instance to accomplish this. I will try to run some tests to see if everything is correctly hooked up to each other.

Yes @kkonstant, a recent change within Fly was to create a non-clustered PG database so it could be a “free” database. That works well for traditional single-server Phoenix apps, but when you go multi-region and need the replicas… yeah.

Looks like we can certainly do something to improve the visibility, guides or choices here. This is why the guide didn’t work for @noegip going multi-region. It was written before the Free PG DB config was introduced! :slight_smile:

Thanks!

As of right now, the easiest way to transition from a standalone Postgres to a clustered Postgres is to just provision a new HA Postgres app from a snapshot.

# List volumes of Standalone PG
fly volumes list --app <postgres-app>

# List the snapshots associated with the Volume
fly volumes snapshots list vol_xxx

# Provision a new PG  while specifying the snapshot obtained from your standalone PG. 
fly pg create --snapshot-id <standalone-pg-snapshot-id>

The Standalone Postgres configuration just runs vanilla PG and does not come configured with Stolon. I 100% agree we need better documentation conveying the differences and how to transition from Standalone → Clustered.

3 Likes

Just a quick update on this. We did tweak this a bit and new provisions of the Standalone PG will now come configured with Stolon. This will allow you to horizontally scale up in the event you need HA.

1 Like