[solved] Cannot connect to replicas in multi-region postgres

Based on the tutorial https://fly.io/docs/getting-started/multi-region-databases/#add-read-replicas, I do:

> fly pg create --name niss-db5 --region dfw
...
--> v0 deployed successfully

Connect to postgres
Any app within the personal organization can connect to postgres using the above credentials and the hostname "niss-db5.internal."
For example: postgres://postgres:[REDACTED]@niss-db5.internal:5432

I can now connect to the proxy port, but not the direct port.

> psql postgres://postgres:[REDACTED]@niss-db5.internal:5432
psql (14.1 (Ubuntu 14.1-1.pgdg21.04+1))

> psql postgres://postgres:[REDACTED]@niss-db5.internal:5433
psql: error: connection to server at "niss-db5.internal" (fdaa:0:3b76:a7b:12de:0:6a58:2), port 5433 failed: Connection refused
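If I'm reading the tutorial right, 5432 is the proxy port that forwards to the current leader, while 5433 talks to the Postgres instance on that node directly. As a sketch (the helper name and the port roles are my assumptions from the tutorial, not anything Fly ships), rewriting the printed URL to hit the direct port is just a port swap:

```python
from urllib.parse import urlsplit, urlunsplit

PROXY_PORT = 5432   # assumed: forwarded to the current leader
DIRECT_PORT = 5433  # assumed: the Postgres instance on that node itself

def direct_url(url: str) -> str:
    """Rewrite a Fly Postgres URL to target the direct port instead of the proxy."""
    parts = urlsplit(url)
    # Drop the trailing :port from the netloc and substitute the direct port.
    netloc = parts.netloc.rsplit(":", 1)[0] + f":{DIRECT_PORT}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(direct_url("postgres://postgres:secret@niss-db5.internal:5432"))
# postgres://postgres:secret@niss-db5.internal:5433
```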

Now I try to create a replica:

> fly volumes create pg_data -a niss-db5 --size 1 --region lhr
> fly scale count 2 -a niss-db5
> fly status -a niss-db5
App
  Name     = niss-db5          
  Owner    = personal          
  Version  = 2                 
  Status   = running           
  Hostname = niss-db5.fly.dev  

Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS      RESTARTS CREATED   
18f0148b app     2       lhr    run     running 2 total            0        5s ago    
344966d3 app     2       dfw    run     running 2 total, 2 passing 0        2m44s ago 

I can connect to the primary through the proxy port on the replica, but not to the replica directly.

> psql postgres://postgres:[REDACTED]@lhr.niss-db5.internal:5432
psql (14.1 (Ubuntu 14.1-1.pgdg21.04+1))

> psql postgres://postgres:[REDACTED]@lhr.niss-db5.internal:5433
psql: error: connection to server at "lhr.niss-db5.internal" (fdaa:0:3b76:a7b:12de:0:6a58:2), port 5433 failed: Connection refused
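For targeting a specific region's instance, the region prefix goes on the hostname (lhr.niss-db5.internal), not anywhere else in the URL. A tiny sketch of building those URLs (the function name, and the assumption that 5433 is the direct port, are mine):

```python
def replica_url(user: str, password: str, app: str, region: str, port: int = 5433) -> str:
    """Build a region-scoped .internal URL for a Fly Postgres app.

    The region prefix goes on the hostname, e.g. lhr.niss-db5.internal;
    port 5433 is assumed to be the direct (non-proxied) Postgres port.
    """
    return f"postgres://{user}:{password}@{region}.{app}.internal:{port}"

print(replica_url("postgres", "secret", "niss-db5", "lhr"))
# postgres://postgres:secret@lhr.niss-db5.internal:5433
```

Once connected, select pg_is_in_recovery(); should return true on a standby, which is a quick way to confirm you reached a replica rather than the leader.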

I’m not sure if this is relevant, but fly image show indicates flyio/postgres-standalone:14, not flyio/postgres:14 as I’d expect.

I tried creating a cluster with a “production” configuration, and got the flyio/postgres:14 image. At that point clustering appeared to work, and the status showed (replica) and (leader):

Instances
ID       PROCESS VERSION REGION DESIRED STATUS            HEALTH CHECKS      RESTARTS CREATED   
da49aa26 app     0       dfw    run     running (replica) 3 total, 3 passing 0        2m24s ago 
e32496d5 app     0       dfw    run     running (leader)  3 total, 3 passing 0        2m24s ago 
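With a working leader/replica cluster, the pattern I take from the tutorial is to send writes through the proxy port to the leader and reads to the nearest replica's direct port. A sketch of that routing decision (the hostname scheme and port roles are my assumptions, not a Fly API):

```python
from typing import Optional, Tuple

def pg_target(app: str, region: Optional[str], write: bool) -> Tuple[str, int]:
    """Pick (host, port) for a query: writes go to the leader via the proxy
    port 5432 on the bare .internal name; reads go direct (port 5433) to the
    instance in the given region."""
    if write or region is None:
        return f"{app}.internal", 5432
    return f"{region}.{app}.internal", 5433

print(pg_target("niss-db5", "lhr", write=False))  # ('lhr.niss-db5.internal', 5433)
print(pg_target("niss-db5", "lhr", write=True))   # ('niss-db5.internal', 5432)
```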

I think it’s unintuitive that selecting a development config and then scaling it up gives you multiple primaries; this should probably be a doc fix.

@danielzfranklin That’s good feedback and we will certainly work to provide more clarity surrounding the differences between our “Development” and “Production” configurations. As you have already discovered, the “Development” configuration runs a standard single-node Postgres instance without HA capabilities.

Horizontally scaling a single node “Development” PG instance isn’t going to be particularly useful in most cases, so it may make sense for us to disable that feature for this particular image.