[Answered] Migrating dev preset to prod preset for postgres

I was wondering if it is possible to start with the development preset (single node) for Postgres and then, down the road, migrate to the production preset (multi-node HA) without data loss? I am OK with a potential short outage while servers are re-provisioned.

Hello! Yes, this is certainly possible. Just add another volume in the same region and scale up the VM count.

So it will auto-adjust the settings to have a primary and a standby, as well as handle failover in case one node fails?

Yes, though I made a mistake above: you scale horizontally by cloning the leader machine, which creates a new machine with a new volume attached that joins the cluster as a replica.

Create a new PG single-node cluster:

$ fly pg create
? Choose an app name (leave blank to generate one): dangra-db4
? Select Organization: Daniel Graña (personal)
? Select region: Santiago, Chile (scl)
? Select configuration: Development - Single node, 1x shared CPU, 256MB RAM, 1GB disk
Creating postgres cluster in organization personal
Creating app...
Setting secrets on app dangra-db4...
Provisioning 1 of 1 machines with image flyio/postgres:14.4
Waiting for machine to start...
Machine 39080719f09387 is created
==> Monitoring health checks
  Waiting for 39080719f09387 to become healthy (started, 3/3)

Postgres cluster dangra-db4 created
  Username:    postgres
  Password:    rjSG6RhQhNxIpOy
  Hostname:    dangra-db4.internal
  Proxy port:  5432
  Postgres port:  5433
  Connection string: postgres://postgres:rjSG6RhQhNxIpOy@dangra-db4.internal:5432

Save your credentials in a secure place -- you won't be able to see them again!

Connect to postgres
Any app within the Daniel Graña organization can connect to this Postgres using the following connection string:

  postgres://postgres:rjSG6RhQhNxIpOy@dangra-db4.internal:5432

Now that you've set up postgres, here's what you need to understand: https://fly.io/docs/reference/postgres-whats-next/
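As a quick aside, you can verify the new database is reachable either by opening a psql shell through flyctl, or by using the connection string above from another app inside the same private network (a minimal sketch; the direct psql command assumes you are already connected to the organization's private network):

$ fly pg connect -a dangra-db4
$ psql postgres://postgres:rjSG6RhQhNxIpOy@dangra-db4.internal:5432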

Then check the status of the cluster; the single node should be the leader if everything went OK.

$ fly status -a dangra-db4
ID              STATE   ROLE    REGION  HEALTH CHECKS           IMAGE                           CREATED                 UPDATED
39080719f09387  started leader  scl     3 total, 3 passing      flyio/postgres:14.4 (v0.0.32)   2022-11-23T19:20:14Z    2022-11-23T19:20:34Z

Once you are sure, scale horizontally by adding a new (cloned) machine:

$ fly machine clone -a dangra-db4 39080719f09387
Cloning machine 39080719f09387 into region scl
Provisioning a new machine with image flyio/postgres:14.4...
  Machine 21781762b23989 has been created...
  Waiting for machine 21781762b23989 to start...
  Waiting for 21781762b23989 to become healthy (started, 3/3)
Machine has been successfully cloned!

and, to make sure it joined the cluster as a replica, check the status again:

$ fly status -a dangra-db4
ID              STATE   ROLE    REGION  HEALTH CHECKS           IMAGE                           CREATED                 UPDATED
21781762b23989  started replica scl     3 total, 3 passing      flyio/postgres:14.4 (v0.0.32)   2022-11-23T19:24:18Z    2022-11-23T19:24:38Z
39080719f09387  started leader  scl     3 total, 3 passing      flyio/postgres:14.4 (v0.0.32)   2022-11-23T19:20:14Z    2022-11-23T19:20:34Z

You can clone and add as many replicas as you want, even in different regions, but only the replicas in the same region as the leader are failover candidates if the active leader goes down.
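As a sketch, cloning into another region just means passing a target region to the clone command (this assumes fly machine clone accepts a --region flag; gru is only an example region):

$ fly machine clone -a dangra-db4 39080719f09387 --region gru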

For completeness, you can force a failover by running

$ fly pg failover -a dangra-db4
Performing a failover
  Waiting for 21781762b23989 to become healthy (started, 3/3)
  Waiting for 39080719f09387 to become healthy (started, 3/3)
Failover complete

$ fly status -a dangra-db4
ID              STATE   ROLE    REGION  HEALTH CHECKS           IMAGE                           CREATED                 UPDATED
21781762b23989  started leader  scl     3 total, 3 passing      flyio/postgres:14.4 (v0.0.32)   2022-11-23T19:24:18Z    2022-11-23T19:24:38Z
39080719f09387  started replica scl     3 total, 3 passing      flyio/postgres:14.4 (v0.0.32)   2022-11-23T19:20:14Z    2022-11-23T19:20:34Z

And you can play with fly machines kill <MACHINE-ID> to simulate a leader crash and watch the cluster fail over automatically.
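For example, using the machine IDs from the walkthrough above, you could kill the current leader and then watch the roles flip in the status output (a sketch; output omitted):

$ fly machines kill 21781762b23989
$ fly status -a dangra-db4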

Ah cool. Thank you so much for the info.

Very much related: How to do vertical scaling of a V2 machine-based Postgres cluster?

I can't find anything in the documentation that explains how to change the CPU/RAM of an automatically created machine that is part of a Postgres cluster.

@qqwy Right now, you need to run fly machine update <machine-id> --cpus <num-cpus> --memory <memory_mb> per machine.

It’s not awesome, but it’s something we are actively working to improve.
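For example, to give the cluster above 2 shared CPUs and 2048 MB of RAM, you would run something like the following once per machine (the values are illustrative, and I'm assuming the -a flag is accepted here as on the other machine commands):

$ fly machine update 39080719f09387 --cpus 2 --memory 2048 -a dangra-db4
$ fly machine update 21781762b23989 --cpus 2 --memory 2048 -a dangra-db4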