I was wondering if it is possible to start with the development preset (single node) for Postgres and then, down the road, migrate to the production preset (multi-node HA) without data loss? I am OK with a potential short outage while servers are re-provisioning.
Hello! Yes, this is certainly the way to go. Just add another volume in the same region and scale up the vm count.
So it will auto adjust the settings to have a primary and a standby as well as doing failover in case one node fails?
Yes, I made a mistake though. You scale horizontally by cloning the leader machine and it will create a new machine with a new volume attached that will join the cluster as a replica.
Create a new PG single-node cluster:
$ fly pg create
? Choose an app name (leave blank to generate one): dangra-db4
? Select Organization: Daniel Graña (personal)
? Select region: Santiago, Chile (scl)
? Select configuration: Development - Single node, 1x shared CPU, 256MB RAM, 1GB disk
Creating postgres cluster in organization personal
Creating app...
Setting secrets on app dangra-db4...
Provisioning 1 of 1 machines with image flyio/postgres:14.4
Waiting for machine to start...
Machine 39080719f09387 is created
==> Monitoring health checks
Waiting for 39080719f09387 to become healthy (started, 3/3)
Postgres cluster dangra-db4 created
Username: postgres
Password: rjSG6RhQhNxIpOy
Hostname: dangra-db4.internal
Proxy port: 5432
Postgres port: 5433
Connection string: postgres://postgres:rjSG6RhQhNxIpOy@dangra-db4.internal:5432
Save your credentials in a secure place -- you won't be able to see them again!
Connect to postgres
Any app within the Daniel Graña organization can connect to this Postgres using the following connection string:
Now that you've set up postgres, here's what you need to understand: https://fly.io/docs/reference/postgres-whats-next/
Then check the status of the cluster. The single node should be the leader if everything went OK.
$ fly status -a dangra-db4
ID STATE ROLE REGION HEALTH CHECKS IMAGE CREATED UPDATED
39080719f09387 started leader scl 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:20:14Z 2022-11-23T19:20:34Z
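If you want to script that check, here is a minimal sketch that pulls the leader's machine ID out of the status table. It assumes the whitespace-separated column layout shown above (ID, STATE, ROLE, REGION, ...), which may change between flyctl versions; `leader_id` is a helper name made up for this example, and the sample text is copied from the output above.

```shell
# Print the machine ID of every row whose ROLE column (3rd field) is "leader".
# Assumes the column order shown in the status output above.
leader_id() {
  awk '$3 == "leader" { print $1 }'
}

# Sample copied from the status output above.
# Against a real cluster you would pipe instead: fly status -a dangra-db4 | leader_id
status_sample='ID STATE ROLE REGION HEALTH CHECKS IMAGE CREATED UPDATED
39080719f09387 started leader scl 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:20:14Z 2022-11-23T19:20:34Z'

printf '%s\n' "$status_sample" | leader_id
```

The ID it prints is the one to pass to the clone command in the next step.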
Once you are sure, scale horizontally by adding a new (cloned) machine:
$ fly machine clone -a dangra-db4 39080719f09387
Cloning machine 39080719f09387 into region scl
Provisioning a new machine with image flyio/postgres:14.4...
Machine 21781762b23989 has been created...
Waiting for machine 21781762b23989 to start...
Waiting for 21781762b23989 to become healthy (started, 3/3)
Machine has been successfully cloned!
To confirm that it joined the cluster as a replica:
$ fly status -a dangra-db4
ID STATE ROLE REGION HEALTH CHECKS IMAGE CREATED UPDATED
21781762b23989 started replica scl 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:24:18Z 2022-11-23T19:24:38Z
39080719f09387 started leader scl 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:20:14Z 2022-11-23T19:20:34Z
You can clone and add as many replicas as you want, even in different regions, but only replicas in the same region as the leader are eligible to take over if the active leader goes down.
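To see which machines would actually be failover candidates under that rule, something like the following sketch could help. It assumes the same status column layout as above; `failover_candidates` is a made-up helper name, and the third row of the sample (machine `hypothetical01` in region `ams`) is invented purely to show a cross-region replica being filtered out.

```shell
# List replicas that sit in the same region as the leader.
# Rows are buffered because the leader may appear after the replicas.
failover_candidates() {
  awk '
    $3 == "leader"  { leader_region = $4 }
    $3 == "replica" { n++; id[n] = $1; region[n] = $4 }
    END {
      for (i = 1; i <= n; i++)
        if (region[i] == leader_region) print id[i]
    }
  '
}

# First two rows are from the status output above; the last row is hypothetical.
status_two_regions='ID STATE ROLE REGION HEALTH CHECKS IMAGE CREATED UPDATED
21781762b23989 started replica scl 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:24:18Z 2022-11-23T19:24:38Z
39080719f09387 started leader scl 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:20:14Z 2022-11-23T19:20:34Z
hypothetical01 started replica ams 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:30:00Z 2022-11-23T19:30:20Z'

printf '%s\n' "$status_two_regions" | failover_candidates
```

If this prints nothing for your cluster, there is no same-region replica to fail over to.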
For completeness, you can force a failover by running:
$ fly pg failover -a dangra-db4
Performing a failover
Waiting for 21781762b23989 to become healthy (started, 3/3)
Waiting for 39080719f09387 to become healthy (started, 3/3)
Failover complete
$ fly status -a dangra-db4
ID STATE ROLE REGION HEALTH CHECKS IMAGE CREATED UPDATED
21781762b23989 started leader scl 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:24:18Z 2022-11-23T19:24:38Z
39080719f09387 started replica scl 3 total, 3 passing flyio/postgres:14.4 (v0.0.32) 2022-11-23T19:20:14Z 2022-11-23T19:20:34Z
You can also play with fly machines kill <MACHINE-ID> to simulate a leader crash and watch the cluster fail over automatically.
Ah cool. Thank you so much for the info.
Very much related: How to do vertical scaling of a V2 machine-based Postgres cluster?
I can find nothing in the documentation that explains how to change what CPU/RAM an automatically created machine that is part of a postgres cluster runs on.
@qqwy Right now, you need to run fly machine update <machine-id> --cpus <num-cpus> --memory <memory_mb> once per machine.
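Until that improves, a small loop saves some typing. This is only a sketch: `resize_machines` and the `DRY_RUN` switch are names invented for this example, the CPU and memory values are placeholders, and the machine IDs are the ones from the cluster earlier in this thread. By default it only prints the commands so you can review them first.

```shell
# Run (or, by default, just print) the update command for each machine ID.
# Usage: resize_machines <num-cpus> <memory_mb> <machine-id>...
# DRY_RUN defaults to 1, so nothing is changed unless you set DRY_RUN=0.
resize_machines() {
  cpus=$1
  memory_mb=$2
  shift 2
  for id in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "fly machine update $id --cpus $cpus --memory $memory_mb"
    else
      fly machine update "$id" --cpus "$cpus" --memory "$memory_mb"
    fi
  done
}

# Placeholder sizes (2 CPUs, 4096 MB); machine IDs taken from the example cluster.
resize_machines 2 4096 39080719f09387 21781762b23989
```

Once the printed commands look right, rerun with DRY_RUN=0 to apply them.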
It’s not awesome, but it’s something we are actively working to improve.