Scale machines postgres cluster down to 1

I was experimenting with horizontally scaling my Machines-based Postgres instance from a single machine to a primary/replica arrangement.

I ran

fly machine clone

as documented on Horizontal Scaling · Fly Docs and it worked great! But my app isn’t fully production ready and doesn’t need a replica sitting around right now.

What’s the procedure to scale back down to a single instance? I tried cordon/uncordon on the replica, but that just put the primary into a zombie state. No good!

There are several ways to do this, but the easiest is just to run flyctl machines destroy on the machines you don’t need. The destroy command will handle removing the machines from the cluster before they’re destroyed.

Incidentally, for everyone reading, not just Brett: he says he’s only using PG for dev or testing and not for production workloads, which is a fine reason to run a single-node PG. But many users run single-machine Fly PG in production! Please don’t do this; it’s not a supported configuration! I’ve had to break the news to several users recently that their data was lost, or may be lost, when a host went offline for a (suspected) hardware failure, taking the data with it. Always run a Fly PG cluster with 3 or more nodes in production.


The snapshot replicas are safe across hardware failure though, right?

Ok I just tried this, and similar to cordon, it turns the primary into a zombie that stops accepting connections.

Cloning a new replica brings everything back to normal.

Would snapshotting and creating a new single node db app work better maybe?

Ah, I’m sorry, I wasn’t paying close attention to what I was saying.

fly m destroy works fine when going from 3 → 2 Machines, or 4 → 3, or from any larger cluster size, because the loss of a single node will not bring the cluster down. When the node stops, the cluster stays healthy, and once the node is destroyed, the max cluster size is reconfigured.

But for 2 → 1 it’s different, because a cluster of 2 has a quorum of 2, so when the replica goes offline the whole cluster falls apart. Since the cluster isn’t in good health, the commands fly m destroy issues to shrink the cluster size aren’t received.
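To make the arithmetic concrete, here’s a minimal sketch, assuming a plain majority quorum (floor(N/2) + 1). That formula is my assumption; it matches the "a cluster of 2 has a quorum of 2" statement, but I haven’t checked it against the postgres-flex source. The function names are made up for illustration.

```shell
# Majority quorum for a cluster of N nodes: floor(N/2) + 1 (assumed formula).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many nodes can go offline while the remaining ones still form a quorum.
tolerable_failures() {
  echo $(( $1 - $1 / 2 - 1 ))
}

for n in 1 2 3 4; do
  echo "nodes=$n quorum=$(quorum "$n") can_lose=$(tolerable_failures "$n")"
done
```

Under that assumption, a 2-node cluster can’t lose any node, while a 3- or 4-node cluster can lose one, which lines up with why 3 → 2 works and 2 → 1 doesn’t.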

In this case, yes, you should create a new cluster with --initial-cluster-size 1. If you want to copy over the testing data from your previous cluster (again, I strongly do NOT recommend using a cluster size of 1 for production DBs), you can add the flag --fork-from <OTHER_CLUSTER_APP_NAME>:<VOL_ID>. You can then destroy the old cluster with fly apps destroy.
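Roughly, that sequence might look like the sketch below. The app names and volume ID are placeholders I made up, and the --name flag and the fly volumes list lookup are my additions rather than part of the instructions above. The run wrapper only prints each command by default, since the last step is destructive; set DRY_RUN=0 once you’ve checked the values.

```shell
# Dry-run helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

OLD_APP="old-pg-app"      # hypothetical: the existing 2-node cluster app
NEW_APP="new-pg-app"      # hypothetical: the new single-node cluster app
VOL_ID="vol_placeholder"  # hypothetical: volume ID of the old cluster to fork from

# Look up the volume ID on the old cluster.
run fly volumes list -a "$OLD_APP"

# Create a single-node cluster forked from the old cluster's volume.
run fly postgres create --name "$NEW_APP" --initial-cluster-size 1 --fork-from "$OLD_APP:$VOL_ID"

# Only after verifying the new cluster: destroy the old one.
run fly apps destroy "$OLD_APP"
```

Anything pointing at the old cluster would still need to be repointed at the new app; that part isn’t covered here.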


This should work fine if you perform a fly machines remove <machine-id> -f. If you stop the Machine first, things will break for the reasons mentioned above. If you do happen to stop the Machine first, you would need to SSH into the Machine and use repmgr to manually unregister the Machine.
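A minimal sketch of the force-remove path, assuming a hypothetical app name and machine ID (both placeholders). The run wrapper prints the commands instead of executing them unless DRY_RUN=0 is set, because removing a Machine is not reversible.

```shell
# Dry-run helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

PG_APP="my-pg-app"            # hypothetical: your Postgres app name
REPLICA_ID="replica-id-here"  # hypothetical: ID of the replica Machine

# List the Machines in the app to find the replica's ID.
run fly machines list -a "$PG_APP"

# Force-remove the replica while it is still running (do not stop it first).
run fly machines remove "$REPLICA_ID" -f -a "$PG_APP"
```

Double-check that the ID you pass belongs to the replica and not the primary before turning off the dry run.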

These docs should be helpful: postgres-flex/docs/ at master · fly-apps/postgres-flex · GitHub


I did indeed stop the machine first rather than using the --force flag. Thanks for the insights!

EDIT: fly machines remove -f scales everything back down correctly without having to create a new PG service.
