Fly Postgres with 2 (two) Nodes?

I am experimenting with Fly Postgres, and I’m confused about one point. The CLI and the docs explain that a minimum of three nodes is required for HA. So what happens in a failure scenario with a two-node setup?

When I run fly postgres db list, I see two entries:

NAME            USERS                        
postgres        flypgadmin, postgres, repmgr
repmgr          flypgadmin, postgres, repmgr

However, fly postgres failover does not work:

Error promoting new leader, restarting existing leader
Waiting for old leader to finish stopping
Clearing existing machine lease...
Trying to start old leader
Old leader started succesfully
Error: Failed to run failover: Not enough machines to meet quorum requirements

So what do you do if the primary fails? There is clearly a working replica in the cluster. I assume that without the HA setup you don’t get automatic failover, but what manual steps would I need to take to promote the replica?

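There’s no centralized consensus store, so in a two-node setup there’s no way to verify that you’re not in a network partition. If the standby loses contact with the primary, it can’t tell whether the primary has actually died or whether only the link between them is down; promoting automatically in the second case would leave you with two primaries accepting writes (split-brain). With three or more nodes, the surviving majority can make that decision.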
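The quorum argument can be made concrete with a little arithmetic. This Python sketch assumes the failover check is a simple majority of the cluster's nodes; that is an assumption based on the "quorum requirements" error, not something stated in this thread. `majority_quorum` and `tolerable_failures` are hypothetical helper names.

```python
def majority_quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerable_failures(nodes: int) -> int:
    """How many nodes can be lost while the survivors still form a majority."""
    return nodes - majority_quorum(nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} node(s): quorum={majority_quorum(n)}, "
              f"can lose {tolerable_failures(n)}")
```

With two nodes the quorum is both nodes, so losing either one (or just the network link between them) drops the cluster below quorum, which is consistent with the failover error in the question. Three is the smallest cluster that can lose a node and still hold a majority.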

If the primary fails and you’re running a two-node setup, you’ll need to SSH into your standby Machine and promote it manually.

SSH into the Machine:

fly ssh console <machine-id>

Switch to the postgres user and go to its home directory:

su postgres
cd ~

Evaluate the state of the cluster:

postgres@2865117f1ed018:~$ repmgr daemon status
 ID         | Name                            | Role    | Status        | Upstream                          | repmgrd | PID | Paused? | Upstream last seen
------------+---------------------------------+---------+---------------+-----------------------------------+---------+-----+---------+--------------------
 1502360255 | fdaa:0:2e26:a7b:110:e0ed:7af0:2 | standby |   running     | ? fdaa:0:2e26:a7b:196:f8c4:fb70:2 | running | 399 | no      | 428 second(s) ago
 1977791143 | fdaa:0:2e26:a7b:196:f8c4:fb70:2 | primary | ? unreachable | ?                                 | n/a     | n/a | n/a     | n/a
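If you want to script this check, the sketch below pulls the failed primary's node ID out of the `repmgr daemon status` table. It assumes the pipe-delimited layout shown here; other repmgr versions may format the output differently, so treat it as an illustration rather than a robust parser. `parse_repmgr_status`, `find_unreachable_primary`, and `SAMPLE` are hypothetical names.

```python
from typing import Dict, List, Optional


def parse_repmgr_status(text: str) -> List[Dict[str, str]]:
    """Turn the pipe-delimited table from `repmgr daemon status` into one dict per node."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= set("-+"):
            continue  # skip the ----+---- rule under the header
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def find_unreachable_primary(text: str) -> Optional[int]:
    """Return the repmgr node ID of a primary reported as unreachable, or None."""
    for row in parse_repmgr_status(text):
        if row.get("Role") == "primary" and "unreachable" in row.get("Status", ""):
            return int(row["ID"])
    return None


# Status output copied from the cluster above.
SAMPLE = """\
 ID         | Name                            | Role    | Status        | Upstream                          | repmgrd | PID | Paused? | Upstream last seen
------------+---------------------------------+---------+---------------+-----------------------------------+---------+-----+---------+--------------------
 1502360255 | fdaa:0:2e26:a7b:110:e0ed:7af0:2 | standby |   running     | ? fdaa:0:2e26:a7b:196:f8c4:fb70:2 | running | 399 | no      | 428 second(s) ago
 1977791143 | fdaa:0:2e26:a7b:196:f8c4:fb70:2 | primary | ? unreachable | ?                                 | n/a     | n/a | n/a     | n/a
"""

if __name__ == "__main__":
    print(find_unreachable_primary(SAMPLE))  # 1977791143
```

The ID it returns is the value the `repmgr primary unregister --node-id` step expects.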

Promote the standby:

repmgr standby promote

Unregister the old primary:

repmgr primary unregister --node-id 1977791143
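These two commands must run in order, and the unregister only makes sense once the promotion has succeeded. A minimal Python sketch of that ordering, meant to run on the standby as the postgres user; `promote_and_unregister` and `run_checked` are hypothetical helpers, and the runner is injectable so the sequence can be tried without touching a real cluster.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command; raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def promote_and_unregister(old_primary_id: int, run: Runner = run_checked) -> List[List[str]]:
    """Promote this standby, then unregister the failed primary.

    If the promotion fails, `run` raises and the unregister step is never attempted.
    """
    steps = [
        ["repmgr", "standby", "promote"],
        ["repmgr", "primary", "unregister", "--node-id", str(old_primary_id)],
    ]
    for step in steps:
        run(step)  # stops here on the first failure
    return steps
```

For example, `promote_and_unregister(1977791143)` would run both commands with the node ID from the status table, while `promote_and_unregister(1977791143, run=print)` only prints them.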

Then remove the old primary Machine and clone a new standby from the newly promoted primary.
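This cleanup happens from your workstation with flyctl rather than inside the Machine. A sketch that only builds the two commands, assuming `fly machine destroy` and `fly machine clone` as found in current flyctl (check `--help` for your version). Both take Fly Machine IDs (like the `2865117f1ed018` in the shell prompt above), not repmgr node IDs; `replace_old_primary` is a hypothetical helper and the app name and region in the example are placeholders.

```python
from typing import List


def replace_old_primary(old_machine_id: str, new_primary_machine_id: str,
                        region: str, app: str) -> List[List[str]]:
    """Build the flyctl commands to drop the failed primary and add a fresh standby."""
    if old_machine_id == new_primary_machine_id:
        raise ValueError("refusing to destroy the Machine that is now primary")
    return [
        # Remove the failed former primary.
        ["fly", "machine", "destroy", old_machine_id, "--force", "--app", app],
        # Clone the new primary to get a replacement node.
        ["fly", "machine", "clone", new_primary_machine_id,
         "--region", region, "--app", app],
    ]


if __name__ == "__main__":
    for cmd in replace_old_primary("<old-machine-id>", "<new-primary-id>", "ord", "my-pg"):
        print(" ".join(cmd))
```

Whether the clone comes up as a standby of the new primary depends on the Fly Postgres image, so it's worth confirming with `repmgr daemon status` afterwards.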

Fantastic write-up, @shaun. I was able to perform this operation on my own and everything worked exactly as described. Running two nodes is perfect for my use case, a low-to-medium-traffic app where HA is not (yet) a strong need but I obviously want redundancy. I don’t mind getting pinged by monitoring and having to perform the promotion manually. Thanks for your help!