I have a Postgres cluster that I’m trying to move from the IAD region to EWR.
I’ve followed the “High Availability & Global Replication” guide in the Fly Docs without success.
The main problem is the `fly pg failover` command, which fails like this:
$ fly pg failover -a fbc-db-dev ~/p/feedback_cupcake
Performing a failover
Connecting to fdaa:9:ba67:a7b:b:1978:5ed0:2... complete
Connecting to fdaa:9:ba67:a7b:333:b451:e1e:2... complete
Connecting to fdaa:9:ba67:a7b:e:7efc:266e:2...
Error promoting new leader, restarting existing leader
Waiting for old leader to finish stopping
Connecting to fdaa:9:ba67:a7b:e:7efc:266e:2... complete
Clearing existing machine lease...
Trying to start old leader
Old leader started succesfully
Error: Failed to run failover: no leader could be chosen. Here are the reasons why:
1852d36f124e08: Running a dry run of `repmgr standby switchover` failed. Try running `fly ssh console -u postgres -C 'repmgr standby switchover -f /data/repmgr.conf --dry-run' -s -a fbc-db-dev` for more information. This was most likely due to the requirements for quorum not being met.
e784391dcd9658: Running a dry run of `repmgr standby switchover` failed. Try running `fly ssh console -u postgres -C 'repmgr standby switchover -f /data/repmgr.conf --dry-run' -s -a fbc-db-dev` for more information. This was most likely due to the requirements for quorum not being met.
781620db526e28: Running a dry run of `repmgr standby switchover` failed. Try running `fly ssh console -u postgres -C 'repmgr standby switchover -f /data/repmgr.conf --dry-run' -s -a fbc-db-dev` for more information. This was most likely due to the requirements for quorum not being met.
please fix one or more of the above issues, and try again
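To re-run the suggested dry run on every candidate machine, instead of picking one interactively with `-s`, a small Python helper can pull the machine IDs out of the error text and build one command per machine. This is a sketch under an assumption: that `fly ssh console` accepts `--machine <id>` to target a specific machine (check `fly ssh console --help` for your flyctl version). The helper names are hypothetical.

```python
import re
import shlex
import subprocess

# Matches the per-machine lines of the failover error, e.g.
# "1852d36f124e08: Running a dry run of `repmgr standby switchover` failed. ..."
MACHINE_LINE = re.compile(r"^([0-9a-f]+): Running a dry run", re.MULTILINE)

DRY_RUN = "repmgr standby switchover -f /data/repmgr.conf --dry-run"


def failed_machine_ids(error_text: str) -> list[str]:
    """Return the machine IDs listed in a `fly pg failover` error, in order."""
    return MACHINE_LINE.findall(error_text)


def dry_run_command(app: str, machine_id: str) -> list[str]:
    """Build the argv that runs the repmgr dry run as postgres on one machine.

    Assumption: `--machine <id>` targets a specific machine in place of the
    interactive `-s` picker used in the suggested command.
    """
    return [
        "fly", "ssh", "console",
        "-u", "postgres",
        "-C", DRY_RUN,
        "--machine", machine_id,
        "-a", app,
    ]


def run_all(app: str, error_text: str) -> None:
    """Run the dry run on every machine named in the error, echoing each command."""
    for machine_id in failed_machine_ids(error_text):
        cmd = dry_run_command(app, machine_id)
        print("$", shlex.join(cmd))
        # check=False: keep going so every machine's output is collected.
        subprocess.run(cmd, check=False)
```

Usage would be something like saving the error to a file and calling `run_all("fbc-db-dev", Path("failover.txt").read_text())`, which prints each command before running it so the per-machine output is easy to tell apart.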
Investigating further, I found that the `repmgr` command fails because it can’t connect via SSH to the primary node. If I SSH into a replica, switch to the `postgres` user, and run the `repmgr` command, I get this:
$ repmgr standby switchover -f /data/repmgr.conf --dry-run
NOTICE: checking switchover on node "fdaa:9:ba67:a7b:2fa:7661:f3e1:2" (ID: 1656691159) in --dry-run mode
WARNING: unable to connect to remote host "fdaa:9:ba67:a7b:302:13ac:2e5d:2" via SSH
ERROR: unable to connect via SSH to host "fdaa:9:ba67:a7b:302:13ac:2e5d:2", user ""
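`repmgr standby switchover` reaches the current primary over SSH, and any extra flags it appends to the `ssh` command come from the `ssh_options` setting in repmgr.conf. A minimal Python sketch for reading `/data/repmgr.conf` and printing `conninfo` and `ssh_options`, assuming the usual `key=value` / `key='value'` layout; I haven’t verified what Fly’s image actually writes to that file, and the function names are hypothetical.

```python
import shlex
from pathlib import Path


def parse_repmgr_conf(text: str) -> dict[str, str]:
    """Parse simple `key=value` / `key='value'` lines from a repmgr.conf.

    Sketch only: full-line comments are skipped, but inline comments and
    escaped quotes inside values are not handled.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1]  # strip surrounding single quotes
        settings[key.strip()] = value
    return settings


def ssh_option_args(settings: dict[str, str]) -> list[str]:
    """Split `ssh_options` into the individual arguments repmgr appends to ssh."""
    return shlex.split(settings.get("ssh_options", ""))


def show(path: str = "/data/repmgr.conf") -> None:
    """Print the settings that matter for the SSH step of a switchover."""
    settings = parse_repmgr_conf(Path(path).read_text())
    print("conninfo:   ", settings.get("conninfo", "<not set>"))
    print("ssh_options:", ssh_option_args(settings) or "<not set>")
```

Running `show()` as the `postgres` user on a replica would show whether any SSH options (for example an identity or certificate file) are being passed at all.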
To confirm the issue, I also tried connecting to the node directly with `ssh`:
$ ssh -vvv fdaa:9:ba67:a7b:332:f2e:fdb:2
OpenSSH_9.6p1 Ubuntu-3ubuntu13.5, OpenSSL 3.0.13 30 Jan 2024
debug1: Reading configuration data /data/.ssh/config
debug2: checking match for 'exec "nslookup '%h.vm.fbc-db-dev.internal' | awk '/^Address: / { print $2 }' | grep ."' host fdaa:9:ba67:a7b:332:f2e:fdb:2 originally fdaa:9:ba67:a7b:332:f2e:fdb:2
debug1: Executing command: 'nslookup 'fdaa:9:ba67:a7b:332:f2e:fdb:2.vm.fbc-db-dev.internal' | awk '/^Address: / { print $2 }' | grep .'
debug3: command returned status 1
debug3: /data/.ssh/config line 1: not matched 'exec "nslookup 'fdaa:9:ba67:a7b:332:f2e:fdb:2.vm.fbc-db-dev.internal' | awk '/^Address: / { print $2 }' | "'
debug2: match not found
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolve_canonicalize: hostname fdaa:9:ba67:a7b:332:f2e:fdb:2 is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/data/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/data/.ssh/known_hosts2'
debug3: channel_clear_timeouts: clearing
debug3: ssh_connect_direct: entering
debug1: Connecting to fdaa:9:ba67:a7b:332:f2e:fdb:2 [fdaa:9:ba67:a7b:332:f2e:fdb:2] port 22.
debug3: set_sock_tos: set socket 3 IPV6_TCLASS 0x10
debug1: Connection established.
debug1: identity file /data/.ssh/id_rsa type 3
debug1: identity file /data/.ssh/id_rsa-cert type 7
debug1: identity file /data/.ssh/id_ecdsa type -1
debug1: identity file /data/.ssh/id_ecdsa-cert type -1
debug1: identity file /data/.ssh/id_ecdsa_sk type -1
debug1: identity file /data/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /data/.ssh/id_ed25519 type -1
debug1: identity file /data/.ssh/id_ed25519-cert type -1
debug1: identity file /data/.ssh/id_ed25519_sk type -1
debug1: identity file /data/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /data/.ssh/id_xmss type -1
debug1: identity file /data/.ssh/id_xmss-cert type -1
debug1: identity file /data/.ssh/id_dsa type -1
debug1: identity file /data/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.6p1 Ubuntu-3ubuntu13.5
debug1: Remote protocol version 2.0, remote software version Hallpass_1.0.0.17812
debug1: compat_banner: no match: Hallpass_1.0.0.17812
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to fdaa:9:ba67:a7b:332:f2e:fdb:2:22 as 'postgres'
debug3: record_hostkey: found key type ED25519 in file /data/.ssh/known_hosts:1
debug3: load_hostkeys_file: loaded 1 keys from fdaa:9:ba67:a7b:332:f2e:fdb:2
debug1: load_hostkeys: fopen /data/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug3: order_hostkeyalgs: have matching best-preference key type ssh-ed25519-cert-v01@openssh.com, using HostkeyAlgorithms verbatim
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,ext-info-c,kex-strict-c-v00@openssh.com
debug2: host key algorithms: ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-ed25519,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com,rsa-sha2-512,rsa-sha2-256
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,kex-strict-s-v00@openssh.com
debug2: host key algorithms: ssh-ed25519
debug2: ciphers ctos: aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr
debug2: ciphers stoc: aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr
debug2: MACs ctos: hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1,hmac-sha1-96
debug2: MACs stoc: hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1,hmac-sha1-96
debug2: compression ctos: none
debug2: compression stoc: none
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug3: kex_choose_conf: will use strict KEX ordering
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug3: receive packet: type 31
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:vZODJBdfDolgQN2Hi0G/YpGjsv5ajvXlQHG5wvVAP8c
debug3: record_hostkey: found key type ED25519 in file /data/.ssh/known_hosts:1
debug3: load_hostkeys_file: loaded 1 keys from fdaa:9:ba67:a7b:332:f2e:fdb:2
debug1: load_hostkeys: fopen /data/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host 'fdaa:9:ba67:a7b:332:f2e:fdb:2' is known and matches the ED25519 host key.
debug1: Found key in /data/.ssh/known_hosts:1
debug3: send packet: type 21
debug1: ssh_packet_send2_wrapped: resetting send seqnr 3
debug2: ssh_set_newkeys: mode 1
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: receive packet: type 21
debug1: ssh_packet_read_poll2: resetting read seqnr 3
debug1: SSH2_MSG_NEWKEYS received
debug2: ssh_set_newkeys: mode 0
debug1: rekey in after 134217728 blocks
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug3: kex_input_ext_info: extension server-sig-algs
debug1: kex_ext_info_client_parse: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,rsa-sha2-256,rsa-sha2-512,ssh-rsa,ssh-dss>
debug3: kex_input_ext_info: extension ping@openssh.com
debug1: kex_ext_info_check_ver: ping@openssh.com=<0>
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey
debug3: start over, passed a different list publickey
debug3: preferred gssapi-with-mic,publickey,keyboard-interactive,password
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Will attempt key: /data/.ssh/id_rsa ED25519 SHA256:AQ1z4nAlyiiEk2Qho4uGDyhziv+648xMIwvtZwKra2o
debug1: Will attempt key: /data/.ssh/id_rsa ED25519-CERT SHA256:AQ1z4nAlyiiEk2Qho4uGDyhziv+648xMIwvtZwKra2o
debug1: Will attempt key: /data/.ssh/id_ecdsa
debug1: Will attempt key: /data/.ssh/id_ecdsa_sk
debug1: Will attempt key: /data/.ssh/id_ed25519
debug1: Will attempt key: /data/.ssh/id_ed25519_sk
debug1: Will attempt key: /data/.ssh/id_xmss
debug1: Will attempt key: /data/.ssh/id_dsa
debug2: pubkey_prepare: done
debug1: Offering public key: /data/.ssh/id_rsa ED25519 SHA256:AQ1z4nAlyiiEk2Qho4uGDyhziv+648xMIwvtZwKra2o
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey
debug1: Offering public key: /data/.ssh/id_rsa ED25519-CERT SHA256:AQ1z4nAlyiiEk2Qho4uGDyhziv+648xMIwvtZwKra2o
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey
debug1: Trying private key: /data/.ssh/id_ecdsa
debug3: no such identity: /data/.ssh/id_ecdsa: No such file or directory
debug1: Trying private key: /data/.ssh/id_ecdsa_sk
debug3: no such identity: /data/.ssh/id_ecdsa_sk: No such file or directory
debug1: Trying private key: /data/.ssh/id_ed25519
debug3: no such identity: /data/.ssh/id_ed25519: No such file or directory
debug1: Trying private key: /data/.ssh/id_ed25519_sk
debug3: no such identity: /data/.ssh/id_ed25519_sk: No such file or directory
debug1: Trying private key: /data/.ssh/id_xmss
debug3: no such identity: /data/.ssh/id_xmss: No such file or directory
debug1: Trying private key: /data/.ssh/id_dsa
debug3: no such identity: /data/.ssh/id_dsa: No such file or directory
debug2: we did not send a packet, disable method
debug1: No more authentication methods to try.
postgres@fdaa:9:ba67:a7b:332:f2e:fdb:2: Permission denied (publickey).
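As you can see, authentication fails with a `publickey` error. The key in `/data/.ssh/id_rsa` (an ED25519 key, despite the file name) and its certificate are both offered and rejected, and none of the other identity files exist, so it looks like no usable public key is registered for `postgres` on the primary at all.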
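To check that reading of the `-vvv` log mechanically, a small Python sketch can list which identities `ssh` offered, which identity files were missing, and whether the attempt ended in `Permission denied (publickey)`. The regexes are written against the OpenSSH 9.6 debug lines above and may need adjusting for other versions.

```python
import re
from dataclasses import dataclass, field

# "debug1: Offering public key: /data/.ssh/id_rsa ED25519 SHA256:..."
OFFERED = re.compile(r"debug1: Offering public key: (\S+) (\S+)")
# "debug3: no such identity: /data/.ssh/id_ecdsa: No such file or directory"
MISSING = re.compile(r"debug3: no such identity: (\S+): No such file or directory")


@dataclass
class AuthSummary:
    offered: list[str] = field(default_factory=list)  # "path (key type)"
    missing: list[str] = field(default_factory=list)  # identity files not found
    denied: bool = False  # True if the server finally refused publickey auth


def summarize_auth(log: str) -> AuthSummary:
    """Summarize the publickey phase of an `ssh -vvv` log."""
    summary = AuthSummary()
    for line in log.splitlines():
        if m := OFFERED.search(line):
            summary.offered.append(f"{m.group(1)} ({m.group(2)})")
        elif m := MISSING.search(line):
            summary.missing.append(m.group(1))
        elif "Permission denied (publickey)" in line:
            summary.denied = True
    return summary
```

On the log in this post it should report the `/data/.ssh/id_rsa` key and its certificate as offered, the six other default identity files as missing, and `denied=True`, which points at the server not accepting that key or certificate rather than at `ssh` having nothing to send.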
How should I fix this? I could generate SSH keys and register them on the primary node, but honestly that feels like an ugly workaround, and I don’t understand why this isn’t already set up by Fly.
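One clue in the `-vvv` log: line 1 of `/data/.ssh/config` is a `Match exec` rule that runs `nslookup '%h.vm.fbc-db-dev.internal'`, and it reports `not matched` (status 1) because `%h` is the raw IPv6 address that was passed in. Whatever that block configures is therefore skipped for this connection. A minimal Python sketch that mirrors the lookup with `socket.getaddrinfo` (an approximation of the `nslookup | awk | grep` pipeline, not the same tool), so a raw address can be compared with a machine ID. Whether `<machine-id>.vm.fbc-db-dev.internal` resolves is an assumption to verify, and the lookup only makes sense from inside the app’s private network.

```python
import socket


def match_exec_name(host: str, app: str) -> str:
    """Name the `Match exec` rule in /data/.ssh/config looks up for ssh's %h."""
    return f"{host}.vm.{app}.internal"


def resolves(name: str) -> bool:
    """Rough stand-in for the `nslookup ... | grep .` test: any IPv6 answer?"""
    try:
        return bool(socket.getaddrinfo(name, None, socket.AF_INET6))
    except (OSError, UnicodeError):
        # socket.gaierror (name not found) is a subclass of OSError;
        # UnicodeError covers names the IDNA codec refuses to encode.
        return False
```

For example, `resolves(match_exec_name("fdaa:9:ba67:a7b:332:f2e:fdb:2", "fbc-db-dev"))` is expected to be `False`, matching the log, while the same call with a machine ID such as `1852d36f124e08` (a hypothetical choice of host here) would show whether the rule can match when ssh is given a machine ID instead of an address.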
PS: I’ve seen other posts describing the same issue, but none with a resolution. I’d kindly ask someone from Fly to look into this: it’s causing real problems for me and blocking some architectural changes in my system, because I can’t change the DB region.