Unhealthy machine primary

Jorgebl96 · October 22, 2025, 1:22pm

Hi all, I have had a problem since this morning. The primary machine of my Postgres database has been in read-only mode since this morning, and my client is calling me. I haven't made any changes to the database in 2 months. I saw that the primary is unstable. The replica still serves the data, but you can't modify, create, or delete anything on it, which is expected because it is read-only.

I saw that the primary machine is at 2/3 health checks all the time, while the replica is at 3/3. The machines are located in the cdg region (Paris, France). I couldn't take any action.

I tried to promote my replica to primary, but it was impossible. I increased the resources and waited about 1 hour. Nothing, and my client is losing time and money…

Can someone help me please?

mayailurus · October 22, 2025, 2:03pm

Hi… Sorry to hear you’re having trouble with this, … Most of us here in the community forum cannot poke around in your app settings, logs, etc.; we can only go by what you yourself post in the thread.

It would help to know the output of the following commands, for example: fly status -a db-app-name, fly m list -a db-app-name, fly checks list -a db-app-name.

(It sounds in particular like you might have one of the “doubly deprecated” Stolon-based clusters, since the PG Flex ones should have 3 or more Machines in the cluster.)

dangra · October 22, 2025, 2:15pm

Hello. Your case caught my attention. I looked at your database, and I think the primary is operational now.

If you have customers and a business, my recommendation is to switch to Managed Postgres.

That said, this is what I did to recover yours.

root@683d527add7458:/# su - postgres

postgres@683d527add7458:~$ repmgr cluster show
WARNING: node "fdaa:1:c9f6:a7b:1bf:47dd:bf6e:2" not found in "pg_stat_replication"
 ID         | Name                            | Role    | Status               | Upstream                          | Location | Priority | Timeline | Connection string
------------+---------------------------------+---------+----------------------+-----------------------------------+----------+----------+----------+--------------------------------------------------------------------------------------------
 520037516  | fdaa:1:c9f6:a7b:1bf:47dd:bf6e:2 | standby |   running            | ! fdaa:1:c9f6:a7b:1be:f615:226a:2 | cdg      | 100      | 1        | host=fdaa:1:c9f6:a7b:1bf:47dd:bf6e:2 port=5433 user=repmgr dbname=repmgr connect_timeout=5
 1217570335 | fdaa:1:c9f6:a7b:1be:f615:226a:2 | primary | ! running as standby |                                   | mad      | 100      | 1        | host=fdaa:1:c9f6:a7b:1be:f615:226a:2 port=5433 user=repmgr dbname=repmgr connect_timeout=5

WARNING: following issues were detected
  - node "fdaa:1:c9f6:a7b:1bf:47dd:bf6e:2" (ID: 520037516) is not attached to its upstream node "fdaa:1:c9f6:a7b:1be:f615:226a:2" (ID: 1217570335)
  - node "fdaa:1:c9f6:a7b:1be:f615:226a:2" (ID: 1217570335) is registered as primary but running as standby

Notice the “registered as primary but running as standby” warning; that is the signal to promote the node back to primary.


postgres@683d527add7458:~$ repmgr standby promote
NOTICE: promoting standby to primary
DETAIL: promoting server "fdaa:1:c9f6:a7b:1be:f615:226a:2" (ID: 1217570335) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "fdaa:1:c9f6:a7b:1be:f615:226a:2" (ID: 1217570335) was successfully promoted to primary

Now run repmgr cluster show again to confirm it is working:

postgres@683d527add7458:~$ repmgr cluster show
 ID         | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string
------------+----------------+---------+-----------+----------------+----------+----------+----------+--------------------------------------------------------------------------------------------------------
 520037516  | 683d529c4d05d8 | standby |   running | 683d527add7458 | cdg      | 100      | 1        | host=683d529c4d05d8.vm.nutricion-api-db.internal port=5433 user=repmgr dbname=repmgr connect_timeout=5
 1217570335 | 683d527add7458 | primary | * running |                | cdg      | 100      | 2        | host=683d527add7458.vm.nutricion-api-db.internal port=5433 user=repmgr dbname=repmgr connect_timeout=5
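
For anyone who wants to spot this state without reading the table by eye, here is a minimal shell sketch. It assumes the column layout shown in the outputs above (ID, Name, Role, Status, separated by `|`); the function name `demoted_primaries` is a hypothetical helper of mine, not something repmgr ships, so treat it as an illustration rather than a supported tool.

```shell
# Hypothetical helper (not part of repmgr): reads the table printed by
# `repmgr cluster show` on stdin and prints the ID of every node that is
# registered as primary but whose Status column says "running as standby".
demoted_primaries() {
  awk -F'|' '
    # Only table rows contain "|" separators; warnings and the dashed
    # separator line have fewer than 4 fields and are skipped.
    NF >= 4 {
      id = $1; role = $3; status = $4
      gsub(/[ \t]/, "", id)               # strip padding around the ID
      gsub(/^[ \t]+|[ \t]+$/, "", role)   # trim the Role column
      if (role == "primary" && status ~ /running as standby/) {
        print id
      }
    }
  '
}

# Usage on the database Machine, as the postgres user (assumed invocation):
#   repmgr cluster show 2>/dev/null | demoted_primaries
```

With the first output above it would print 1217570335, the node that needed repmgr standby promote. With the second output it prints nothing, because the primary row now reads "* running".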
