Random postgres-flex disconnects

I’ve migrated my pg cluster to the new postgres-flex stack (3 nodes for HA) and am now seeing some disconnects. None of them led to downtime, but some background jobs crashed.

In my pg logs:

deadMemberMonitorTick failed with: no rows in result set
deadMemberMonitorTick failed with: no rows in result set

LOG:  recovery restart point at 3/D0031548
DETAIL:  Last completed transaction was at log time 2023-02-27 12:46:36.649855+00.

LOG:  restartpoint starting: time
LOG:  restartpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.302 s, sync=0.002 s, total=0.305 s; sync files=3, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=298 kB
LOG:  recovery restart point at 3/D0031548
DETAIL:  Last completed transaction was at log time 2023-02-27 12:46:47.894142+00.

LOG:  PID 28254 in cancel request did not match any process
Starting machine
machine became reachable in 11.131316ms
LOG:  PID 28253 in cancel request did not match any process
monitoring primary node "fdaa:" (ID: ) in normal state

Starting machine
LOG:  PID 28257 in cancel request did not match any process
Starting machine
machine became reachable in 13.610528ms
LOG:  PID 28258 in cancel request did not match any process
machine became reachable in 10.41391ms
LOG:  restartpoint starting: time
...
Starting machine
Starting machine
LOG:  PID 28259 in cancel request did not match any process
LOG:  PID 7802 in cancel request did not match any process
machine became reachable in 25.858475ms
machine became reachable in 26.533053ms

etc

Hey @Elder,

Thanks for posting this.

deadMemberMonitorTick failed with: no rows in result set

^^^ This issue has been fixed and will be included in the next release going out a bit later today.

LOG:  restartpoint starting: time
LOG:  restartpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.302 s, sync=0.002 s, total=0.305 s; sync files=3, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=298 kB
LOG:  recovery restart point at 3/D0031548
DETAIL:  Last completed transaction was at log time 2023-02-27 12:46:47.894142+00.

^^^ This is all normal.

Starting machine
Starting machine
LOG:  PID 28259 in cancel request did not match any process
LOG:  PID 7802 in cancel request did not match any process
machine became reachable in 25.858475ms
machine became reachable in 26.533053ms

^^^ This actually has to do with the fly-proxy. I am working with the proxy team to see what might be going on there.


Hello,

I’m still having the same issue:

2023-03-15T07:02:53.885 app[4d8902efe17348] mad [info] monitor | 2023/03/15 07:02:53 deadMemberMonitorTick failed with: no rows in result set

Do I have to do something to apply the fix?

Thank you.

@dsupernormal, you can check whether you are on the latest postgres-flex version by running:

fly image show -a app-db

Then, if needed, update by running:

fly image update -a app-db