Postgres drops active connection after 1-3 minutes

I’ve been getting intermittent PG::ConnectionBad errors every 1-3 minutes:

/rails/vendor/bundle/ruby/3.2.0/gems/activerecord-7.0.4.3/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `exec_params': PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly (ActiveRecord::StatementInvalid)
	This probably means the server terminated abnormally
	before or while processing the request.

I’ve tried increasing the ActiveRecord settings people suggest (reaping_frequency, idle_timeout, pool size of 1 etc) with no change. I’ve also tried bypassing flycast as suggested here Postgres dropping connection after 30s when using LISTEN/NOTIFY - #2 by MatthewIngwersen, but also no change.

After 1-3 minutes of activity, the connection always dies. Here’s an example script:

start_time = Time.now
puts "Started at #{start_time}"

while(true) do
  sleep 10
  elapsed_time = Time.now - start_time
  minutes = (elapsed_time / 60).to_i
  seconds = (elapsed_time % 60).to_i
  User.count
  puts "#{Time.now} Connection active for #{minutes}min#{seconds}s"
end

And logs:

This is about the same time when the Rails process got disconnected:
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]monitor  | Voting member(s): 3, Active: 3, Inactive: 0, Conflicts: 2
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy    | [NOTICE]   (4181) : haproxy version is 2.8.3-1~bpo12+1
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy    | [NOTICE]   (4181) : path to executable is /usr/sbin/haproxy
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy    | [ALERT]    (4181) : Current worker (4183) exited with code 143 (Terminated)
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (4181) : All workers exited. Exiting... (0)
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy    | Process exited 0
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy    | restarting in 1s [attempt 6042]
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy    | [NOTICE]   (2951) : haproxy version is 2.8.3-1~bpo12+1
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy    | [NOTICE]   (2951) : path to executable is /usr/sbin/haproxy
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy    | [ALERT]    (2951) : Current worker (2953) exited with code 143 (Terminated)
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (2951) : All workers exited. Exiting... (0)
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy    | Process exited 0
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy    | restarting in 1s [attempt 6041]
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]monitor  | clusterStateMonitorTick failed with: primary has been quarantined: majority of registered members confirmed we are not the real primary
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy    | Process exited 0
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy    | restarting in 1s [attempt 6042]
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy    | [NOTICE]   (27893) : haproxy version is 2.8.3-1~bpo12+1
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy    | [NOTICE]   (27893) : path to executable is /usr/sbin/haproxy
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy    | [ALERT]    (27893) : Current worker (27895) exited with code 143 (Terminated)
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (27893) : All workers exited. Exiting... (0)
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | Running...
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | Running...
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | Running...
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [NOTICE]   (5678) : New worker (5680) forked
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [NOTICE]   (5678) : Loading success.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : bk_db/pg1 changed its IP from (none) to fdaa:4:e9ed:a7b:249:5801:d1a7:2 by flydns/dns1.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Server bk_db/pg1 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : bk_db/pg2 changed its IP from (none) to fdaa:4:e9ed:a7b:231:da56:8b66:2 by DNS cache.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Server bk_db/pg2 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Server bk_db/pg2 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : bk_db/pg3 changed its IP from (none) to fdaa:4:e9ed:a7b:231:5cd0:f68a:2 by DNS cache.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Server bk_db/pg3 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Server bk_db/pg3 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [NOTICE]   (4432) : New worker (4434) forked
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [NOTICE]   (4432) : Loading success.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : bk_db/pg1 changed its IP from (none) to fdaa:4:e9ed:a7b:249:5801:d1a7:2 by flydns/dns1.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Server bk_db/pg1 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : bk_db/pg2 changed its IP from (none) to fdaa:4:e9ed:a7b:231:da56:8b66:2 by DNS cache.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Server bk_db/pg2 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Server bk_db/pg2 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : bk_db/pg3 changed its IP from (none) to fdaa:4:e9ed:a7b:231:5cd0:f68a:2 by DNS cache.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Server bk_db/pg3 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Server bk_db/pg3 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [NOTICE]   (29540) : New worker (29542) forked
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [NOTICE]   (29540) : Loading success.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : bk_db/pg1 changed its IP from (none) to fdaa:4:e9ed:a7b:249:5801:d1a7:2 by flydns/dns1.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : Server bk_db/pg1 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : bk_db/pg2 changed its IP from (none) to fdaa:4:e9ed:a7b:231:da56:8b66:2 by DNS cache.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : Server bk_db/pg2 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : Server bk_db/pg2 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : bk_db/pg3 changed its IP from (none) to fdaa:4:e9ed:a7b:231:5cd0:f68a:2 by DNS cache.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : Server bk_db/pg3 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : Server bk_db/pg3 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Server bk_db/pg2 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5ms. 2 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Server bk_db/pg2 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5ms. 2 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : Server bk_db/pg2 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 3ms. 2 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Backup Server bk_db/pg is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 3ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Backup Server bk_db/pg is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 4ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[90806271c6ded8] syd [info]proxy    | [WARNING]  (5680) : Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 3ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[3d8d7de1fe5528] syd [info]proxy    | [WARNING]  (4434) : Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 3ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[4d89d024f611e8] syd [info]proxy    | [WARNING]  (29542) : Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5ms. 1 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

fly checks has a failure I that I don’t really understand:

fly checks list -a frdm-db-production      
Health Checks for frdm-db-production
  NAME | STATUS   | MACHINE        | LAST UPDATED      | OUTPUT                                                                       
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  pg   | passing  | 3d8d7de1fe5528 | Feb 13 2024 03:05 | [✓] connections: 10 used, 3 reserved, 300 max (3.66ms)                       
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  role | passing  | 3d8d7de1fe5528 | Feb 13 2024 03:05 | replica                                                                      
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  vm   | passing  | 3d8d7de1fe5528 | Feb 13 2024 03:05 | [✓] checkDisk: 29.35 GB (75.1%) free space on /data/ (35.84µs)               
       |          |                |                   | [✓] checkLoad: load averages: 0.00 0.00 0.00 (38.87µs)                       
       |          |                |                   | [✓] memory: system spent 0s of the last 60s waiting on memory (31.73µs)      
       |          |                |                   | [✓] cpu: system spent 36ms of the last 60s waiting on cpu (15.53µs)          
       |          |                |                   | [✓] io: system spent 0s of the last 60s waiting on io (11.48µs)              
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  pg   | passing  | 4d89d024f611e8 | Feb 13 2024 03:05 | [✓] connections: 15 used, 3 reserved, 300 max (3.66ms)                       
       |          |                |                   | [✓] cluster-locks: No active locks detected (8.2µs)                          
       |          |                |                   | [✓] disk-capacity: 24.9% - readonly mode will be enabled at 90.0% (10.47µs)  
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  role | passing  | 4d89d024f611e8 | Feb 13 2024 03:05 | primary                                                                      
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  vm   | passing  | 4d89d024f611e8 | Feb 15 2024 00:05 | [✓] checkDisk: 29.19 GB (74.7%) free space on /data/ (25.84µs)               
       |          |                |                   | [✓] checkLoad: load averages: 0.13 0.33 0.63 (55.17µs)                       
       |          |                |                   | [✓] memory: system spent 12ms of the last 60s waiting on memory (26.79µs)    
       |          |                |                   | [✓] cpu: system spent 3.5s of the last 60s waiting on cpu (18.5µs)           
       |          |                |                   | [✓] io: system spent 0s of the last 60s waiting on io (27.63µs)              
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  pg   | critical | 90806271c6ded8 | Feb 13 2024 03:05 | 500 Internal Server Error                                                    
       |          |                |                   | [✓] connections: 13 used, 3 reserved, 300 max (3.37ms)                       
       |          |                |                   | [✗] cluster-locks: `zombie.lock` detected (5.69µs)                           
       |          |                |                   | [✓] disk-capacity: 29.3% - readonly mode will be enabled at 90.0% (8.05µs)   
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  role | passing  | 90806271c6ded8 | Feb 13 2024 03:05 | zombie                                                                       
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
  vm   | passing  | 90806271c6ded8 | Feb 13 2024 03:05 | [✓] checkDisk: 27.6 GB (70.7%) free space on /data/ (26.53µs)                
       |          |                |                   | [✓] checkLoad: load averages: 0.03 0.08 0.03 (49.31µs)                       
       |          |                |                   | [✓] memory: system spent 0s of the last 60s waiting on memory (20.46µs)      
       |          |                |                   | [✓] cpu: system spent 0s of the last 60s waiting on cpu (12.21µs)            
       |          |                |                   | [✓] io: system spent 0s of the last 60s waiting on io (21.43µs)              
-------*----------*----------------*-------------------*------------------------------------------------------------------------------

I cloned one of the healthy machines and stopped the failing one, and this issue went away.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.