I’ve been getting intermittent PG::ConnectionBad errors every 1-3 minutes:
/rails/vendor/bundle/ruby/3.2.0/gems/activerecord-7.0.4.3/lib/active_record/connection_adapters/postgresql_adapter.rb:768:in `exec_params': PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly (ActiveRecord::StatementInvalid)
This probably means the server terminated abnormally
before or while processing the request.
I’ve tried increasing the ActiveRecord settings people suggest (reaping_frequency, idle_timeout, pool size of 1 etc) with no change. I’ve also tried bypassing flycast as suggested here Postgres dropping connection after 30s when using LISTEN/NOTIFY - #2 by MatthewIngwersen, but also no change.
After 1-3 minutes of activity, the connection always dies. Here’s an example script:
start_time = Time.now
puts "Started at #{start_time}"
while(true) do
sleep 10
elapsed_time = Time.now - start_time
minutes = (elapsed_time / 60).to_i
seconds = (elapsed_time % 60).to_i
User.count
puts "#{Time.now} Connection active for #{minutes}min#{seconds}s"
end
And logs:
This is about the same time when the Rails process got disconnected:
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]monitor | Voting member(s): 3, Active: 3, Inactive: 0, Conflicts: 2
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy | [NOTICE] (4181) : haproxy version is 2.8.3-1~bpo12+1
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy | [NOTICE] (4181) : path to executable is /usr/sbin/haproxy
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy | [ALERT] (4181) : Current worker (4183) exited with code 143 (Terminated)
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy | [WARNING] (4181) : All workers exited. Exiting... (0)
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy | Process exited 0
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]proxy | restarting in 1s [attempt 6042]
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy | [NOTICE] (2951) : haproxy version is 2.8.3-1~bpo12+1
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy | [NOTICE] (2951) : path to executable is /usr/sbin/haproxy
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy | [ALERT] (2951) : Current worker (2953) exited with code 143 (Terminated)
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (2951) : All workers exited. Exiting... (0)
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy | Process exited 0
2024-02-17T00:11:51Z app[3d8d7de1fe5528] syd [info]proxy | restarting in 1s [attempt 6041]
2024-02-17T00:11:51Z app[90806271c6ded8] syd [info]monitor | clusterStateMonitorTick failed with: primary has been quarantined: majority of registered members confirmed we are not the real primary
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy | Process exited 0
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy | restarting in 1s [attempt 6042]
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy | [NOTICE] (27893) : haproxy version is 2.8.3-1~bpo12+1
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy | [NOTICE] (27893) : path to executable is /usr/sbin/haproxy
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy | [ALERT] (27893) : Current worker (27895) exited with code 143 (Terminated)
2024-02-17T00:11:51Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (27893) : All workers exited. Exiting... (0)
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | Running...
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | Running...
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | Running...
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [NOTICE] (5678) : New worker (5680) forked
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [NOTICE] (5678) : Loading success.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : bk_db/pg1 changed its IP from (none) to fdaa:4:e9ed:a7b:249:5801:d1a7:2 by flydns/dns1.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Server bk_db/pg1 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : bk_db/pg2 changed its IP from (none) to fdaa:4:e9ed:a7b:231:da56:8b66:2 by DNS cache.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Server bk_db/pg2 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Server bk_db/pg2 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : bk_db/pg3 changed its IP from (none) to fdaa:4:e9ed:a7b:231:5cd0:f68a:2 by DNS cache.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Server bk_db/pg3 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Server bk_db/pg3 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [NOTICE] (4432) : New worker (4434) forked
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [NOTICE] (4432) : Loading success.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : bk_db/pg1 changed its IP from (none) to fdaa:4:e9ed:a7b:249:5801:d1a7:2 by flydns/dns1.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Server bk_db/pg1 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : bk_db/pg2 changed its IP from (none) to fdaa:4:e9ed:a7b:231:da56:8b66:2 by DNS cache.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Server bk_db/pg2 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Server bk_db/pg2 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : bk_db/pg3 changed its IP from (none) to fdaa:4:e9ed:a7b:231:5cd0:f68a:2 by DNS cache.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Server bk_db/pg3 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Server bk_db/pg3 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [NOTICE] (29540) : New worker (29542) forked
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [NOTICE] (29540) : Loading success.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : bk_db/pg1 changed its IP from (none) to fdaa:4:e9ed:a7b:249:5801:d1a7:2 by flydns/dns1.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : Server bk_db/pg1 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : bk_db/pg2 changed its IP from (none) to fdaa:4:e9ed:a7b:231:da56:8b66:2 by DNS cache.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : Server bk_db/pg2 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : Server bk_db/pg2 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : bk_db/pg3 changed its IP from (none) to fdaa:4:e9ed:a7b:231:5cd0:f68a:2 by DNS cache.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : Server bk_db/pg3 ('syd.frdm-db-production.internal') is UP/READY (resolves again).
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : Server bk_db/pg3 administratively READY thanks to valid DNS answer.
2024-02-17T00:11:52Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Server bk_db/pg2 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5ms. 2 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:52Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Server bk_db/pg2 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5ms. 2 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:52Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : Server bk_db/pg2 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 3ms. 2 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Backup Server bk_db/pg is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 3ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Backup Server bk_db/pg is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 4ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[90806271c6ded8] syd [info]proxy | [WARNING] (5680) : Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 3ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[3d8d7de1fe5528] syd [info]proxy | [WARNING] (4434) : Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 3ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-02-17T00:11:54Z app[4d89d024f611e8] syd [info]proxy | [WARNING] (29542) : Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5ms. 1 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
fly checks has a failure I that I don’t really understand:
fly checks list -a frdm-db-production
Health Checks for frdm-db-production
NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
pg | passing | 3d8d7de1fe5528 | Feb 13 2024 03:05 | [✓] connections: 10 used, 3 reserved, 300 max (3.66ms)
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
role | passing | 3d8d7de1fe5528 | Feb 13 2024 03:05 | replica
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
vm | passing | 3d8d7de1fe5528 | Feb 13 2024 03:05 | [✓] checkDisk: 29.35 GB (75.1%) free space on /data/ (35.84µs)
| | | | [✓] checkLoad: load averages: 0.00 0.00 0.00 (38.87µs)
| | | | [✓] memory: system spent 0s of the last 60s waiting on memory (31.73µs)
| | | | [✓] cpu: system spent 36ms of the last 60s waiting on cpu (15.53µs)
| | | | [✓] io: system spent 0s of the last 60s waiting on io (11.48µs)
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
pg | passing | 4d89d024f611e8 | Feb 13 2024 03:05 | [✓] connections: 15 used, 3 reserved, 300 max (3.66ms)
| | | | [✓] cluster-locks: No active locks detected (8.2µs)
| | | | [✓] disk-capacity: 24.9% - readonly mode will be enabled at 90.0% (10.47µs)
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
role | passing | 4d89d024f611e8 | Feb 13 2024 03:05 | primary
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
vm | passing | 4d89d024f611e8 | Feb 15 2024 00:05 | [✓] checkDisk: 29.19 GB (74.7%) free space on /data/ (25.84µs)
| | | | [✓] checkLoad: load averages: 0.13 0.33 0.63 (55.17µs)
| | | | [✓] memory: system spent 12ms of the last 60s waiting on memory (26.79µs)
| | | | [✓] cpu: system spent 3.5s of the last 60s waiting on cpu (18.5µs)
| | | | [✓] io: system spent 0s of the last 60s waiting on io (27.63µs)
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
pg | critical | 90806271c6ded8 | Feb 13 2024 03:05 | 500 Internal Server Error
| | | | [✓] connections: 13 used, 3 reserved, 300 max (3.37ms)
| | | | [✗] cluster-locks: `zombie.lock` detected (5.69µs)
| | | | [✓] disk-capacity: 29.3% - readonly mode will be enabled at 90.0% (8.05µs)
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
role | passing | 90806271c6ded8 | Feb 13 2024 03:05 | zombie
-------*----------*----------------*-------------------*------------------------------------------------------------------------------
vm | passing | 90806271c6ded8 | Feb 13 2024 03:05 | [✓] checkDisk: 27.6 GB (70.7%) free space on /data/ (26.53µs)
| | | | [✓] checkLoad: load averages: 0.03 0.08 0.03 (49.31µs)
| | | | [✓] memory: system spent 0s of the last 60s waiting on memory (20.46µs)
| | | | [✓] cpu: system spent 0s of the last 60s waiting on cpu (12.21µs)
| | | | [✓] io: system spent 0s of the last 60s waiting on io (21.43µs)
-------*----------*----------------*-------------------*------------------------------------------------------------------------------