Postgres clusters periodically going down across many of our organizations

Hi everyone,
We have several Elixir/Phoenix apps deployed on Fly, with APM monitoring via AppSignal, and sometimes we get strange critical Postgres-related errors that take the apps down.

Until now we had not been able to capture the logs in time (given the 100-line limit of flyctl), so we did not fully understand the problem; in fact, we were considering shipping logs to LogDNA or a similar service for better visibility.
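As a stopgap until log shipping is set up, logs can be streamed continuously to a local file so they outlive flyctl's buffer; a minimal sketch (the app name is a placeholder):

```shell
# Continuously stream logs from the Postgres app and append them to a
# local file, so they survive past flyctl's in-memory line limit.
# "my-postgres-db" is a placeholder for the actual app name.
fly logs -a my-postgres-db | tee -a postgres.log
```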

There was an incident last night and another one just now, across almost all of our Fly organizations.

This time we were able to copy the following logs from one of our Postgres clusters:

2022-09-21T14:20:05.404 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:05.403Z WARN cmd/sentinel.go:276 no keeper info available {"db": "16ffce98", "keeper": "23c40f8092"}
2022-09-21T14:20:14.593 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:14.592 UTC [20823] LOG: started streaming WAL from primary at F/3B000000 on timeline 15
2022-09-21T14:20:22.777 app[e5de3968] fra [info] proxy | [WARNING] 263/142022 (565) : Server bk_db/pg1 is UP, reason: Layer7 check passed, code: 200, check duration: 15ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
2022-09-21T14:20:23.146 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:23.145Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-09-21T14:20:28.198 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.198Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:28.199 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.199Z INFO cmd/keeper.go:1536 promoting to master
2022-09-21T14:20:28.201 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.199Z INFO postgresql/postgresql.go:532 promoting database
2022-09-21T14:20:28.204 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.201 UTC [13105] LOG: received promote request
2022-09-21T14:20:28.204 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.201 UTC [20823] FATAL: terminating walreceiver process due to administrator command
2022-09-21T14:20:28.205 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.204 UTC [13105] LOG: record with incorrect prev-link 6D616E5F/6E6F7263 at F/3B50F290
2022-09-21T14:20:28.205 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.204 UTC [13105] LOG: redo done at F/3B50F268 system usage: CPU: user: 4.10 s, system: 5.35 s, elapsed: 29612.14 s
2022-09-21T14:20:28.205 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.204 UTC [13105] LOG: last completed transaction was at log time 2022-09-21 14:20:25.448896+00
2022-09-21T14:20:28.208 app[e5de3968] fra [info] keeper | waiting for server to promote....2022-09-21 14:20:28.208 UTC [13105] LOG: selected new timeline ID: 16
2022-09-21T14:20:28.245 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.245 UTC [13105] LOG: archive recovery complete
2022-09-21T14:20:28.253 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.251 UTC [13104] LOG: database system is ready to accept connections
2022-09-21T14:20:28.306 app[e5de3968] fra [info] keeper | done
2022-09-21T14:20:28.306 app[e5de3968] fra [info] keeper | server promoted
2022-09-21T14:20:28.336 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.336Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:28.337 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.336Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:32.095 app[e5de3968] fra [info] proxy | [WARNING] 263/142032 (565) : Server bk_db/pg2 is UP, reason: Layer7 check passed, code: 200, check duration: 7ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
2022-09-21T14:20:33.383 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:33.382Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:33.384 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:33.383Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:33.405 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:33.405Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:33.406 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:33.405Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:35.457 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:35.456 UTC [20965] LOG: PID 23527 in cancel request did not match any process
2022-09-21T14:20:36.811 app[e5de3968] fra [info] proxy | [WARNING] 263/142036 (565) : Server bk_db/pg1 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 1 active and 0 backup servers left. 18 sessions active, 0 requeued, 0 remaining in queue.
2022-09-21T14:20:36.814 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:36.813 UTC [20973] LOG: PID 23523 in cancel request did not match any process
2022-09-21T14:20:38.456 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:38.455Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:38.457 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:38.456Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:38.478 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:38.478Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:38.478 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:38.478Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:42.491 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:20:41.746 UTC [23542] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:20:42.491 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:20:41.746 UTC [23542] FATAL: connection to client lost
2022-09-21T14:20:42.491 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:20:41.749 UTC [23540] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:20:42.491 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:20:41.749 UTC [23540] FATAL: connection to client lost
2022-09-21T14:20:43.532 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:43.531Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:43.533 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:43.532Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:43.564 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:43.564Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:43.565 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:43.564Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:45.043 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:45.042Z INFO cmd/sentinel.go:89 sentinel leadership acquired
2022-09-21T14:20:45.266 app[e5de3968] fra [info] proxy | [WARNING] 263/142045 (565) : Server bk_db/pg1 is UP, reason: Layer7 check passed, code: 200, check duration: 32ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
2022-09-21T14:20:48.644 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:48.643Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:48.645 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:48.644Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:48.665 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:48.665Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:48.665 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:48.665Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:49.942 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:20:49.938Z INFO cmd/sentinel.go:82 Trying to acquire sentinels leadership
2022-09-21T14:20:53.720 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:53.719Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:53.721 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:53.721Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:53.750 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:53.750Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:53.751 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:53.750Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:54.941 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:20:54.941Z ERROR cmd/sentinel.go:102 election loop error {"error": "Put \"https://consul-fra.fly-shared.net/v1/session/create?wait=5000ms\": dial tcp: lookup consul-fra.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:650e:a7b:23c4:0:f809:2]:54944->[fdaa::3]:53: i/o timeout"}
2022-09-21T14:20:54.941 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:20:54.941Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Get \"https://consul-fra.fly-shared.net/v1/kv/viggianicod-app-stag-db-ejpon17kvmy9dgr4/viggianicod-app-stag-db/clusterdata?consistent=&wait=5000ms\": dial tcp: lookup consul-fra.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:650e:a7b:23c4:0:f809:2]:54944->[fdaa::3]:53: i/o timeout"}
2022-09-21T14:20:54.942 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:20:54.942Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Get \"https://consul-fra.fly-shared.net/v1/kv/viggianicod-app-stag-db-ejpon17kvmy9dgr4/viggianicod-app-stag-db/clusterdata?consistent=&wait=5000ms\": dial tcp: lookup consul-fra.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:650e:a7b:23c4:0:f809:2]:39566->[fdaa::3]:53: i/o timeout"}
2022-09-21T14:20:54.943 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:20:54.942Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "cannot set or renew session for ttl, unable to operate on sessions"}
2022-09-21T14:20:57.245 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:57.244Z WARN cmd/sentinel.go:276 no keeper info available {"db": "16ffce98", "keeper": "23c40f8092"}
2022-09-21T14:20:57.249 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:57.249Z INFO cmd/sentinel.go:1151 removing old master db {"db": "16ffce98", "keeper": "23c40f8092"}
2022-09-21T14:20:57.250 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:57.249Z INFO cmd/sentinel.go:1503 added new standby db {"db": "9ddb26d0", "keeper": "23c40f8092"}
2022-09-21T14:20:58.813 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.812Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:58.814 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.813Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:58.821 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.820Z INFO cmd/keeper.go:1007 creating replication slot {"slot": "stolon_9ddb26d0"}
2022-09-21T14:20:58.851 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.849Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:58.851 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.850Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:02.937 app[e5de3968] fra [info] sentinel | 2022-09-21T14:21:02.936Z WARN cmd/sentinel.go:276 no keeper info available {"db": "9ddb26d0", "keeper": "23c40f8092"}
2022-09-21T14:21:04.009 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:04.007Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:21:04.021 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:04.020Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:21:04.144 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:04.143Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:21:04.149 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:04.143Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:04.943 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:21:04.942Z INFO cmd/sentinel.go:82 Trying to acquire sentinels leadership
2022-09-21T14:21:06.655 app[5cd3fcb6] fra [info] proxy | [WARNING] 263/142106 (565) : Server bk_db/pg1 ('fra.viggianicod-app-stag-db.internal') is UP/READY (resolves again).
2022-09-21T14:21:06.925 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:21:06.925Z ERROR cmd/sentinel.go:102 election loop error {"error": "Put \"https://consul-fra.fly-shared.net/v1/session/create?wait=5000ms\": EOF"}
2022-09-21T14:21:09.958 app[5cd3fcb6] fra [info] proxy | [WARNING] 263/142109 (565) : Server bk_db/pg1 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 1 active and 0 backup servers left. 4 sessions active, 0 requeued, 0 remaining in queue.
2022-09-21T14:21:11.934 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:21:11.933Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Get \"https://consul-fra.fly-shared.net/v1/kv/viggianicod-app-stag-db-ejpon17kvmy9dgr4/viggianicod-app-stag-db/clusterdata?consistent=&wait=5000ms\": read tcp [2604:1380:4091:3601::f809:3]:40818->[2a09:8280:1::3:c1f]:443: read: connection reset by peer"}
2022-09-21T14:21:11.949 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:21:11.936Z INFO cmd/keeper.go:1140 current db UID different than cluster data db UID {"db": "16ffce98", "cdDB": "9ddb26d0"}
2022-09-21T14:21:11.959 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:21:11.958Z INFO cmd/keeper.go:1295 resyncing the database cluster
2022-09-21T14:21:12.006 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.004 UTC [23636] FATAL: canceling authentication due to timeout
2022-09-21T14:21:12.009 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.009 UTC [23629] FATAL: terminating connection due to administrator command
2022-09-21T14:21:12.027 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.025 UTC [32039] FATAL: terminating connection due to administrator command
2022-09-21T14:21:12.037 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.036 UTC [32018] LOG: background worker "logical replication launcher" (PID 2060) exited with exit code 1
2022-09-21T14:21:12.078 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.078 UTC [32018] LOG: database system is shut down
2022-09-21T14:21:12.803 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:21:12.803Z INFO cmd/keeper.go:924 syncing from followed db {"followedDB": "a1fb9996", "keeper": "23c30f8082"}
2022-09-21T14:21:12.803 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:21:12.803Z INFO postgresql/postgresql.go:964 running pg_basebackup
2022-09-21T14:21:12.888 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:12.880 UTC [21174] LOG: PID 23583 in cancel request did not match any process
2022-09-21T14:21:12.953 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:12.940Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:21:12.953 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:12.941Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:21:12.983 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:12.968Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:21:12.983 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:12.968Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:13.084 app[5cd3fcb6] fra [info] keeper | pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-09-21T14:21:13.128 app[e5de3968] fra [info] sentinel | 2022-09-21T14:21:13.128Z WARN cmd/sentinel.go:287 received db state for unexpected db uid {"receivedDB": "16ffce98", "db": "9ddb26d0", "keeper": "23c40f8092"}
2022-09-21T14:21:13.413 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.411 UTC [21189] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:21:13.413 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.411 UTC [21189] FATAL: connection to client lost
2022-09-21T14:21:13.556 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.554 UTC [21190] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:21:13.556 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.554 UTC [21190] FATAL: connection to client lost
2022-09-21T14:21:13.819 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.818 UTC [21196] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:21:13.819 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.818 UTC [21196] FATAL: connection to client lost
2022-09-21T14:21:13.826 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.824 UTC [21197] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:21:13.826 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.824 UTC [21197] FATAL: connection to client lost
2022-09-21T14:21:13.993 app[5cd3fcb6] fra [info] exporter | ERRO[941926] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:650e:a7b:23c4:0:f809:2]:5433/postgres?sslmode=disable): driver: bad connection source="postgres_exporter.go:1658"
2022-09-21T14:21:17.009 app[5cd3fcb6] fra [info] keeper | 53481/7998581 kB (0%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:18.003 app[5cd3fcb6] fra [info] keeper | 88777/7998581 kB (1%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:18.443 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:18.437Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:21:18.443 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:18.438Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:21:18.463 app[e5de3968] fra [info] proxy | [WARNING] 263/142118 (565) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5002ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-09-21T14:21:18.517 app[e5de3968] fra [info] sentinel | 2022-09-21T14:21:18.497Z WARN cmd/sentinel.go:276 no keeper info available {"db": "a1fb9996", "keeper": "23c30f8082"}
2022-09-21T14:21:18.605 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:18.603Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:21:18.605 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:18.603Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:19.011 app[5cd3fcb6] fra [info] keeper | 128585/7998581 kB (1%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:20.013 app[5cd3fcb6] fra [info] keeper | 227401/7998581 kB (2%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:21.007 app[5cd3fcb6] fra [info] keeper | 284233/7998581 kB (3%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:21.964 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:21:21.961Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Get \"https://consul-fra.fly-shared.net/v1/kv/viggianicod-app-stag-db-ejpon17kvmy9dgr4/viggianicod-app-stag-db/clusterdata?consistent=&wait=5000ms\": EOF"}
2022-09-21T14:21:22.019 app[5cd3fcb6] fra [info] keeper | 323145/7998581 kB (4%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:23.013 app[5cd3fcb6] fra [info] keeper | 360937/7998581 kB (4%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:24.010 app[5cd3fcb6] fra [info] keeper | 393193/7998581 kB (4%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:24.776 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:24.776Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:21:24.777 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:24.777Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:21:24.805 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:24.805Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:21:24.806 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:24.805Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:25.007 app[5cd3fcb6] fra [info] keeper | 418377/7998581 kB (5%), 0/1 tablespace (/data/postgres/base/16386/25123 )

And these are the errors reported by AppSignal:

** (FunctionClauseError) no function clause matching in Postgrex.Protocol.msg_decode/1


Backtrace:
lib/postgrex/protocol.ex:3130 Postgrex.Protocol.msg_decode/1	
lib/postgrex/protocol.ex:3101 Postgrex.Protocol.msg_recv/3	
lib/postgrex/protocol.ex:1071 Postgrex.Protocol.recv_simple/6	
lib/postgrex/simple_connection.ex:376 Postgrex.SimpleConnection.handle/5	
lib/connection.ex:488 Connection.handle_call/3	
gen_server.erl:721 :gen_server.try_handle_call/4	
gen_server.erl:750 :gen_server.handle_msg/6	
proc_lib.erl:226 :proc_lib.init_p_do_apply/3
** (ErlangError) Erlang error: {{%DBConnection.ConnectionError{message: "connection not available and request was dropped from queue after 1980ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:\n\n  1. Ensuring your database is available and that you can connect to it\n  2. Tracking down slow queries and making sure they are running fast enough\n  3. Increasing the pool_size (although this increases resource consumption)\n  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval\n\nSee DBConnection.start_link/2 for more information\n", reason: :queue_timeout, severity: :error}, [{DBConnection, :transaction, 3, [file: 'lib/db_connection.ex', line: 904, error_info: %{module: Exception}]}, {Oban.Peers.Postgres, :handle_info, 2, [file: 'lib/oban/peers/postgres.ex', line: 85]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 695]}, {:gen_server, :loop, 7, [file: 'gen_server.erl', line: 437]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 226]}]}, {GenServer, :call, [#PID<0.442.57>, :leader?, 5000]}}

Backtrace:
lib/gen_server.ex:1030 GenServer.call/3	
lib/oban/peer.ex:90 Oban.Peer.leader?/2	
lib/oban/plugins/stager.ex:119 Oban.Plugins.Stager.check_leadership_and_stage/1	
lib/oban/plugins/stager.ex:106 anonymous fn/2 in Oban.Plugins.Stager.handle_info/2	
/app/deps/telemetry/src/telemetry.erl:320 :telemetry.span/3	
lib/oban/plugins/stager.ex:105 Oban.Plugins.Stager.handle_info/2	
gen_server.erl:695 :gen_server.try_dispatch/4	
gen_server.erl:771 :gen_server.handle_msg/6
** (Postgrex.Error) FATAL 57P01 (admin_shutdown) terminating connection due to administrator command


Backtrace:
lib/db_connection.ex:904 DBConnection.transaction/3	
lib/oban/plugins/stager.ex:106 anonymous fn/2 in Oban.Plugins.Stager.handle_info/2	
/app/deps/telemetry/src/telemetry.erl:320 :telemetry.span/3	
lib/oban/plugins/stager.ex:105 Oban.Plugins.Stager.handle_info/2	
gen_server.erl:695 :gen_server.try_dispatch/4	
gen_server.erl:771 :gen_server.handle_msg/6	
proc_lib.erl:226 :proc_lib.init_p_do_apply/3
** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2788ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information
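For reference, the pool settings that this error message suggests tuning live in the Repo configuration; a minimal sketch, with hypothetical app/repo names and illustrative values:

```elixir
# config/runtime.exs — names and values are illustrative, not our actual config
config :my_app, MyApp.Repo,
  url: System.get_env("DATABASE_URL"),
  # 3. more connections in the pool (increases resource consumption)
  pool_size: 20,
  # 4. let requests wait longer before being dropped from the queue:
  #    target queue time in ms, and the interval (ms) over which it is measured
  queue_target: 500,
  queue_interval: 5_000
```

That said, tuning the pool would only soften the symptom here; the queue timeouts coincide with the cluster going down, so the root cause still needs to be found.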


Backtrace:
lib/db_connection.ex:904 DBConnection.transaction/3	
lib/oban/peers/postgres.ex:85 Oban.Peers.Postgres.handle_info/2	
gen_server.erl:695 :gen_server.try_dispatch/4	
gen_server.erl:771 :gen_server.handle_msg/6	
proc_lib.erl:226 :proc_lib.init_p_do_apply/3
** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 259ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information


Backtrace:
lib/ecto/adapters/sql.ex:932 Ecto.Adapters.SQL.raise_sql_call_error/1	
lib/ecto/adapters/sql.ex:847 Ecto.Adapters.SQL.execute/6	
lib/ecto/repo/queryable.ex:221 Ecto.Repo.Queryable.execute/4	
lib/ecto/repo/queryable.ex:19 Ecto.Repo.Queryable.all/3	
lib/ecto/repo/queryable.ex:147 Ecto.Repo.Queryable.one/3	
lib/contexts/users/users.ex:563 Viggianicod.Users.get_session_by_token/1	
lib/application/plugs.ex:41 Viggianicod.Plugs.fetch_user_session/2	
(viggianicod 2.4.0) Viggianicod.Router.authentication/2

Some of these apps are already in production, and we're planning to release this one soon, so we need to understand what is happening and how to fix it.

Can somebody help us?

Maybe related to this?