Postgres clusters periodically going down across many of our organizations

Hi everyone,
we have some Elixir/Phoenix apps deployed on Fly with APM monitoring via AppSignal, and sometimes we get strange critical errors related to Postgres, with the apps going down.

Until now we had not been able to capture the logs in time (given the 100-line limit of flyctl), so we did not fully understand the problem; in fact, we were thinking of setting up log shipping to Logdna or similar to get a better picture.
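
As a stopgap, streaming the logs into a local file would at least keep more history than the CLI buffer. A minimal sketch (my-pg-app is a placeholder for the actual Postgres app name):

  # keep an append-only local copy of the streamed logs (placeholder app name)
  flyctl logs -a my-pg-app | tee -a pg-cluster.log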

There was an incident last night and a new one is happening right now, on almost all of our Fly organizations.

This time we were able to copy these logs from one of our Postgres clusters:

2022-09-21T14:20:05.404 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:05.403Z WARN cmd/sentinel.go:276 no keeper info available {"db": "16ffce98", "keeper": "23c40f8092"}
2022-09-21T14:20:14.593 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:14.592 UTC [20823] LOG: started streaming WAL from primary at F/3B000000 on timeline 15
2022-09-21T14:20:22.777 app[e5de3968] fra [info] proxy | [WARNING] 263/142022 (565) : Server bk_db/pg1 is UP, reason: Layer7 check passed, code: 200, check duration: 15ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
2022-09-21T14:20:23.146 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:23.145Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-09-21T14:20:28.198 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.198Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:28.199 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.199Z INFO cmd/keeper.go:1536 promoting to master
2022-09-21T14:20:28.201 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.199Z INFO postgresql/postgresql.go:532 promoting database
2022-09-21T14:20:28.204 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.201 UTC [13105] LOG: received promote request
2022-09-21T14:20:28.204 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.201 UTC [20823] FATAL: terminating walreceiver process due to administrator command
2022-09-21T14:20:28.205 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.204 UTC [13105] LOG: record with incorrect prev-link 6D616E5F/6E6F7263 at F/3B50F290
2022-09-21T14:20:28.205 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.204 UTC [13105] LOG: redo done at F/3B50F268 system usage: CPU: user: 4.10 s, system: 5.35 s, elapsed: 29612.14 s
2022-09-21T14:20:28.205 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.204 UTC [13105] LOG: last completed transaction was at log time 2022-09-21 14:20:25.448896+00
2022-09-21T14:20:28.208 app[e5de3968] fra [info] keeper | waiting for server to promote....2022-09-21 14:20:28.208 UTC [13105] LOG: selected new timeline ID: 16
2022-09-21T14:20:28.245 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.245 UTC [13105] LOG: archive recovery complete
2022-09-21T14:20:28.253 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:28.251 UTC [13104] LOG: database system is ready to accept connections
2022-09-21T14:20:28.306 app[e5de3968] fra [info] keeper | done
2022-09-21T14:20:28.306 app[e5de3968] fra [info] keeper | server promoted
2022-09-21T14:20:28.336 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.336Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:28.337 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:28.336Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:32.095 app[e5de3968] fra [info] proxy | [WARNING] 263/142032 (565) : Server bk_db/pg2 is UP, reason: Layer7 check passed, code: 200, check duration: 7ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
2022-09-21T14:20:33.383 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:33.382Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:33.384 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:33.383Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:33.405 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:33.405Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:33.406 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:33.405Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:35.457 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:35.456 UTC [20965] LOG: PID 23527 in cancel request did not match any process
2022-09-21T14:20:36.811 app[e5de3968] fra [info] proxy | [WARNING] 263/142036 (565) : Server bk_db/pg1 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 1 active and 0 backup servers left. 18 sessions active, 0 requeued, 0 remaining in queue.
2022-09-21T14:20:36.814 app[e5de3968] fra [info] keeper | 2022-09-21 14:20:36.813 UTC [20973] LOG: PID 23523 in cancel request did not match any process
2022-09-21T14:20:38.456 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:38.455Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:38.457 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:38.456Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:38.478 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:38.478Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:38.478 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:38.478Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:42.491 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:20:41.746 UTC [23542] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:20:42.491 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:20:41.746 UTC [23542] FATAL: connection to client lost
2022-09-21T14:20:42.491 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:20:41.749 UTC [23540] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:20:42.491 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:20:41.749 UTC [23540] FATAL: connection to client lost
2022-09-21T14:20:43.532 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:43.531Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:43.533 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:43.532Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:43.564 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:43.564Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:43.565 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:43.564Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:45.043 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:45.042Z INFO cmd/sentinel.go:89 sentinel leadership acquired
2022-09-21T14:20:45.266 app[e5de3968] fra [info] proxy | [WARNING] 263/142045 (565) : Server bk_db/pg1 is UP, reason: Layer7 check passed, code: 200, check duration: 32ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
2022-09-21T14:20:48.644 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:48.643Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:48.645 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:48.644Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:48.665 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:48.665Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:48.665 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:48.665Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:49.942 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:20:49.938Z INFO cmd/sentinel.go:82 Trying to acquire sentinels leadership
2022-09-21T14:20:53.720 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:53.719Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:53.721 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:53.721Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:53.750 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:53.750Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:53.751 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:53.750Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:20:54.941 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:20:54.941Z ERROR cmd/sentinel.go:102 election loop error {"error": "Put \"https://consul-fra.fly-shared.net/v1/session/create?wait=5000ms\": dial tcp: lookup consul-fra.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:650e:a7b:23c4:0:f809:2]:54944->[fdaa::3]:53: i/o timeout"}
2022-09-21T14:20:54.941 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:20:54.941Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Get \"https://consul-fra.fly-shared.net/v1/kv/viggianicod-app-stag-db-ejpon17kvmy9dgr4/viggianicod-app-stag-db/clusterdata?consistent=&wait=5000ms\": dial tcp: lookup consul-fra.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:650e:a7b:23c4:0:f809:2]:54944->[fdaa::3]:53: i/o timeout"}
2022-09-21T14:20:54.942 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:20:54.942Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Get \"https://consul-fra.fly-shared.net/v1/kv/viggianicod-app-stag-db-ejpon17kvmy9dgr4/viggianicod-app-stag-db/clusterdata?consistent=&wait=5000ms\": dial tcp: lookup consul-fra.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:650e:a7b:23c4:0:f809:2]:39566->[fdaa::3]:53: i/o timeout"}
2022-09-21T14:20:54.943 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:20:54.942Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "cannot set or renew session for ttl, unable to operate on sessions"}
2022-09-21T14:20:57.245 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:57.244Z WARN cmd/sentinel.go:276 no keeper info available {"db": "16ffce98", "keeper": "23c40f8092"}
2022-09-21T14:20:57.249 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:57.249Z INFO cmd/sentinel.go:1151 removing old master db {"db": "16ffce98", "keeper": "23c40f8092"}
2022-09-21T14:20:57.250 app[e5de3968] fra [info] sentinel | 2022-09-21T14:20:57.249Z INFO cmd/sentinel.go:1503 added new standby db {"db": "9ddb26d0", "keeper": "23c40f8092"}
2022-09-21T14:20:58.813 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.812Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:20:58.814 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.813Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:20:58.821 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.820Z INFO cmd/keeper.go:1007 creating replication slot {"slot": "stolon_9ddb26d0"}
2022-09-21T14:20:58.851 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.849Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:20:58.851 app[e5de3968] fra [info] keeper | 2022-09-21T14:20:58.850Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:02.937 app[e5de3968] fra [info] sentinel | 2022-09-21T14:21:02.936Z WARN cmd/sentinel.go:276 no keeper info available {"db": "9ddb26d0", "keeper": "23c40f8092"}
2022-09-21T14:21:04.009 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:04.007Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:21:04.021 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:04.020Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:21:04.144 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:04.143Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:21:04.149 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:04.143Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:04.943 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:21:04.942Z INFO cmd/sentinel.go:82 Trying to acquire sentinels leadership
2022-09-21T14:21:06.655 app[5cd3fcb6] fra [info] proxy | [WARNING] 263/142106 (565) : Server bk_db/pg1 ('fra.viggianicod-app-stag-db.internal') is UP/READY (resolves again).
2022-09-21T14:21:06.925 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:21:06.925Z ERROR cmd/sentinel.go:102 election loop error {"error": "Put \"https://consul-fra.fly-shared.net/v1/session/create?wait=5000ms\": EOF"}
2022-09-21T14:21:09.958 app[5cd3fcb6] fra [info] proxy | [WARNING] 263/142109 (565) : Server bk_db/pg1 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 1 active and 0 backup servers left. 4 sessions active, 0 requeued, 0 remaining in queue.
2022-09-21T14:21:11.934 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:21:11.933Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Get \"https://consul-fra.fly-shared.net/v1/kv/viggianicod-app-stag-db-ejpon17kvmy9dgr4/viggianicod-app-stag-db/clusterdata?consistent=&wait=5000ms\": read tcp [2604:1380:4091:3601::f809:3]:40818->[2a09:8280:1::3:c1f]:443: read: connection reset by peer"}
2022-09-21T14:21:11.949 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:21:11.936Z INFO cmd/keeper.go:1140 current db UID different than cluster data db UID {"db": "16ffce98", "cdDB": "9ddb26d0"}
2022-09-21T14:21:11.959 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:21:11.958Z INFO cmd/keeper.go:1295 resyncing the database cluster
2022-09-21T14:21:12.006 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.004 UTC [23636] FATAL: canceling authentication due to timeout
2022-09-21T14:21:12.009 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.009 UTC [23629] FATAL: terminating connection due to administrator command
2022-09-21T14:21:12.027 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.025 UTC [32039] FATAL: terminating connection due to administrator command
2022-09-21T14:21:12.037 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.036 UTC [32018] LOG: background worker "logical replication launcher" (PID 2060) exited with exit code 1
2022-09-21T14:21:12.078 app[5cd3fcb6] fra [info] keeper | 2022-09-21 14:21:12.078 UTC [32018] LOG: database system is shut down
2022-09-21T14:21:12.803 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:21:12.803Z INFO cmd/keeper.go:924 syncing from followed db {"followedDB": "a1fb9996", "keeper": "23c30f8082"}
2022-09-21T14:21:12.803 app[5cd3fcb6] fra [info] keeper | 2022-09-21T14:21:12.803Z INFO postgresql/postgresql.go:964 running pg_basebackup
2022-09-21T14:21:12.888 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:12.880 UTC [21174] LOG: PID 23583 in cancel request did not match any process
2022-09-21T14:21:12.953 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:12.940Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:21:12.953 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:12.941Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:21:12.983 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:12.968Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:21:12.983 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:12.968Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:13.084 app[5cd3fcb6] fra [info] keeper | pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-09-21T14:21:13.128 app[e5de3968] fra [info] sentinel | 2022-09-21T14:21:13.128Z WARN cmd/sentinel.go:287 received db state for unexpected db uid {"receivedDB": "16ffce98", "db": "9ddb26d0", "keeper": "23c40f8092"}
2022-09-21T14:21:13.413 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.411 UTC [21189] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:21:13.413 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.411 UTC [21189] FATAL: connection to client lost
2022-09-21T14:21:13.556 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.554 UTC [21190] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:21:13.556 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.554 UTC [21190] FATAL: connection to client lost
2022-09-21T14:21:13.819 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.818 UTC [21196] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:21:13.819 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.818 UTC [21196] FATAL: connection to client lost
2022-09-21T14:21:13.826 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.824 UTC [21197] LOG: could not send data to client: Connection reset by peer
2022-09-21T14:21:13.826 app[e5de3968] fra [info] keeper | 2022-09-21 14:21:13.824 UTC [21197] FATAL: connection to client lost
2022-09-21T14:21:13.993 app[5cd3fcb6] fra [info] exporter | ERRO[941926] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:650e:a7b:23c4:0:f809:2]:5433/postgres?sslmode=disable): driver: bad connection source="postgres_exporter.go:1658"
2022-09-21T14:21:17.009 app[5cd3fcb6] fra [info] keeper | 53481/7998581 kB (0%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:18.003 app[5cd3fcb6] fra [info] keeper | 88777/7998581 kB (1%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:18.443 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:18.437Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:21:18.443 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:18.438Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:21:18.463 app[e5de3968] fra [info] proxy | [WARNING] 263/142118 (565) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5002ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-09-21T14:21:18.517 app[e5de3968] fra [info] sentinel | 2022-09-21T14:21:18.497Z WARN cmd/sentinel.go:276 no keeper info available {"db": "a1fb9996", "keeper": "23c30f8082"}
2022-09-21T14:21:18.605 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:18.603Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:21:18.605 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:18.603Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:19.011 app[5cd3fcb6] fra [info] keeper | 128585/7998581 kB (1%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:20.013 app[5cd3fcb6] fra [info] keeper | 227401/7998581 kB (2%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:21.007 app[5cd3fcb6] fra [info] keeper | 284233/7998581 kB (3%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:21.964 app[5cd3fcb6] fra [info] sentinel | 2022-09-21T14:21:21.961Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Get \"https://consul-fra.fly-shared.net/v1/kv/viggianicod-app-stag-db-ejpon17kvmy9dgr4/viggianicod-app-stag-db/clusterdata?consistent=&wait=5000ms\": EOF"}
2022-09-21T14:21:22.019 app[5cd3fcb6] fra [info] keeper | 323145/7998581 kB (4%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:23.013 app[5cd3fcb6] fra [info] keeper | 360937/7998581 kB (4%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:24.010 app[5cd3fcb6] fra [info] keeper | 393193/7998581 kB (4%), 0/1 tablespace (/data/postgres/base/16386/25123 )
2022-09-21T14:21:24.776 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:24.776Z INFO cmd/keeper.go:1504 our db requested role is master
2022-09-21T14:21:24.777 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:24.777Z INFO cmd/keeper.go:1542 already master
2022-09-21T14:21:24.805 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:24.805Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-09-21T14:21:24.806 app[e5de3968] fra [info] keeper | 2022-09-21T14:21:24.805Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-09-21T14:21:25.007 app[5cd3fcb6] fra [info] keeper | 418377/7998581 kB (5%), 0/1 tablespace (/data/postgres/base/16386/25123 )

And these are the errors reported by AppSignal:

** (FunctionClauseError) no function clause matching in Postgrex.Protocol.msg_decode/1


Backtrace:
lib/postgrex/protocol.ex:3130 Postgrex.Protocol.msg_decode/1	
lib/postgrex/protocol.ex:3101 Postgrex.Protocol.msg_recv/3	
lib/postgrex/protocol.ex:1071 Postgrex.Protocol.recv_simple/6	
lib/postgrex/simple_connection.ex:376 Postgrex.SimpleConnection.handle/5	
lib/connection.ex:488 Connection.handle_call/3	
gen_server.erl:721 :gen_server.try_handle_call/4	
gen_server.erl:750 :gen_server.handle_msg/6	
proc_lib.erl:226 :proc_lib.init_p_do_apply/3
** (ErlangError) Erlang error: {{%DBConnection.ConnectionError{message: "connection not available and request was dropped from queue after 1980ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:\n\n  1. Ensuring your database is available and that you can connect to it\n  2. Tracking down slow queries and making sure they are running fast enough\n  3. Increasing the pool_size (although this increases resource consumption)\n  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval\n\nSee DBConnection.start_link/2 for more information\n", reason: :queue_timeout, severity: :error}, [{DBConnection, :transaction, 3, [file: 'lib/db_connection.ex', line: 904, error_info: %{module: Exception}]}, {Oban.Peers.Postgres, :handle_info, 2, [file: 'lib/oban/peers/postgres.ex', line: 85]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 695]}, {:gen_server, :loop, 7, [file: 'gen_server.erl', line: 437]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 226]}]}, {GenServer, :call, [#PID<0.442.57>, :leader?, 5000]}}

Backtrace:
lib/gen_server.ex:1030 GenServer.call/3	
lib/oban/peer.ex:90 Oban.Peer.leader?/2	
lib/oban/plugins/stager.ex:119 Oban.Plugins.Stager.check_leadership_and_stage/1	
lib/oban/plugins/stager.ex:106 anonymous fn/2 in Oban.Plugins.Stager.handle_info/2	
/app/deps/telemetry/src/telemetry.erl:320 :telemetry.span/3	
lib/oban/plugins/stager.ex:105 Oban.Plugins.Stager.handle_info/2	
gen_server.erl:695 :gen_server.try_dispatch/4	
gen_server.erl:771 :gen_server.handle_msg/6
** (Postgrex.Error) FATAL 57P01 (admin_shutdown) terminating connection due to administrator command


Backtrace:
lib/db_connection.ex:904 DBConnection.transaction/3	
lib/oban/plugins/stager.ex:106 anonymous fn/2 in Oban.Plugins.Stager.handle_info/2	
/app/deps/telemetry/src/telemetry.erl:320 :telemetry.span/3	
lib/oban/plugins/stager.ex:105 Oban.Plugins.Stager.handle_info/2	
gen_server.erl:695 :gen_server.try_dispatch/4	
gen_server.erl:771 :gen_server.handle_msg/6	
proc_lib.erl:226 :proc_lib.init_p_do_apply/3
** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2788ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information


Backtrace:
lib/db_connection.ex:904 DBConnection.transaction/3	
lib/oban/peers/postgres.ex:85 Oban.Peers.Postgres.handle_info/2	
gen_server.erl:695 :gen_server.try_dispatch/4	
gen_server.erl:771 :gen_server.handle_msg/6	
proc_lib.erl:226 :proc_lib.init_p_do_apply/3
** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 259ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information


Backtrace:
lib/ecto/adapters/sql.ex:932 Ecto.Adapters.SQL.raise_sql_call_error/1	
lib/ecto/adapters/sql.ex:847 Ecto.Adapters.SQL.execute/6	
lib/ecto/repo/queryable.ex:221 Ecto.Repo.Queryable.execute/4	
lib/ecto/repo/queryable.ex:19 Ecto.Repo.Queryable.all/3	
lib/ecto/repo/queryable.ex:147 Ecto.Repo.Queryable.one/3	
lib/contexts/users/users.ex:563 Viggianicod.Users.get_session_by_token/1	
lib/application/plugs.ex:41 Viggianicod.Plugs.fetch_user_session/2	
(viggianicod 2.4.0) Viggianicod.Router.authentication/2
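
For what it’s worth, the pool errors point at the knobs named in the message itself. If the issue were just connection pressure during the failover, the relevant Ecto repo options would look roughly like this (a sketch only: the repo module name is assumed to be Viggianicod.Repo and the values are illustrative, not our real config):

  # config/runtime.exs — illustrative values only
  config :viggianicod, Viggianicod.Repo,
    pool_size: 20,          # more connections (default is 10), at the cost of DB memory
    queue_target: 500,      # ms a checkout may wait before the queue is considered slow
    queue_interval: 5_000   # ms window over which queue_target is evaluated

That said, the logs above look more like the cluster itself failing over than like a pool that is simply too small.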

Some of these apps are already in production and we’re planning to release this one soon, so we need to understand what is happening and how to fix it.

Can somebody help us?

Maybe related to this?

Hi,

I’m also getting the driver: bad connection source="postgres_exporter.go:1658" error on a new Phoenix app.

Here is the full log:

Waiting for logs...
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:12.845 UTC [8424] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:12.825 UTC [28937] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:12.825 UTC [28940] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:12.845 UTC [8421] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:12.825 UTC [8415] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:12.824 UTC [8417] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:13.033 UTC [8851] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:13.033 UTC [8850] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.455 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:13.265 UTC [8853] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:13.903 app[a7ce8936] hkg [info] [ 7106.894411] Out of memory: Killed process 28929 (postgres) total-vm:157672kB, anon-rss:3180kB, file-rss:0kB, shmem-rss:6888kB, UID:1000 pgtables:168kB oom_score_adj:0
2022-10-11T23:35:14.433 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:13.384 UTC [8854] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.433 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:13.571 UTC [8852] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.433 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.303 UTC [8857] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.433 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.343 UTC [8849] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.433 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.339 UTC [8848] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.433 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.356 UTC [8856] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.433 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.359 UTC [8855] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.433 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.425 UTC [8847] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.438 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.437 UTC [8831] FATAL: terminating connection due to unexpected postmaster exit
2022-10-11T23:35:14.532 app[a7ce8936] hkg [info] keeper | 2022-10-11T23:35:14.530Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: connection refused"}
2022-10-11T23:35:14.652 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.652 UTC [8864] LOG: starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2022-10-11T23:35:14.653 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.652 UTC [8864] LOG: listening on IPv6 address "fdaa:0:b8b9:a7b:7f08:2:4b84:2", port 5433
2022-10-11T23:35:14.653 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.653 UTC [8864] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2022-10-11T23:35:14.657 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.656 UTC [8865] LOG: database system was interrupted; last known up at 2022-10-11 23:33:11 UTC
2022-10-11T23:35:14.679 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.678 UTC [8866] FATAL: the database system is starting up
2022-10-11T23:35:14.699 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.696 UTC [8867] LOG: PID 28941 in cancel request did not match any process
2022-10-11T23:35:14.700 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.699 UTC [8868] LOG: PID 28931 in cancel request did not match any process
2022-10-11T23:35:14.701 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.700 UTC [8869] LOG: PID 28918 in cancel request did not match any process
2022-10-11T23:35:14.708 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.706 UTC [8873] LOG: PID 28913 in cancel request did not match any process
2022-10-11T23:35:14.711 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.710 UTC [8878] LOG: PID 28917 in cancel request did not match any process
2022-10-11T23:35:14.712 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.710 UTC [8877] LOG: PID 28921 in cancel request did not match any process
2022-10-11T23:35:14.713 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.711 UTC [8876] LOG: PID 28935 in cancel request did not match any process
2022-10-11T23:35:14.715 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.713 UTC [8871] LOG: PID 28915 in cancel request did not match any process
2022-10-11T23:35:14.717 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.715 UTC [8879] LOG: PID 28938 in cancel request did not match any process
2022-10-11T23:35:14.718 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.716 UTC [8872] LOG: PID 28924 in cancel request did not match any process
2022-10-11T23:35:14.720 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.718 UTC [8875] LOG: PID 28933 in cancel request did not match any process
2022-10-11T23:35:14.723 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.722 UTC [8880] LOG: PID 28937 in cancel request did not match any process
2022-10-11T23:35:14.729 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.728 UTC [8882] LOG: PID 28914 in cancel request did not match any process
2022-10-11T23:35:14.730 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.729 UTC [8883] LOG: PID 28936 in cancel request did not match any process
2022-10-11T23:35:14.733 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.732 UTC [8885] LOG: PID 28932 in cancel request did not match any process
2022-10-11T23:35:14.734 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.733 UTC [8881] LOG: PID 28940 in cancel request did not match any process
2022-10-11T23:35:14.736 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.735 UTC [8886] LOG: PID 28919 in cancel request did not match any process
2022-10-11T23:35:14.738 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.737 UTC [8884] LOG: PID 28934 in cancel request did not match any process
2022-10-11T23:35:14.743 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.742 UTC [8887] LOG: PID 28912 in cancel request did not match any process
2022-10-11T23:35:14.747 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.746 UTC [8888] LOG: PID 28939 in cancel request did not match any process
2022-10-11T23:35:14.749 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.748 UTC [8889] LOG: PID 28922 in cancel request did not match any process
2022-10-11T23:35:14.754 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.752 UTC [8890] LOG: PID 28926 in cancel request did not match any process
2022-10-11T23:35:14.759 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.758 UTC [8865] LOG: database system was not properly shut down; automatic recovery in progress
2022-10-11T23:35:14.761 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.760 UTC [8865] LOG: redo starts at 0/29000028
2022-10-11T23:35:14.762 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.761 UTC [8865] LOG: redo done at 0/2A000170 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2022-10-11T23:35:14.763 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.762 UTC [8894] LOG: PID 28929 in cancel request did not match any process
2022-10-11T23:35:14.764 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.763 UTC [8893] LOG: PID 28920 in cancel request did not match any process
2022-10-11T23:35:14.766 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.764 UTC [8891] LOG: PID 28923 in cancel request did not match any process
2022-10-11T23:35:14.768 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.767 UTC [8892] LOG: PID 28925 in cancel request did not match any process
2022-10-11T23:35:14.772 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.770 UTC [8895] LOG: PID 28916 in cancel request did not match any process
2022-10-11T23:35:14.774 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.772 UTC [8899] FATAL: the database system is starting up
2022-10-11T23:35:14.775 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.773 UTC [8898] FATAL: the database system is starting up
2022-10-11T23:35:14.777 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.775 UTC [8896] LOG: PID 28930 in cancel request did not match any process
2022-10-11T23:35:14.778 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.776 UTC [8897] FATAL: the database system is starting up
2022-10-11T23:35:14.780 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.778 UTC [8900] FATAL: the database system is starting up
2022-10-11T23:35:14.780 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:14.780 UTC [8864] LOG: database system is ready to accept connections
2022-10-11T23:35:15.356 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.354 UTC [8957] LOG: PID 8420 in cancel request did not match any process
2022-10-11T23:35:15.358 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.356 UTC [8959] LOG: PID 8430 in cancel request did not match any process
2022-10-11T23:35:15.359 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.358 UTC [8956] LOG: PID 8417 in cancel request did not match any process
2022-10-11T23:35:15.361 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.359 UTC [8958] LOG: PID 8421 in cancel request did not match any process
2022-10-11T23:35:15.362 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.360 UTC [8955] LOG: PID 8433 in cancel request did not match any process
2022-10-11T23:35:15.363 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.362 UTC [8960] LOG: PID 8432 in cancel request did not match any process
2022-10-11T23:35:15.364 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.363 UTC [8961] LOG: PID 8427 in cancel request did not match any process
2022-10-11T23:35:15.365 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.364 UTC [8954] LOG: PID 8422 in cancel request did not match any process
2022-10-11T23:35:15.366 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.365 UTC [8962] LOG: PID 8426 in cancel request did not match any process
2022-10-11T23:35:15.367 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.366 UTC [8963] LOG: PID 8428 in cancel request did not match any process
2022-10-11T23:35:15.369 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.367 UTC [8953] LOG: PID 8419 in cancel request did not match any process
2022-10-11T23:35:15.370 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.368 UTC [8964] LOG: PID 8425 in cancel request did not match any process
2022-10-11T23:35:15.371 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.369 UTC [8965] LOG: PID 8429 in cancel request did not match any process
2022-10-11T23:35:15.373 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.372 UTC [8967] LOG: PID 8442 in cancel request did not match any process
2022-10-11T23:35:15.374 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.373 UTC [8966] LOG: PID 8431 in cancel request did not match any process
2022-10-11T23:35:15.376 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.374 UTC [8968] LOG: PID 8434 in cancel request did not match any process
2022-10-11T23:35:15.378 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.376 UTC [8969] LOG: PID 8440 in cancel request did not match any process
2022-10-11T23:35:15.384 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.382 UTC [8970] LOG: PID 8443 in cancel request did not match any process
2022-10-11T23:35:15.385 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.384 UTC [8971] LOG: PID 8441 in cancel request did not match any process
2022-10-11T23:35:15.386 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.385 UTC [8972] LOG: PID 8436 in cancel request did not match any process
2022-10-11T23:35:15.387 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.386 UTC [8974] LOG: PID 8437 in cancel request did not match any process
2022-10-11T23:35:15.388 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.387 UTC [8973] LOG: PID 8435 in cancel request did not match any process
2022-10-11T23:35:15.390 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.388 UTC [8978] LOG: PID 8415 in cancel request did not match any process
2022-10-11T23:35:15.391 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.389 UTC [8979] LOG: PID 8423 in cancel request did not match any process
2022-10-11T23:35:15.392 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.390 UTC [8975] LOG: PID 8439 in cancel request did not match any process
2022-10-11T23:35:15.394 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.392 UTC [8980] LOG: PID 8418 in cancel request did not match any process
2022-10-11T23:35:15.396 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.393 UTC [8977] LOG: PID 8414 in cancel request did not match any process
2022-10-11T23:35:15.397 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.395 UTC [8981] LOG: PID 8416 in cancel request did not match any process
2022-10-11T23:35:15.398 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.397 UTC [8976] LOG: PID 8438 in cancel request did not match any process
2022-10-11T23:35:15.400 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.399 UTC [8982] LOG: PID 8424 in cancel request did not match any process
2022-10-11T23:35:15.402 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.401 UTC [8984] LOG: PID 28927 in cancel request did not match any process
2022-10-11T23:35:15.466 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.465 UTC [9020] LOG: PID 8831 in cancel request did not match any process
2022-10-11T23:35:15.469 app[a7ce8936] hkg [info] keeper | 2022-10-11 23:35:15.467 UTC [9021] LOG: PID 28928 in cancel request did not match any process
2022-10-11T23:35:18.200 app[a7ce8936] hkg [info] exporter | ERRO[7110] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:b8b9:a7b:7f08:2:4b84:2]:5433/postgres?sslmode=disable): driver: bad connection source="postgres_exporter.go:1658"
2022-10-11T23:35:32.185 app[a7ce8936] hkg [info] exporter | INFO[7124] Established new database connection to "fdaa:0:b8b9:a7b:7f08:2:4b84:2:5433". source="postgres_exporter.go:970"
2022-10-11T23:35:32.194 app[a7ce8936] hkg [info] exporter | INFO[7124] Semantic Version Changed on "fdaa:0:b8b9:a7b:7f08:2:4b84:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"
2022-10-11T23:35:33.238 app[a7ce8936] hkg [info] exporter | ERRO[7125] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:b8b9:a7b:7f08:2:4b84:2]:5433/sportlotto_landing?sslmode=disable): driver: bad connection source="postgres_exporter.go:1608"
2022-10-11T23:35:47.219 app[a7ce8936] hkg [info] exporter | INFO[7139] Established new database connection to "fdaa:0:b8b9:a7b:7f08:2:4b84:2:5433". source="postgres_exporter.go:970"
2022-10-11T23:35:47.230 app[a7ce8936] hkg [info] exporter | INFO[7139] Semantic Version Changed on "fdaa:0:b8b9:a7b:7f08:2:4b84:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"

Looks like you’re hitting an OOM.

2022-10-11T23:35:13.903 app[a7ce8936] hkg [info] [ 7106.894411] Out of memory: Killed process 28929 (postgres) total-vm:157672kB, anon-rss:3180kB, file-rss:0kB, shmem-rss:6888kB, UID:1000 pgtables:168kB oom_score_adj:0

Here’s some docs on how to scale up your instance: Scaling and Autoscaling · Fly Docs
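
For example, giving the Postgres VM more memory from the CLI looks roughly like this (placeholder app name and size; see the docs above for the available options):

  # add RAM to the Postgres app (illustrative value)
  fly scale memory 1024 -a my-pg-app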

@moissela I’m sorry I’m just seeing this… Are you still having issues on your end?

Right, classic. I shouldn’t have been doing that so late :slight_smile:

But that database contains 3 tables and literally no data. I wonder whether an OOM kill is the right thing to happen in this case :thinking:

Hi Shaun, thanks for the reply.

Yes, we continue to have these strange issues unpredictably on various apps: for example, the last one was on the night of 11 Oct.

Are all of your clusters in fra?

Currently yes: the apps we are testing on Fly.io are in beta and used only by users from Italy, so we haven’t needed to use multiple regions yet.

Could this make our apps vulnerable to temporary single-region issues, and could that be the cause of our problems?