PostgreSQL DB down and restarts don't help

No changes to my app for weeks, then suddenly I noticed “Internal server error” on every page. Phoenix says the DB is down, so I went and checked the logs for the PostgreSQL DB (just one instance, no replicas), and it keeps repeating this:

2022-06-01T17:21:00.981 app[34d538e2] ams [info] keeper | 2022-06-01T17:21:00.981Z INFO cmd/keeper.go:1505 our db requested role is master
2022-06-01T17:21:00.982 app[34d538e2] ams [info] keeper | 2022-06-01T17:21:00.982Z INFO cmd/keeper.go:1543 already master
2022-06-01T17:21:01.006 app[34d538e2] ams [info] keeper | 2022-06-01T17:21:01.006Z INFO cmd/keeper.go:1676 postgres parameters not changed

It loops like this forever, every 4 seconds. I tried to restart the VM with “flyctl restart”, but when it came back up it just went back to those messages again. I assume there has been some update to the DBs etc. on your side recently, since I haven’t touched anything for weeks? I found this thread (Postgres instance count increase fails), which might be a similar issue (but it went unanswered).

What do your Phoenix logs say? Those Postgres logs are normal (it’s actually saying Postgres is the master and in a good state).
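If you want to sanity-check connectivity from the app side, a trivial query from a remote IEx session usually settles it. A minimal sketch, assuming your repo module is MyApp.Repo (swap in your own):

```elixir
# Run from a remote IEx session on the running app (e.g. via `flyctl ssh console`).
# Repos backed by an SQL adapter expose query/2; a trivial SELECT proves the
# pool can check out a connection and actually reach Postgres.
case MyApp.Repo.query("SELECT 1", []) do
  {:ok, %{rows: [[1]]}} -> IO.puts("DB reachable")
  {:error, error} -> IO.inspect(error, label: "DB unreachable")
end
```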

Aha, OK! Good to know that at least it isn’t the DB that is malfunctioning.

These are the logs from Phoenix (I restarted that app as well):

2022-06-01T17:13:49.244 app[51437a33] ams [info] 17:13:49.243 [error] GenServer {Oban.Registry, {Oban, Oban.Peer}} terminating
2022-06-01T17:13:49.244 app[51437a33] ams [info] ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 1658ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
2022-06-01T17:13:49.244 app[51437a33] ams [info] 1. Ensuring your database is available and that you can connect to it
2022-06-01T17:13:49.244 app[51437a33] ams [info] 2. Tracking down slow queries and making sure they are running fast enough
2022-06-01T17:13:49.244 app[51437a33] ams [info] 3. Increasing the pool_size (although this increases resource consumption)
2022-06-01T17:13:49.244 app[51437a33] ams [info] 4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
2022-06-01T17:13:49.244 app[51437a33] ams [info] See DBConnection.start_link/2 for more information
2022-06-01T17:13:49.244 app[51437a33] ams [info] (db_connection 2.4.2) lib/db_connection.ex:904: DBConnection.transaction/3
2022-06-01T17:13:49.244 app[51437a33] ams [info] (oban 2.11.3) lib/oban/peer.ex:147: Oban.Peer.handle_info/2
2022-06-01T17:13:49.244 app[51437a33] ams [info] (stdlib 3.16.1) gen_server.erl:695: :gen_server.try_dispatch/4
2022-06-01T17:13:49.244 app[51437a33] ams [info] (stdlib 3.16.1) gen_server.erl:437: :gen_server.loop/7
2022-06-01T17:13:49.244 app[51437a33] ams [info] (stdlib 3.16.1) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
2022-06-01T17:13:49.244 app[51437a33] ams [info] Last message: {:continue, :start}
2022-06-01T17:13:50.372 app[51437a33] ams [info] 17:13:50.372 [error] Postgrex.Protocol (#PID<0.2883.0>) failed to connect: ** (DBConnection.ConnectionError) tcp recv (idle): closed
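For anyone reading along: points 3 and 4 from that error message live in the Repo config. A minimal sketch of where those knobs go, assuming a standard Phoenix setup with a hypothetical MyApp.Repo configured in config/runtime.exs (the numbers are illustrative, not recommendations):

```elixir
# config/runtime.exs — app and module names are placeholders
config :my_app, MyApp.Repo,
  url: System.get_env("DATABASE_URL"),
  # 3. more concurrent connections held by this node (costs DB resources)
  pool_size: 10,
  # 4. let checkouts wait longer before DBConnection starts dropping them
  queue_target: 500,      # ms a checkout may wait before the pool counts as overloaded
  queue_interval: 5_000   # ms window over which queue_target is measured
```

That said, in this thread the pool wasn’t the real culprit: the later Postgrex line (tcp recv (idle): closed) shows the connection itself being dropped, and pool tuning only helps once the database is actually reachable.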

In case someone lands here in the future, this seems to have helped:

Now it works again.

I had similar issues with Postgres today, also in the ams region.
