Postgres server down

ramnivas · January 10, 2023, 3:37am

I have a pg app that seems to have been down for perhaps an hour. When I look at the monitoring tab, I see the following errors:

2023-01-09T22:52:43.511 app[4d896d2f291287] ord [info] sentinel | 2023-01-09T22:52:43.511Z WARN cmd/sentinel.go:276 no keeper info available {"db": "c82203b5", "keeper": "9adae6573ca02"}

2023-01-09T22:52:54.932 app[4d896d2f291287] ord [info] keeper | 2023-01-09T22:52:54.930Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2023-01-09T23:00:12.672 app[4d896d2f291287] ord [info] sentinel | 2023-01-09T23:00:12.671Z WARN cmd/sentinel.go:276 no keeper info available {"db": "c82203b5", "keeper": "9adae6573ca02"}

2023-01-10T02:58:16.042 app[4d896d2f291287] ord [info] keeper | 2023-01-10 02:58:16.038 UTC [2118] LOG: PID 25459 in cancel request did not match any process

2023-01-10T03:04:29.086 app[4d896d2f291287] ord [info] keeper | 2023-01-10 03:04:29.082 UTC [4221] LOG: PID 2590 in cancel request did not match any process

2023-01-10T03:04:49.330 app[4d896d2f291287] ord [info] keeper | 2023-01-10 03:04:49.324 UTC [4338] LOG: PID 2953 in cancel request did not match any process

2023-01-10T03:09:29.701 app[4d896d2f291287] ord [info] keeper | 2023-01-10 03:09:29.697 UTC [5909] LOG: PID 2156 in cancel request did not match any process

2023-01-10T03:09:46.979 app[4d896d2f291287] ord [info] keeper | 2023-01-10 03:09:46.976 UTC [6006] LOG: PID 3216 in cancel request did not match any process

2023-01-10T03:15:30.718 app[4d896d2f291287] ord [info] keeper | 2023-01-10 03:15:30.714 UTC [7932] LOG: PID 3538 in cancel request did not match any process

2023-01-10T03:15:49.182 app[4d896d2f291287] ord [info] keeper | 2023-01-10 03:15:49.177 UTC [8043] LOG: PID 3405 in cancel request did not match any process
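
To make the pattern in these logs easier to see, here is a rough sketch that tallies lines by process (sentinel or keeper) and by message type. It is not part of any Fly tooling. The line format in `LINE_RE` and the three message categories in `classify` are assumptions taken only from the excerpt above, and the names `LINE_RE`, `classify`, and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, inferred only from the log excerpt above:
# <timestamp> app[<machine>] <region> [<level>] <process> | <message>
LINE_RE = re.compile(
    r"^\S+ app\[(?P<machine>\w+)\] (?P<region>\w+) \[\w+\] "
    r"(?P<process>\w+) \| (?P<message>.*)$"
)


def classify(message):
    """Map a raw message to one of the three patterns seen in the excerpt."""
    if "no keeper info available" in message:
        return "no keeper info available"
    if "error retrieving cluster data" in message:
        return "error retrieving cluster data"
    if "in cancel request did not match any process" in message:
        return "cancel request for unknown PID"
    return "other"


def summarize(lines):
    """Count log lines per (process, message category); skip unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        counts[(match.group("process"), classify(match.group("message")))] += 1
    return counts


# A few lines copied from the logs above.
sample = [
    '2023-01-09T22:52:43.511 app[4d896d2f291287] ord [info] sentinel | 2023-01-09T22:52:43.511Z WARN cmd/sentinel.go:276 no keeper info available {"db": "c82203b5", "keeper": "9adae6573ca02"}',
    '2023-01-09T22:52:54.932 app[4d896d2f291287] ord [info] keeper | 2023-01-09T22:52:54.930Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}',
    '2023-01-10T02:58:16.042 app[4d896d2f291287] ord [info] keeper | 2023-01-10 02:58:16.038 UTC [2118] LOG: PID 25459 in cancel request did not match any process',
    '2023-01-10T03:04:29.086 app[4d896d2f291287] ord [info] keeper | 2023-01-10 03:04:29.082 UTC [4221] LOG: PID 2590 in cancel request did not match any process',
]

if __name__ == "__main__":
    for (process, category), count in sorted(summarize(sample).items()):
        print(f"{process}: {category}: {count}")
```

Running the same function over the full log output would show whether the sentinel and keeper errors stop before the cancel-request messages begin, as they appear to in the excerpt.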

I am trying to restart the pg app, but that also seems to get stuck and eventually fails:

fly pg restart -a <app-name>
Identifying cluster role(s)
  Machine 4d896d2f291287: leader
Restarting machine 4d896d2f291287
Error could not stop machine 4d896d2f291287: failed to restart VM 4d896d2f291287: Post "http://[fdaa:1:459::3]:4280/v1/apps/<app-name>/machines/4d896d2f291287/restart?force_stop=false": EOF

ramnivas · January 10, 2023, 3:52am

An update: after restarting my server app (which uses this database), connectivity between the app and the database seems to be restored, but all queries are super slow.
