My Postgres DB is having issues

App is rp5-rinkeby-postgres.

I’m getting the problem below, and it’s cascading to other instances that use this database. I could just restart it, but I wanted to show you first in case it helps. We would appreciate a fast response.

2022-05-27T14:47:29Z app[ca14f587] sjc [info]exporter | ERRO[3151100] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:5389:a7b:2295:0:d203:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:5389:a7b:2295:0:d203:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2022-05-27T14:47:29Z app[ca14f587] sjc [info]keeper   | ...2022-05-27T14:47:29.946Z     ERROR   cmd/keeper.go:719       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-05-27T14:47:32Z app[ca14f587] sjc [info]keeper   | ..2022-05-27T14:47:32.446Z      ERROR   cmd/keeper.go:719       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-05-27T14:47:33Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:33.321Z        WARN    cmd/sentinel.go:276     no keeper info available        {"db": "49924dd4", "keeper": "ad10d2022"}
2022-05-27T14:47:33Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:33.323Z        INFO    cmd/sentinel.go:995     master db is failed     {"db": "81e4c0c0", "keeper": "22950d2032"}
2022-05-27T14:47:33Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:33.324Z        INFO    cmd/sentinel.go:1001    db not converged        {"db": "81e4c0c0", "keeper": "22950d2032"}
2022-05-27T14:47:33Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:33.324Z        INFO    cmd/sentinel.go:1006    trying to find a new master to replace failed master
2022-05-27T14:47:33Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:33.324Z        ERROR   cmd/sentinel.go:1009    no eligible masters
2022-05-27T14:47:34Z app[ca14f587] sjc [info]keeper   | ...2022-05-27T14:47:34.947Z     ERROR   cmd/keeper.go:719       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-05-27T14:47:37Z app[ca14f587] sjc [info]keeper   | ..2022-05-27T14:47:37.448Z      ERROR   cmd/keeper.go:719       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-05-27T14:47:38Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:38.959Z        WARN    cmd/sentinel.go:276     no keeper info available        {"db": "49924dd4", "keeper": "ad10d2022"}
2022-05-27T14:47:38Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:38.963Z        INFO    cmd/sentinel.go:995     master db is failed     {"db": "81e4c0c0", "keeper": "22950d2032"}
2022-05-27T14:47:38Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:38.963Z        INFO    cmd/sentinel.go:1001    db not converged        {"db": "81e4c0c0", "keeper": "22950d2032"}
2022-05-27T14:47:38Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:38.963Z        INFO    cmd/sentinel.go:1006    trying to find a new master to replace failed master
2022-05-27T14:47:38Z app[ca14f587] sjc [info]sentinel | 2022-05-27T14:47:38.963Z        ERROR   cmd/sentinel.go:1009    no eligible masters
2022-05-27T14:47:39Z app[ca14f587] sjc [info]keeper   | .. stopped waiting

Hey, we’ve seen similar “no eligible masters” error messages when customers have attempted to change their Postgres leader region.

I think you may find this post helpful

We didn’t try to do that. We actually haven’t updated this Postgres app since the initial deploy.


I tried restarting it:

bash-3.2$ fly restart rp5-rinkeby-postgres
Update available 0.0.320 -> v0.0.321.
Run "fly version update" to upgrade.
rp5-rinkeby-postgres is being restarted

Still having issues, though. The restart doesn’t appear to have fixed anything.

Hey @cinjon,

I checked the logs on the failing VM that was last holding leadership, and it looks like you ran out of disk space:

failed to start postgres	{"error": "error writing postgresql.conf file: write /data/postgres/postgresql.conf1847956936: no space left on device"}

There have been some improvements to how this is handled in the latest image release.
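A quick way to confirm a full volume is to check how much of it is in use. This is a minimal Python sketch, assuming the Postgres volume is mounted at /data (as the path in the log line suggests) and that a Python 3 interpreter is available on the VM, which may not be true of the stock image; `df -h /data` reports the same thing. The `report` helper and its 90% threshold are hypothetical examples, not anything Fly ships.

```python
import shutil


def volume_usage(path: str) -> float:
    """Return the fraction (0.0-1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free bytes
    return usage.used / usage.total


def report(path: str, threshold: float = 0.90) -> str:
    """Build a one-line status message; `threshold` is an arbitrary example value."""
    fraction = volume_usage(path)
    status = "NEARLY FULL" if fraction >= threshold else "ok"
    return f"{path}: {fraction:.1%} used ({status})"
```

On the VM this would be called as `print(report("/data"))`. Running something like it periodically gives some warning before Postgres hits “no space left on device” again.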

Huh, good find. Thank you. Investigating what could have caused that …
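Postgres won’t start while the disk is full, so one way to see what filled it is to total up file sizes per top-level entry of the data directory, much like `du -sh /data/postgres/*`. A large `base` directory points at table and index data, while a large `pg_wal` can mean write-ahead log is being retained (for example by a replication slot that is no longer consumed). This sketch assumes the data directory is /data/postgres, as in the log line above, and that Python 3 is available on the VM; `dir_size` and `breakdown` are hypothetical helper names.

```python
import os
from pathlib import Path
from typing import List, Tuple


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except FileNotFoundError:
                # Files can disappear mid-walk (e.g. recycled WAL segments).
                pass
    return total


def breakdown(data_dir: str) -> List[Tuple[str, int]]:
    """Return (entry name, size in bytes) for each top-level entry, largest first."""
    sizes = []
    for entry in Path(data_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            size = dir_size(entry)
        else:
            # Plain files (and symlinks themselves) are measured without following links.
            size = entry.lstat().st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

For example, `for name, size in breakdown("/data/postgres"): print(f"{size / 2**20:10.1f} MiB  {name}")` lists the biggest entries first.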

Heads up @shaun, I tried just updating to v22, but I’m still running into this issue. Not sure if that was expected on your end.
(Still trying to figure out how to fix this permanently for us)

I’m trying to log in to see which table ballooned in size, but I’m having trouble doing that. Any tips? Perhaps I could get an export somehow?
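Querying table sizes only works once Postgres has enough free space to start again (for example after the volume has been grown). At that point, one way to see which tables grew is to rank them by `pg_total_relation_size`, which counts a table plus its indexes and TOAST data. This sketch assumes the `psycopg2` driver is installed and that the database is reachable with a connection string, for example through a local port forwarded with `fly proxy`; the same SQL can also be pasted straight into `psql`. It only covers the database you connect to, and `largest_tables` is a hypothetical helper name.

```python
# Ranks ordinary tables in the connected database by total on-disk size
# (heap + indexes + TOAST), skipping the system catalogs.
TABLE_SIZE_SQL = """
SELECT n.nspname || '.' || c.relname AS table_name,
       pg_total_relation_size(c.oid) AS total_bytes,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_pretty
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY total_bytes DESC
LIMIT %s
"""


def largest_tables(dsn: str, limit: int = 10):
    """Return (table_name, total_bytes, total_pretty) rows for the biggest tables."""
    import psycopg2  # imported lazily so this module loads without the driver installed

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(TABLE_SIZE_SQL, (limit,))
            return cur.fetchall()
    finally:
        conn.close()
```

Usage would look like `for name, _, pretty in largest_tables("postgresql://USER:PASSWORD@localhost:5432/postgres"): print(pretty, name)`, where the DSN is a placeholder for your own credentials, host and port.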