We have a small prototype that ran successfully on the free tier (256MB/1GB) for 8 months. Within the last two weeks, however, it crashed. We have extended the volume of the original amply-portal-production-db to 5GB, and also naively tried to restore a backup onto a new cluster with a 10GB drive, amply-portal-production-db-v2. Both apps fail to boot with the same message; pg detach and pg attach also fail.
2022-11-17T02:07:17.056 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:17.056Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:17.560 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:17.559Z ERROR cmd/sentinel.go:1018 no eligible masters
2022-11-17T02:07:17.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:18.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:19.557 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:19.556Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:19.568 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:20.261 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:20.260Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:20.283 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:20.282Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:20.404 app[c6691ba4] ewr [info] exporter | INFO[0539] Established new database connection to "fdaa:0:4e24:a7b:ab2:0:964b:2:5433". source="postgres_exporter.go:970"
2022-11-17T02:07:20.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:21.405 app[c6691ba4] ewr [info] exporter | ERRO[0540] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:4e24:a7b:ab2:0:964b:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:4e24:a7b:ab2:0:964b:2]:5433: connect: connection refused source="postgres_exporter.go:1658"
2022-11-17T02:07:21.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:22.058 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:22.057Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:22.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:22.789 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:22.789Z ERROR cmd/sentinel.go:1018 no eligible masters
2022-11-17T02:07:23.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:24.559 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:24.558Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:24.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:25.318 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:25.317Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:25.363 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:25.363Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:25.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:26.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:27.059 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:27.059Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:27.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:28.050 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:28.050Z ERROR cmd/sentinel.go:1018 no eligible masters
2022-11-17T02:07:28.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:29.560 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:29.559Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:29.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:30.378 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:30.378Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:30.405 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:30.405Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:30.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:31.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:32.061 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:32.060Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:32.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:33.168 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:33.168Z ERROR cmd/sentinel.go:1018 no eligible masters
2022-11-17T02:07:33.568 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:34.562 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:34.561Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:34.568 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:35.404 app[c6691ba4] ewr [info] exporter | INFO[0554] Established new database connection to "fdaa:0:4e24:a7b:ab2:0:964b:2:5433". source="postgres_exporter.go:970"
2022-11-17T02:07:35.417 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:35.417Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:35.441 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:35.441Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:35.568 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:36.405 app[c6691ba4] ewr [info] exporter | ERRO[0555] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:4e24:a7b:ab2:0:964b:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:4e24:a7b:ab2:0:964b:2]:5433: connect: connection refused source="postgres_exporter.go:1658"
2022-11-17T02:07:36.568 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:37.063 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:37.062Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:37.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:38.264 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:38.263Z ERROR cmd/sentinel.go:1018 no eligible masters
2022-11-17T02:07:38.572 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:39.564 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:39.563Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:39.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:40.468 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:40.468Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:40.493 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:40.492Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:40.568 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:41.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:42.065 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:42.064Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:42.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:43.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:43.618 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:43.617Z ERROR cmd/sentinel.go:1018 no eligible masters
2022-11-17T02:07:44.565 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:44.565Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:44.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:45.518 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:45.518Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:45.544 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:45.544Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}
2022-11-17T02:07:45.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:46.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:47.066 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:47.065Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:47.568 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:48.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:48.937 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:48.937Z ERROR cmd/sentinel.go:1018 no eligible masters
2022-11-17T02:07:49.567 app[c6691ba4] ewr [info] checking stolon status
2022-11-17T02:07:49.568 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:49.567Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-17T02:07:50.404 app[c6691ba4] ewr [info] exporter | INFO[0569] Established new database connection to "fdaa:0:4e24:a7b:ab2:0:964b:2:5433". source="postgres_exporter.go:970"
2022-11-17T02:07:50.567 app[c6691ba4] ewr [info] checking stolon status ...
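Since the log above repeats the same few messages, here is a rough Python sketch for tallying the distinct errors per component. The function name `summarize` and the regular expressions are our own assumptions, based only on the line format in this excerpt; they are not part of any Fly.io or stolon tooling.

```python
import re
from collections import Counter

# Assumed line shape, based only on the excerpt above:
#   <timestamp> app[<id>] <region> [info] <component> | <message>
LINE_RE = re.compile(r"\[info\]\s+(?P<component>\w+)\s+\|\s+(?P<message>.*)$")
# Per-line noise at the start of a message: an ISO timestamp (keeper, sentinel)
# or a counter such as "INFO[0539]" / "ERRO[0540]" (exporter).
ISO_PREFIX_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z\s+")
COUNTER_RE = re.compile(r"^(INFO|ERRO)\[\d+\]")


def summarize(lines):
    """Count distinct error messages per component, ignoring timestamps."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # e.g. "checking stolon status" has no "component |" part
        message = ISO_PREFIX_RE.sub("", match.group("message").strip())
        message = COUNTER_RE.sub(r"\1", message)
        if message.startswith(("ERROR", "ERRO")):
            counts[(match.group("component"), message)] += 1
    return counts
```

Saving the log to a file (the name `fly.log` is hypothetical) and running `summarize(open("fly.log"))` returns a `Counter` keyed by `(component, message)`, which makes it easier to see that only a handful of distinct errors are cycling.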
We have also tried the instructions here for resetting the WAL set point.
Is there any chance we can get this demo data back up and running? Apologies that we were not monitoring this demo database; we are very grateful for a simple, fast free tier. We understand a managed Postgres cluster is in our future.
These instructions from the Postgres guide suggest there might be some well-known steps to recover:
- Recovering from outages & fail-overs - If the volume in your database fills up, a replica fails, etc. you’ll have to do a little bit of work to bring your database back online.
Many thanks for any advice!