Postgres 1GB instance crashed (WAL, Logs, Disk Space)

We have a small prototype that ran successfully on the free tier (256MB/1GB) for 8 months. Within the last two weeks, however, it crashed. We extended the volume of the original amply-portal-production-db to 5GB, and also naively tried to restore a backup onto a new cluster, amply-portal-production-db-v2, with a 10GB volume. Both apps fail to boot with the same message; pg detach and pg attach also fail.

2022-11-17T02:07:17.056 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:17.056Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:17.560 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:17.559Z ERROR cmd/sentinel.go:1018 no eligible masters

2022-11-17T02:07:17.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:18.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:19.557 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:19.556Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:19.568 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:20.261 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:20.260Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:20.283 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:20.282Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:20.404 app[c6691ba4] ewr [info] exporter | INFO[0539] Established new database connection to "fdaa:0:4e24:a7b:ab2:0:964b:2:5433". source="postgres_exporter.go:970"

2022-11-17T02:07:20.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:21.405 app[c6691ba4] ewr [info] exporter | ERRO[0540] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:4e24:a7b:ab2:0:964b:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:4e24:a7b:ab2:0:964b:2]:5433: connect: connection refused source="postgres_exporter.go:1658"

2022-11-17T02:07:21.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:22.058 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:22.057Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:22.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:22.789 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:22.789Z ERROR cmd/sentinel.go:1018 no eligible masters

2022-11-17T02:07:23.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:24.559 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:24.558Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:24.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:25.318 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:25.317Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:25.363 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:25.363Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:25.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:26.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:27.059 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:27.059Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:27.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:28.050 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:28.050Z ERROR cmd/sentinel.go:1018 no eligible masters

2022-11-17T02:07:28.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:29.560 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:29.559Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:29.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:30.378 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:30.378Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:30.405 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:30.405Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:30.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:31.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:32.061 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:32.060Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:32.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:33.168 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:33.168Z ERROR cmd/sentinel.go:1018 no eligible masters

2022-11-17T02:07:33.568 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:34.562 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:34.561Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:34.568 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:35.404 app[c6691ba4] ewr [info] exporter | INFO[0554] Established new database connection to "fdaa:0:4e24:a7b:ab2:0:964b:2:5433". source="postgres_exporter.go:970"

2022-11-17T02:07:35.417 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:35.417Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:35.441 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:35.441Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:35.568 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:36.405 app[c6691ba4] ewr [info] exporter | ERRO[0555] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:4e24:a7b:ab2:0:964b:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:4e24:a7b:ab2:0:964b:2]:5433: connect: connection refused source="postgres_exporter.go:1658"

2022-11-17T02:07:36.568 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:37.063 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:37.062Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:37.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:38.264 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:38.263Z ERROR cmd/sentinel.go:1018 no eligible masters

2022-11-17T02:07:38.572 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:39.564 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:39.563Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:39.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:40.468 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:40.468Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:40.493 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:40.492Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:40.568 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:41.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:42.065 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:42.064Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:42.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:43.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:43.618 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:43.617Z ERROR cmd/sentinel.go:1018 no eligible masters

2022-11-17T02:07:44.565 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:44.565Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:44.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:45.518 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:45.518Z ERROR cmd/keeper.go:1470 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:45.544 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:45.544Z ERROR cmd/keeper.go:1514 failed to retrieve instance status {"error": "cannot get instance state: exit status 1"}

2022-11-17T02:07:45.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:46.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:47.066 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:47.065Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:47.568 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:48.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:48.937 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:48.937Z ERROR cmd/sentinel.go:1018 no eligible masters

2022-11-17T02:07:49.567 app[c6691ba4] ewr [info] checking stolon status

2022-11-17T02:07:49.568 app[c6691ba4] ewr [info] keeper | 2022-11-17T02:07:49.567Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2022-11-17T02:07:50.404 app[c6691ba4] ewr [info] exporter | INFO[0569] Established new database connection to "fdaa:0:4e24:a7b:ab2:0:964b:2:5433". source="postgres_exporter.go:970"

2022-11-17T02:07:50.567 app[c6691ba4] ewr [info] checking stolon status ...
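The log above repeats the same few errors every few seconds, so a quick tally can make the pattern easier to see. Here is a minimal sketch, assuming the keeper and sentinel lines keep the `component | timestamp ERROR file:line message` shape shown above; the names `summarize_errors` and `SAMPLE` are made up for illustration, and the exporter lines are skipped because they use a different format.

```python
import re
from collections import Counter

# Matches "<component> | <timestamp> ERROR <file>:<line> <message>" at the end of a log line.
ERROR_RE = re.compile(r"(\w+) \| \S+ ERROR (\S+) (.*)$")


def summarize_errors(lines):
    """Count ERROR lines grouped by (component, source location, message)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Three lines copied from the log above, used only as sample input.
SAMPLE = [
    "2022-11-17T02:07:17.560 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:17.559Z ERROR cmd/sentinel.go:1018 no eligible masters",
    "2022-11-17T02:07:17.567 app[c6691ba4] ewr [info] checking stolon status",
    "2022-11-17T02:07:22.789 app[c6691ba4] ewr [info] sentinel | 2022-11-17T02:07:22.789Z ERROR cmd/sentinel.go:1018 no eligible masters",
]

if __name__ == "__main__":
    for (component, location, message), count in summarize_errors(SAMPLE).most_common():
        print(count, component, location, message)
```

Run against the full log, this would group the keeper and sentinel errors by message, which is often enough to see that nothing new is happening between retries.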

We have also tried the instructions here for resetting the WAL set point.

Is there any chance we can get this demo data back up and running? Apologies that we were not monitoring this demo database; we are very grateful for a simple, fast free tier. We understand a managed Postgres cluster is in our future 🙂

These instructions from the Postgres guide suggest there might be some well-known steps to recover:

  • Recovering from outages & fail-overs - If the volume in your database fills up, a replica fails, etc. you’ll have to do a little bit of work to bring your database back online.

Many thanks for any advice!

We’re looking at this!

@cforcey Your amply-portal-production-db app should be good to go! I would maybe delete your amply-portal-production-db-v2 app if you don’t need it anymore.

Some of the files in your amply-portal-production-db app were corrupt; I suspect this happened when the Volume filled up. Unfortunately, those same corrupt files were included in the backup you restored from.
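Since the corruption is suspected to trace back to a full volume, a small periodic check could warn before the disk runs out. This is only a sketch under assumptions: the `/data` mount path and the 80% threshold are illustrative guesses, not values confirmed in this thread, and the function names are hypothetical.

```python
import shutil


def usage_fraction(total_bytes, used_bytes):
    """Return the used share of a volume as a number between 0 and 1."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def volume_nearly_full(path, threshold=0.8):
    """Report whether the filesystem containing `path` is at or above `threshold`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_fraction(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    # "/data" is an assumed mount point; replace it with the real volume path.
    path = "/data"
    try:
        if volume_nearly_full(path):
            print(f"warning: {path} is at or above 80% capacity")
    except FileNotFoundError:
        print(f"{path} does not exist on this machine")
```

Something like this could run from a scheduled job and send an alert, which would have flagged the problem before Postgres ran out of room for WAL.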

That is amazing! I deleted the amply-portal-production-db-v2 app in the dashboard.

I may have made life more complicated by setting DATABASE_URL directly when I could not get amply-portal-production-db-v2 to start either. Now I seem stuck at:

portal git:(main) ✗ fly pg detach -a amply-portal-production amply-portal-production-db-v2
Oops, something went wrong! Could you try that again?

and

fly postgres attach amply-portal-production-db
Error consumer app "amply-portal-production" already contains a secret named DATABASE_URL

I am so grateful for the help, and I apologize for thrashing around before asking 🙂

I am going to try to manually set it to my originally saved credentials:

DATABASE_URL=postgres://[USERNAME]:[PASSWORD]@top2.nearest.of.amply-portal-production-db.internal:5432/amply_portal_production
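When setting DATABASE_URL by hand like this, it can help to check that the URL splits into the parts you expect before saving it. A minimal sketch using Python's standard `urllib.parse`; the helper name `describe_database_url` and the demo credentials are hypothetical, standing in for the bracketed placeholders above (literal square brackets in the credentials would not parse as a valid URL).

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a postgres:// URL into its parts, leaving out the password."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Demo credentials only; the host, port, and database name are taken from the URL above.
EXAMPLE_URL = (
    "postgres://demo_user:demo_pass"
    "@top2.nearest.of.amply-portal-production-db.internal:5432"
    "/amply_portal_production"
)

if __name__ == "__main__":
    for key, value in describe_database_url(EXAMPLE_URL).items():
        print(f"{key}: {value}")
```

Checking the host and database name this way makes it harder to point the app at the deleted amply-portal-production-db-v2 cluster by accident.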

That did it! Thanks so much for your help!
