Unable to reset my postgres instance

SergioDev · February 22, 2023, 11:25am

Hi,

I'm facing a lot of issues with my Postgres instance. Could you please assist?

flyctl pg restart -a curators-db hangs forever

jerome · February 22, 2023, 12:04pm

Try restarting the machines via fly machine commands

SergioDev · February 22, 2023, 12:21pm

I managed to restart it with the command below, but I'm still unable to connect:
flyctl machine restart 9080022f6d6087 -a curators-db

Logs of the instance:

2023-02-22T12:19:43.012 app[9080022f6d6087] waw [info] checking stolon status

2023-02-22T12:19:43.136 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:43.136Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2023-02-22T12:19:44.101 app[9080022f6d6087] waw [info] keeper is healthy, db is healthy, role: master

2023-02-22T12:19:45.637 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:45.637Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2023-02-22T12:19:48.138 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:48.137Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2023-02-22T12:19:49.035 app[9080022f6d6087] waw [info] proxy | [WARNING] 052/121949 (563) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

2023-02-22T12:19:49.101 app[9080022f6d6087] waw [info] error connecting to local postgres context deadline exceeded

2023-02-22T12:19:49.101 app[9080022f6d6087] waw [info] checking stolon status

2023-02-22T12:19:49.216 app[9080022f6d6087] waw [info] proxy | [WARNING] 052/121949 (563) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

2023-02-22T12:19:49.216 app[9080022f6d6087] waw [info] proxy | [NOTICE] 052/121949 (563) : haproxy version is 2.2.9-2+deb11u3

2023-02-22T12:19:49.216 app[9080022f6d6087] waw [info] proxy | [NOTICE] 052/121949 (563) : path to executable is /usr/sbin/haproxy

2023-02-22T12:19:49.216 app[9080022f6d6087] waw [info] proxy | [ALERT] 052/121949 (563) : backend 'bk_db' has no server available!

2023-02-22T12:19:49.443 app[9080022f6d6087] waw [info] keeper is healthy, db is healthy, role: master

2023-02-22T12:19:50.638 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:50.638Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

2023-02-22T12:19:53.139 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:53.138Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
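
A note on the repeated keeper error above: "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory" means the unix socket file that Postgres normally creates for port 5433 does not exist, which usually suggests Postgres itself is not running on the machine. A minimal sketch of how one might check this from a shell inside the machine follows. The function name pg_socket_present is hypothetical (it is not part of flyctl or stolon), and the directory and port are simply taken from the log lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of flyctl or stolon): succeed only if a
# Postgres unix socket exists for the given port in the given directory.
pg_socket_present() {
    dir="${1:-/tmp}"     # socket directory; /tmp matches the path in the logs
    port="${2:-5433}"    # port taken from the keeper error in the logs
    # -S is true only when the path exists and is a unix socket.
    [ -S "$dir/.s.PGSQL.$port" ]
}

# Example use inside the machine:
# if pg_socket_present /tmp 5433; then
#     echo "socket present"
# else
#     echo "socket missing: postgres is probably not running"
# fi
```

If the socket is missing, restarting the machine alone is unlikely to help until Postgres itself starts; that is an inference from the error text, not something confirmed in this thread.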

SergioDev · February 22, 2023, 1:16pm

I keep getting these logs constantly:

2023-02-22T13:14:29.789 app[d5683620b708e9] waw [info] sentinel | 2023-02-22T13:14:29.788Z WARN cmd/sentinel.go:276 no keeper info available {"db": "a2073e1a", "keeper": "842d4b77ec2"}

2023-02-22T13:14:29.794 app[d5683620b708e9] waw [info] sentinel | 2023-02-22T13:14:29.793Z ERROR cmd/sentinel.go:1018 no eligible masters

I tried increasing RAM to 1GB and storage to 10GB, but that didn't work.

jerome · February 22, 2023, 1:40pm

I see you have 2 volumes for your db app in WAW. Is one of them a remnant of a past cluster?

SergioDev · February 22, 2023, 1:47pm

I created a replica at some point, which came with that additional volume. The one I care about is the first.

I just destroyed the second volume.

SergioDev · February 22, 2023, 6:24pm

Hi @jerome. Any idea how to solve this? I'm unable to access very important data at the moment.

SergioDev · February 23, 2023, 11:56am

Hi, I don’t even need to get Postgres up and running. I just need to dump the data on this volume.

Please, could you help me out?

ignoramous · February 23, 2023, 4:56pm

I haven’t done this, so not sure if it works, but see this guide on recreating Fly-automated Postgres DBs from existing snapshots: Backup, Restores, & Snapshots · Fly Docs

SergioDev · February 23, 2023, 6:54pm

I managed to recover the data from the instance. It was certainly not easy.

Maybe somebody who is in the same situation could use this info:

The data is stored in /data/postgres. I managed to get into the instance, tar that folder, and download the resulting archive.
After that, I installed Postgres on my computer. After some back and forth modifying a few parameters in postgres.config, I was able to run Postgres and dump the data.
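
For anyone following the same route, here is a rough sketch of the archive step, with the remaining steps outlined in comments. It is not the exact sequence used in this thread: the helper name archive_pgdata, the /tmp output path, and the file names are assumptions, and the local Postgres commands assume the same major version as the cluster is installed.

```shell
#!/bin/sh
# Hypothetical helper: pack a Postgres data directory into a gzip tarball.
# Usage: archive_pgdata <data_dir> <output.tar.gz>
archive_pgdata() {
    data_dir="$1"
    out="$2"
    [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }
    # -C switches to the parent directory first, so the archive stores
    # relative paths ("postgres/...") rather than absolute ones.
    tar -czf "$out" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
}

# Assumed workflow (a sketch, not verified against this cluster):
#   1. On the machine, e.g. from `fly ssh console -a curators-db`:
#        archive_pgdata /data/postgres /tmp/pgdata.tar.gz
#   2. Copy /tmp/pgdata.tar.gz to your own computer.
#   3. Locally, with a matching Postgres major version installed:
#        tar -xzf pgdata.tar.gz
#        pg_ctl -D ./postgres start
#        pg_dumpall -U postgres -f dump.sql
```

Postgres should not be writing to the directory while it is archived, otherwise the copy may be inconsistent. The local server may also refuse to start until settings copied from the cluster are adjusted, which matches the config changes described above.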

I loved this platform; however, I’m not going to use it to host Postgres apps from now on.

ignoramous · February 23, 2023, 7:53pm

Glad. The backup / restore commands from volume snapshots (as described in docs shared above) didn’t work for you? If so, that’s scary.

I think Fly knows this (ref), hence the new v2 for Postgres HA that’s more resilient to software / hardware failures: Improved Postgres Clustering with repmgr - Preview