Hi,
I'm facing a lot of issues with my Postgres instance. Could you please assist?
flyctl pg restart -a curators-db hangs forever
Try restarting the machines via fly machine commands
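Something along these lines should do it (the machine ID below is a placeholder; grab the real one from the list output):
flyctl machine list -a curators-db
flyctl machine restart <machine-id> -a curators-db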
I managed to restart it like this; however, it's still unable to connect:
flyctl machine restart 9080022f6d6087 -a curators-db
Logs of the instance:
2023-02-22T12:19:43.012 app[9080022f6d6087] waw [info] checking stolon status
2023-02-22T12:19:43.136 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:43.136Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-02-22T12:19:44.101 app[9080022f6d6087] waw [info] keeper is healthy, db is healthy, role: master
2023-02-22T12:19:45.637 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:45.637Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-02-22T12:19:48.138 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:48.137Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-02-22T12:19:49.035 app[9080022f6d6087] waw [info] proxy | [WARNING] 052/121949 (563) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-02-22T12:19:49.101 app[9080022f6d6087] waw [info] error connecting to local postgres context deadline exceeded
2023-02-22T12:19:49.101 app[9080022f6d6087] waw [info] checking stolon status
2023-02-22T12:19:49.216 app[9080022f6d6087] waw [info] proxy | [WARNING] 052/121949 (563) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-02-22T12:19:49.216 app[9080022f6d6087] waw [info] proxy | [NOTICE] 052/121949 (563) : haproxy version is 2.2.9-2+deb11u3
2023-02-22T12:19:49.216 app[9080022f6d6087] waw [info] proxy | [NOTICE] 052/121949 (563) : path to executable is /usr/sbin/haproxy
2023-02-22T12:19:49.216 app[9080022f6d6087] waw [info] proxy | [ALERT] 052/121949 (563) : backend 'bk_db' has no server available!
2023-02-22T12:19:49.443 app[9080022f6d6087] waw [info] keeper is healthy, db is healthy, role: master
2023-02-22T12:19:50.638 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:50.638Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-02-22T12:19:53.139 app[9080022f6d6087] waw [info] keeper | 2023-02-22T12:19:53.138Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
I keep getting these log entries constantly:
2023-02-22T13:14:29.789 app[d5683620b708e9] waw [info] sentinel | 2023-02-22T13:14:29.788Z WARN cmd/sentinel.go:276 no keeper info available {"db": "a2073e1a", "keeper": "842d4b77ec2"}
2023-02-22T13:14:29.794 app[d5683620b708e9] waw [info] sentinel | 2023-02-22T13:14:29.793Z ERROR cmd/sentinel.go:1018 no eligible masters
I tried extending both RAM to 1 GB and storage to 10 GB, but that didn't work.
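For anyone following along, resizing a machine and extending its volume looks roughly like this (IDs are placeholders and flags are from memory, so double-check with flyctl help):
flyctl machine update <machine-id> --memory 1024 -a curators-db
flyctl volumes extend <volume-id> --size 10 -a curators-db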
I see you have 2 volumes for your db app in WAW. Is one of them a remnant of a past cluster?
I created a replica at some point, which came with that additional volume; the one I care about is the first one.
Just destroyed the second volume
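For reference, finding and removing the extra volume goes roughly like this (the volume ID is a placeholder; take it from the list output):
flyctl volumes list -a curators-db
flyctl volumes destroy <volume-id> -a curators-db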
Hi @jerome. Any idea how to solve this? I'm unable to access very important data at the moment.
Hi, I don’t even need to get Postgres up and running; I just need to dump the data that this volume holds.
Please, could you help me out?
I haven’t done this myself, so I’m not sure if it works, but see this guide on recreating Fly-automated Postgres DBs from existing snapshots: Backup, Restores, & Snapshots · Fly Docs
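Untested on my side, but if I read that guide correctly, the gist is something like this (the volume and snapshot IDs are placeholders):
flyctl volumes snapshots list <volume-id>
flyctl postgres create --snapshot-id <snapshot-id>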
I managed to recover the data from the instance; it was certainly not easy.
Maybe somebody who is in the same situation could use this info:
I loved this platform; however, I’m not going to use it to host Postgres apps from now on.
Glad to hear it. Did the backup/restore commands from volume snapshots (as described in the docs shared above) not work for you? If so, that’s scary.
I think Fly knows this (ref), hence the new v2 for Postgres HA that’s more resilient to software/hardware failures: Improved Postgres Clustering with repmgr - Preview