Database unavailable: can't connect, restart, or do anything else

I’ve had an Elixir/Phoenix app attached to a Postgres instance for a while now.
Everything was working fine up until the day it wasn’t.

Logs are filled with:

 2023-02-27T21:18:43.217 app[21be8c78] cdg [info] 21:18:43.216 [error] Postgrex.Protocol (#PID<0.2027.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (top2.nearest.of.clasfle-db.internal:5432): non-existing domain - :nxdomain 
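The :nxdomain in that line means the name lookup itself failed, before any TCP connection was attempted. Here is a minimal shell sketch for pulling the host out of such a line so it can be checked by hand. The helper name host_from_postgrex_error is hypothetical, and the pattern only assumes the "tcp connect (host:port)" shape shown above.

```shell
# host_from_postgrex_error: hypothetical helper, not part of any tool.
# Reads log lines on stdin and prints the host from any
# "tcp connect (host:port)" fragment it finds.
host_from_postgrex_error() {
  sed -n 's/.*tcp connect (\([^:)]*\):[0-9]*).*/\1/p'
}

# Example (assumes the log line was saved to app.log):
#   host_from_postgrex_error < app.log
```

The printed name can then be looked up from inside the app VM with whatever resolver tool the image ships (for example getent hosts, if it is installed) to confirm whether the .internal name resolves at all.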

This is not the inet6 issue found here, since this setup has been working for more than a year.

If I run fly postgres connect -a clasfle-db, this is what I get:

Error can't establish agent Post "https://api.fly.io/graphql": read tcp [2804:14d:5685:8fac:f49d:1892:5323:f2e5]:41658->[2a09:8280:1:f28:246e:d6a:949:dbbf]:443: read: connection reset by peer

I can open https://api.fly.io/graphql from the browser.

If I do fly status -a clasfle-db --all this is what I see:

ID              STATE   ROLE    REGION  HEALTH CHECKS   IMAGE                           CREATED                 UPDATED              
148e392f724438  started error   cdg     3 total         flyio/postgres:14.6 (v0.0.34)   2023-01-19T14:33:44Z    2023-02-19T12:45:52Z
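The row above reports ROLE as error, which is the instance to look at. For scripting, a minimal sketch that picks such IDs out of the table. It assumes the column layout shown above (ID first, ROLE third), and machines_in_error is a hypothetical helper name.

```shell
# machines_in_error: hypothetical helper.
# Reads `fly status --all` style output on stdin and prints the ID of
# every row whose ROLE column is "error". Rows before the header line
# (ID ... ROLE ...) are ignored.
machines_in_error() {
  awk '
    $1 == "ID" && $3 == "ROLE" { in_table = 1; next }
    in_table && $3 == "error"  { print $1 }
  '
}

# Example:
#   fly status -a clasfle-db --all | machines_in_error
```

If a newer flyctl prints the table with different columns, the field numbers in the awk program would need adjusting.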

Trying flyctl proxy 5432 -a clasfle-db results in:

Error clasfle-db.internal: host was not found in DNS

If I run fly pg restart -a clasfle-db, I get this back:

Error no active leader found

What else can I try here?
I need to make a backup of this data and I can’t move forward from here :frowning:

Thanks in advance and let me know if I can provide more information.

Hi,

It sounds like someone else is having the same issue:

The reply from Fly in that thread says that restarting should help. However, you say it doesn’t :thinking:

Given that … do you see any more info if you look at the status of the individual VM whose ID you list above, e.g. fly vm status 148e392f724438? I wonder if that reveals an out-of-memory error or something else to work with.

If not, can you get a private IP for the VM? If you run fly ips private -a clasfle-db, does it show one? If it does have a private IPv6 address, can you connect to it (don’t reveal it in here), e.g. fly ssh console "long-ip-v6-here"?

If you can SSH in to the VM, you might then be able to at least get at the data on it, e.g. the mounted volume. I’ve not tried doing that with a Postgres VM, so it’s a total guess, I’m afraid.

Hi @thiago, since your Postgres app is a Machines app, you can try restarting it using the fly machine commands:

List machines for your Postgres app
fly machine list -a clasfle-db

Restart machine
fly machine restart <machine-id> -a clasfle-db

fly postgres assumes a Postgres app always has a leader node. If the app gets into a bad state and there isn’t an active leader, it can run into problems, as you saw. fly machine operates a level lower and will reboot the underlying machine regardless of leader status.
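Put together, the two levels can be tried in order. This is a rough sketch rather than an official recipe: restart_pg is a hypothetical wrapper, it uses only the two commands already mentioned in this thread, and it expects you to pass the machine ID you copied from fly machine list.

```shell
# restart_pg APP MACHINE_ID  (hypothetical wrapper)
# Tries the leader-aware restart first; if that fails (for example
# with "no active leader found"), falls back to rebooting the machine
# directly, which does not depend on leader status.
restart_pg() {
  app="$1"
  machine_id="$2"
  if fly pg restart -a "$app"; then
    return 0
  fi
  echo "fly pg restart failed; restarting machine $machine_id directly" >&2
  fly machine restart "$machine_id" -a "$app"
}

# Example:
#   restart_pg clasfle-db <machine-id>
```

The fallback runs on any failure of fly pg restart, not only a missing leader, so it is worth reading the error it prints before relying on the second step.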

Hope this helps!

Thank you very much @Sam-Fly. I wasn’t aware of the fly machine commands, and they solved my problem.

Also thanks @greg for the reply.