Postgres "failed to connect to proxy: context deadline exceeded"

Hi,
My Postgres database suddenly stopped working, and I can’t access it from any of my apps.
I get the error “Can’t reach database server at top2.nearest.of.***.internal:5432”.

If I run the “fly checks list -a” command, I get this error:
HTTP GET http://IPADDRESS:5500/flycheck/pg: 500 Internal Server Error Output: "failed to connect to proxy: context deadline exceeded"

I tried restarting the database but no luck.
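To tell whether the database port itself is unreachable, as opposed to the health check failing for some other reason, you can try a plain TCP connection with a deadline from a machine on the same private network. Below is a minimal Python sketch. `probe_tcp` is a hypothetical helper, not part of flyctl, and you supply the host and port yourself.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect to host:port and classify the outcome.

    Returns "ok" if the connection succeeds, "timeout" if nothing answers
    before the deadline, "refused" if the host answers but nothing listens
    on the port, and "error: ..." for anything else (e.g. a DNS failure).
    """
    try:
        # create_connection resolves the name and connects, giving up after `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer before the deadline, similar in spirit to "context deadline exceeded".
        return "timeout"
    except ConnectionRefusedError:
        # The host is reachable, but no process is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, including name-resolution errors (socket.gaierror).
        return f"error: {exc}"
```

A "timeout" result points at the host or network path, while "refused" suggests the host is up but Postgres (or its proxy) is not listening.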

I have the same problem on one of my production applications. I hope someone from the Fly team picks this up soon.

I have also tried to restart, upscale and downscale with no luck.

I am able to connect to the server using the SSH console though.

Having the same problem… Waiting for someone from the Fly.io team to resolve this, since this is a production app.

Same problem here!

Same here!

Same problem here too.

Same problem here

Hello everyone. I think we found and fixed the issue with one of our servers. We are actively looking for any other server with a similar issue.

Could you check again, please? Thanks!

The issue seems to be fixed now, but that definitely shouldn’t happen on a production server.

I’m not really confident I can keep my sites running on Fly anymore.

@piotrkulpinski if you need high availability, you should make sure you’re running 2 instances of your app and two Postgres VMs (the default, unless you choose “development” at setup time).

Apps and databases with a single instance will not continue to function when we have issues that affect one of our physical hosts.
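The reasoning here is plain redundancy: an app survives the loss of one physical host only if at least one instance is running on a different host. Here is a toy Python model of that. It assumes you know which host each instance runs on. The instance and host names are hypothetical, and the model does not reflect how Fly actually places VMs.

```python
def surviving_instances(placement: dict[str, str], failed_host: str) -> list[str]:
    """Return the instances still running when `failed_host` goes down.

    `placement` maps a (hypothetical) instance name to the host it runs on.
    """
    return [inst for inst, host in placement.items() if host != failed_host]


def tolerates_any_single_host_failure(placement: dict[str, str]) -> bool:
    """True if losing any one host still leaves at least one instance running."""
    hosts = set(placement.values())
    # An empty placement has nothing running, so it never counts as tolerant.
    return bool(placement) and all(surviving_instances(placement, h) for h in hosts)


single = {"db-1": "host-a"}
pair = {"db-1": "host-a", "db-2": "host-b"}
print(tolerates_any_single_host_failure(single))  # False
print(tolerates_any_single_host_failure(pair))    # True
```

Two instances only help if they end up on different hosts. If both sit on the same host, the model reports them as not tolerant, just like a single instance.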

Thank you for the support, the issue has been resolved for my case! :blush:

Hello, we recently ran into this issue. We’ve tried upscaling and downscaling as well. Our site has been down for a few hours now; thankfully, it’s just the sandbox environment. I hope someone from fly.io can look into this.

Hey @rafael,

It looks like you have 3 volumes and your scale count is set to 2. When you last scaled, Nomad chose 2 volumes at random to allocate and the one that wasn’t chosen happened to be your leader.

So if you scale your app up to 3, that should address your issue.
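This explains why scaling to 3 helps. With 3 volumes and a count of 2, only 2 volumes get attached, so the leader's volume can be the one left out. The Python sketch below enumerates the odds. It assumes every 2-of-3 choice is equally likely, since the reply only says the choice was random. The volume names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def p_leader_unattached(volumes: list[str], leader: str, count: int) -> Fraction:
    """Probability that `leader`'s volume is not among `count` volumes
    picked uniformly at random (without replacement) from `volumes`."""
    if not 0 < count <= len(volumes):
        raise ValueError("count must be between 1 and the number of volumes")
    picks = list(combinations(volumes, count))
    missing = sum(1 for pick in picks if leader not in pick)
    return Fraction(missing, len(picks))


vols = ["vol_a", "vol_b", "vol_c"]  # hypothetical names; vol_a holds the leader's data
print(p_leader_unattached(vols, "vol_a", 2))  # 1/3
print(p_leader_unattached(vols, "vol_a", 3))  # 0
```

With a count equal to the number of volumes, every volume is attached, so the leader's volume cannot be skipped.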

I too am running into this issue currently. My application is not able to connect to the database:

Postgrex.Protocol (#PID<0.2747.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (top2.nearest.of.***.internal:5432): non-existing domain - :nxdomain

flyctl postgres connect -a ***

Error can’t get role for fdaa:0:ddd8:a7b:2cc3:0:d185:2: 500: context deadline exceeded
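The :nxdomain in the Postgrex error above means the DNS lookup itself failed: the .internal name returned no record, so the app never got as far as opening a TCP connection. The Python sketch below separates "name does not exist" from other lookup failures. It assumes you run it on a machine on the same private network. `resolve` is a hypothetical helper, not a flyctl command.

```python
import socket


def resolve(host: str, port: int = 5432) -> tuple[str, list[str]]:
    """Resolve `host` and return (status, addresses).

    status is "ok", "nxdomain" (the resolver says the name does not exist)
    or "error" (any other lookup failure, e.g. a resolver timeout).
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        if exc.errno == socket.EAI_NONAME:
            return "nxdomain", []
        return "error", []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP address.
    return "ok", sorted({info[4][0] for info in infos})
```

If the .internal name comes back as "nxdomain" while the VM itself is reachable over SSH, the problem is more likely with the service's DNS registration than with Postgres.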

@jswanner I would go ahead and update your Postgres image and see if that fixes your issue.

First make sure you’re running the latest flyctl version.

Then run:

Check to see which version you’re on and whether there are available updates.

fly image show --app <app-name>

Update your image.

fly image update --app <app-name>

@shaun, thanks for your help.

Image update was successful, but I’m still not able to connect (new error message now):

flyctl postgres connect -a ***

Connecting to fdaa:0:ddd8:a7b:2cc3:0:d185:2… complete
psql: error: could not translate host name “***.internal” to address: Name or service not known

Hi, I have a similar error.
I’ve updated the image with fly image update; however, I still can’t connect to the database or restart it:

flyctl postgres connect -a hits-db
Error can't get role for fdaa:0:5dda:a7b:2809:0:50d2:2: 500: context deadline exceeded

Are there any other commands I can try to fix these errors? Or did I miss something, and this error is unrelated?
Thanks

~I’m having the same problem right now.
What did you do to solve it?~

UPDATE
I fixed the problem by scaling the database to more nodes.

I have the same issue… sad for a production database.

How did you manage to scale to more nodes without losing data?

I’m having this issue again with a Postgres database in Miami. Is there something I can do?

The error: failed to connect to local node: context deadline exceeded.

I have another Postgres database in Amsterdam that is working fine.

Since the node is unavailable, it’s not possible to restart the Postgres application. If I run fly pg restart, I get the error: Error: no active leader found.