Postgres "failed to connect to proxy: context deadline exceeded"

piotrkulpinski · October 25, 2022, 10:29am

Hi,
My Postgres database suddenly stopped working and I can’t access it from any of my apps.
I get the “Can’t reach database server at top2.nearest.of.***.internal:5432” error.

If I run the “fly checks list -a ” command, I get this error:
HTTP GET http://IPADDRESS:5500/flycheck/pg: 500 Internal Server Error Output: "failed to connect to proxy: context deadline exceeded"

I tried restarting the database but no luck.
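The failing check above is a plain HTTP GET against the /flycheck/pg endpoint on port 5500. As a minimal sketch, assuming you run it from a machine that can reach the VM's private address (the endpoint is not meant to be public), the check can be reproduced by hand in Python. The `probe` helper and its 10-second timeout are hypothetical and not part of flyctl:

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """GET a health-check URL and return (HTTP status, response body).

    A 500 from the check still carries a body (e.g. "failed to connect to
    proxy: context deadline exceeded"), so HTTPError is caught and its body
    read as well. Network-level failures (DNS, refused, timeout) are not
    caught and propagate to the caller as URLError / OSError.
    """
    # Empty ProxyHandler: ignore any http_proxy settings, since the target is
    # a private address that an outside proxy could not reach anyway.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: it has .code and .read()
        return err.code, err.read().decode("utf-8", "replace")
```

For example, probe("http://IPADDRESS:5500/flycheck/pg") with the address from the error above. A 500 whose body says "context deadline exceeded" suggests the check endpoint answered but could not reach Postgres in time; an exception instead suggests the endpoint itself was unreachable.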

michael3 · October 25, 2022, 12:11pm

I have the same problem on one of my production applications. I hope someone from the Fly team will pick this up soon.

I have also tried to restart, upscale and downscale with no luck.

I am able to connect to the server using the SSH console though.

fpiskur · October 25, 2022, 4:18pm

Having the same problem… Waiting for someone from the Fly.io team to resolve this, since this is a production app.

jeyj0 · October 25, 2022, 5:25pm

Same problem here!

oleksify · October 25, 2022, 5:35pm

Same here!

Elin_Olsson · October 25, 2022, 5:35pm

Same problem here too.

akheron · October 25, 2022, 7:22pm

Same problem here

dangra · October 25, 2022, 7:46pm

Hello everyone. I think we found and fixed the issue with one of our servers. We are actively looking for any other server with a similar issue.

Could you check again, please? Thanks!

piotrkulpinski · October 25, 2022, 7:50pm

The issue seems to be fixed now, but that definitely shouldn’t happen on a production server.

I’m not really confident I can keep my sites running on Fly anymore.

kurt · October 25, 2022, 7:55pm

@piotrkulpinski if you need high availability, you should make sure you’re running 2 instances of your app and two Postgres VMs (the default, unless you choose “development” at setup time).

Apps and databases with a single instance will not continue to function when we have issues that affect one of our physical hosts.

jeyj0 · October 25, 2022, 8:43pm

Thank you for the support, the issue has been resolved for my case!

rafael · November 22, 2022, 1:23pm

Hello, we recently ran into this issue. We’ve tried upscaling and downscaling as well. Our site has been down for a few hours now; thankfully it’s just the sandbox environment. I hope someone from Fly.io can look into this.

shaun · November 22, 2022, 5:07pm

Hey @rafael,

It looks like you have 3 volumes and your scale count is set to 2. When you last scaled, Nomad chose 2 volumes at random to allocate and the one that wasn’t chosen happened to be your leader.

That being said, if you scale your app up to 3, it should address your issue.
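The explanation above can be checked with a little counting. Assuming Nomad picks which volumes to allocate uniformly at random (the post only says "at random"), a short Python sketch that enumerates every possible choice shows how often the leader's volume is the one left out. The function name and the labeling of the leader as volume 0 are hypothetical:

```python
from fractions import Fraction
from itertools import combinations


def p_leader_unallocated(volumes: int, count: int) -> Fraction:
    """Probability that picking `count` of `volumes` volumes uniformly at
    random leaves out one specific volume (here: the leader's)."""
    leader = 0  # hypothetical label: volume 0 holds the leader's data
    choices = list(combinations(range(volumes), count))
    missed = sum(1 for chosen in choices if leader not in chosen)
    return Fraction(missed, len(choices))


# 3 volumes, as in the post above; try each possible scale count.
for count in (1, 2, 3):
    print(f"scale count {count}: P(leader left out) = "
          f"{p_leader_unallocated(3, count)}")
```

With a scale count of 2 the leader is skipped one time in three (1/3); at a count of 3 every volume gets allocated and the probability drops to 0, which is consistent with the advice to scale up to 3.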

jswanner · November 22, 2022, 5:34pm

I too am running into this issue currently. My application is not able to connect to the database:

Postgrex.Protocol (#PID<0.2747.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (top2.nearest.of.***.internal:5432): non-existing domain - :nxdomain

flyctl postgres connect -a ***

Error can’t get role for fdaa:0:ddd8:a7b:2cc3:0:d185:2: 500: context deadline exceeded
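The two failures above come from different layers: :nxdomain means the .internal name did not resolve at all, while "context deadline exceeded" generally signals a timeout. A minimal Python sketch for telling these apart, assuming it runs on a machine inside the app's private network (where .internal names resolve); the `diagnose` helper and its result labels are hypothetical, and it only checks DNS and TCP reachability, not Postgres authentication:

```python
import socket


def diagnose(host: str, port: int = 5432, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt to host:port.

    Returns "ok", "dns" (name did not resolve, cf. :nxdomain),
    "timeout" (no answer within `timeout` seconds), "refused",
    or "error" for any other socket failure.
    """
    # Step 1: name resolution only, so DNS failures are reported separately.
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return "dns"
    # Step 2: TCP handshake; the socket is closed again right away.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "error"
```

For example, diagnose("top2.nearest.of.<app-name>.internal") returning "dns" would match the :nxdomain error above, while "timeout" would point at a VM that resolves but does not answer.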

shaun · November 22, 2022, 5:43pm

@jswanner I would go ahead and update your Postgres image and see if that fixes your issue.

First, make sure you’re running the latest flyctl version.

Then check which image version you’re on and whether an update is available:

fly image show --app <app-name>

Update your image.

fly image update --app <app-name>

jswanner · November 22, 2022, 5:47pm

@shaun, thanks for your help.

Image update was successful, but I’m still not able to connect (new error message now):

flyctl postgres connect -a ***

Connecting to fdaa:0:ddd8:a7b:2cc3:0:d185:2… complete
psql: error: could not translate host name “***.internal” to address: Name or service not known

SimonLab · December 5, 2022, 4:35pm

Hi, I have a similar error.
I’ve updated the image with fly image update; however, I still can’t connect to the database or restart it:

flyctl postgres connect -a hits-db

Error can't get role for fdaa:0:5dda:a7b:2809:0:50d2:2: 500: context deadline exceeded

Are there any other commands I can run to fix these errors? Or did I miss something, and this error is unrelated?
Thanks

nicanorperera · December 30, 2022, 8:30pm

~I’m having the same problem right now.
What did you do to solve it?~

UPDATE
I fixed the problem by scaling the database to more nodes.

thomasgalibert · May 5, 2023, 8:44am

I have the same issue… sad for a production database.

How did you manage to scale to more nodes without losing data?

GinQuin · October 14, 2023, 8:28am

I’m having this issue again with a Postgres database in Miami. Is there something I can do?

The error: failed to connect to local node: context deadline exceeded.

I have another Postgres database in Amsterdam that is working fine.

Since the node is unavailable, it’s not possible to restart the Postgres application. If I run fly pg restart, I get the error: Error: no active leader found.

Related topics

Fly Postgres proxy issues and dropped connections (Questions / Help, postgres): 2 replies, 507 views, December 17, 2022
Unable to connect to postgres via fly postgres connect, or proxy. (Questions / Help, postgres): 2 replies, 2125 views, December 8, 2022
Deploys failing LHR, postgres proxy failures, intermittent db connection issues: 5 replies, 461 views, October 20, 2022
Postgres "server misbehaving" error messages (postgres): 8 replies, 2012 views, October 26, 2022
POSTGRES is Unresponsive (Questions / Help, postgres): 1 reply, 294 views, August 18, 2022