My postgres database stopped working suddenly and I can’t access it from any of my apps.
I get the “Can’t reach database server at
If I run the “fly checks list -a ” command I get this error:
HTTP GET http://IPADDRESS:5500/flycheck/pg: 500 Internal Server Error Output: "failed to connect to proxy: context deadline exceeded"
I tried restarting the database but no luck.
I have the same problem on one of my production applications. I hope that someone from the Fly team will pick this up soon.
I have also tried to restart, upscale and downscale with no luck.
I am able to connect to the server using the SSH console though.
Having the same problem… Waiting for someone from the Fly.io team to resolve this, since this is a production app.
Hello everyone. I think we found and fixed the issue with one of our servers. We are actively looking for any other server with a similar issue.
Could you check again please? thanks!
The issue seems to be fixed now, but that definitely shouldn’t happen on a production server.
I’m not really confident I can keep my sites running on Fly anymore.
@piotrkulpinski if you need high availability, you should make sure you’re running two instances of your app and two Postgres VMs (the default, unless you choose “development” at setup time).
Apps and databases with a single instance will not continue to function when we have issues that affect one of our physical hosts.
Thank you for the support, the issue has been resolved for my case!
Hello, we recently ran into this issue. We’ve tried upscaling and downscaling as well. Our site has been down for a few hours now; thankfully it’s just the sandbox environment. I hope someone from Fly.io can look into this.
It looks like you have 3 volumes and your scale count is set to 2. When you last scaled, Nomad chose 2 volumes at random to allocate and the one that wasn’t chosen happened to be your leader.
That being said, if you scale your app up to 3 it should address your issue.
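For anyone hitting the same mismatch, here is a minimal sketch of the check described above: compare the number of volumes with the scale count and scale up if the count is lower. The helper name `suggest_scale` and the app name `my-pg-app` are made up for illustration, and the helper only prints the command rather than running it, so treat it as a starting point and not an official procedure.

```shell
#!/bin/sh
# Sketch only: suggest_scale and my-pg-app are hypothetical placeholders.
# Look up the real numbers first with these read-only commands:
#   fly volumes list --app <app-name>
#   fly status --app <app-name>

# Print the scale command needed so that every volume gets an instance.
# $1 = app name, $2 = number of volumes, $3 = current scale count
suggest_scale() {
  app="$1"
  volumes="$2"
  count="$3"
  if [ "$count" -lt "$volumes" ]; then
    # Fewer instances than volumes: one volume (possibly the leader's) is left out.
    echo "fly scale count $volumes --app $app"
  else
    echo "# scale count ($count) already covers all $volumes volumes"
  fi
}

# The situation from the post above: 3 volumes, scale count 2.
suggest_scale my-pg-app 3 2
```

If the printed command looks right for your app, run it yourself; nothing here touches your cluster.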
I too am running into this issue currently. My application is not able to connect to the database:
Postgrex.Protocol (#PID<0.2747.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (top2.nearest.of.***.internal:5432): non-existing domain - :nxdomain
flyctl postgres connect -a ***
Error can’t get role for fdaa:0:ddd8:a7b:2cc3:0:d185:2: 500: context deadline exceeded
@jswanner I would go ahead and update your Postgres image and see if that fixes your issue.
First make sure you’re running the latest
Check to see which version you’re on and whether there are available updates.
fly image show --app <app-name>
Update your image.
fly image update --app <app-name>
@shaun, thanks for your help.
Image update was successful, but I’m still not able to connect (new error message now):
flyctl postgres connect -a ***
Connecting to fdaa:0:ddd8:a7b:2cc3:0:d185:2… complete
psql: error: could not translate host name “***.internal” to address: Name or service not known
Hi, I have a similar error.
I’ve updated the image with fly image update, however I still can’t connect to the database or restart it:
flyctl postgres connect -a hits-db
Error can't get role for fdaa:0:5dda:a7b:2809:0:50d2:2: 500: context deadline exceeded
Are there any other commands that I can try to run to fix these errors? Or did I miss something, and this error is unrelated?
~I’m having the same problem right now.
What did you do to solve it?~
I fixed the problem by scaling the database to more nodes.
I have the same issue… sad for a production database.
How did you manage to scale to more nodes without losing data?