Can't reach database #ams #flycast

binajmen · September 12, 2023, 2:50pm

Hi,

I’m having an issue with a Postgres database hosted in ams. My app, which was working, is suddenly unable to connect to the database. I tried to redeploy, but the deployment fails for the same reason.

Error: P1001: Can't reach database server at `fbnb-xxx-db.flycast`:`5432`

Everything seems normal:

➜  ~ fly checks list -a fbnb-xxx-db
Health Checks for fbnb-xxx-db
  NAME | STATUS  | MACHINE        | LAST UPDATED         | OUTPUT
-------*---------*----------------*----------------------*-----------------------------------------------------------------------------
  pg   | passing | 4d891d60a16578 | 2023-08-22T23:57:14Z | [✓] connections: 11 used, 3 reserved, 300 max (3.6ms)
       |         |                |                      | [✓] cluster-locks: No active locks detected (8.88µs)
       |         |                |                      | [✓] disk-capacity: 14.4% - readonly mode will be enabled at 90.0% (9.21µs)
-------*---------*----------------*----------------------*-----------------------------------------------------------------------------
  role | passing | 4d891d60a16578 | 2023-08-22T23:57:17Z | primary
-------*---------*----------------*----------------------*-----------------------------------------------------------------------------
  vm   | passing | 4d891d60a16578 | 2023-08-22T23:57:11Z | [✓] checkDisk: 846 MB (85.6%) free space on /data/ (43.41µs)
       |         |                |                      | [✓] checkLoad: load averages: 0.01 0.04 0.16 (48.63µs)
       |         |                |                      | [✓] memory: system spent 0s of the last 60s waiting on memory (34.47µs)
       |         |                |                      | [✓] cpu: system spent 138ms of the last 60s waiting on cpu (14.64µs)
       |         |                |                      | [✓] io: system spent 0s of the last 60s waiting on io (12.85µs)
-------*---------*----------------*----------------------*-----------------------------------------------------------------------------

And I’m able to connect directly via fly proxy (although it is using .internal instead of .flycast):

➜  ~ fly proxy 15432:5432 -a fbnb-exam-db
Proxying local port 15432 to remote [fbnb-exam-db.internal]:5432
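To tell a network-level failure like the P1001 error above apart from a problem inside Postgres, one option is to probe the raw TCP port from the app machine itself. Below is a minimal Python sketch; the function name probe_tcp and its result labels are illustrative assumptions, not part of flyctl or any database client. It only opens and closes a TCP connection and does not speak the Postgres protocol.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connection attempt to host:port (hypothetical helper).

    Returns "ok", "dns-failure", "refused", "timeout", or "error: <detail>".
    """
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        return "dns-failure"  # the name did not resolve at all
    except ConnectionRefusedError:
        return "refused"  # the host answered, but nothing is listening on the port
    except (socket.timeout, TimeoutError):
        return "timeout"  # no answer: consistent with packet loss or an unreachable host
    except OSError as exc:
        return f"error: {exc}"  # e.g. "No route to host"
```

For example, if probe_tcp("fbnb-xxx-db.flycast", 5432) returns "timeout" when run from inside the app machine (e.g. via fly ssh console), while the database's own health checks pass, that points at the network path rather than at Postgres.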

Beaux · September 12, 2023, 4:50pm

I’m having the exact same issue. Not only can my app no longer reach the database, but the app itself is also very unresponsive. Both are hosted in the ams region. It may be related to this issue: “App broken: could not find a good candidate within 90 attempts at load balancing” (post #5 by Beaux).

The weird thing is, using fly proxy to connect to the database from my dev PC works fine. So the database app itself seems fine; it’s the proxy/networking between the app and the outside world that seems to be broken.

I can also connect via SSH to my database app using fly ssh console --app my_database_app, but SSHing into my main app fails with: error connecting to SSH server: connect tcp ... operation timed out.

Over SSH I ran pg_isready --host=localhost, and it reports that the database is ready to accept connections.

binajmen · September 12, 2023, 7:15pm

Hi @DAlperin, Hi @jerome,

I noticed this post, and particularly this comment:

Could it be related?

jerome · September 12, 2023, 7:38pm

The host your machine runs on is having issues.

It should show in your Fly dashboard.

binajmen · September 12, 2023, 7:52pm

Hi @jerome,

Hmm, I was checking https://status.flyio.net/ and saw (and still see) nothing there, hence my question!

I didn’t notice this one in my dashboard. My bad!

(Screenshot of the dashboard notice, taken 2023-09-12 at 21.50.00)

The service interruption began yesterday. Are there any updates on it? What could I do in the meantime to mitigate this issue?

Thank you.

jerome · September 12, 2023, 11:17pm

The unfortunate answer to that is to have at least some redundancy (like a cluster of 2 machines for your postgres). That’s not a satisfying answer, especially after the fact.

Our upstream provider has identified that this particular host suffers from a bad case of packet loss. Somehow we didn’t catch it earlier; we’ll be modifying our alerts to fix that.

The packet loss doesn’t occur from every host, which makes it a weird one. That is likely why you’ve been able to access it via the internal network.

I’m trying to think of a temporary solution until the host is fixed; let me get some help from others.
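Because the loss only affected some source hosts, a single successful connection proves little. A rough way to quantify it is to repeat a TCP connection attempt many times from a given machine and count the failures. The helper below, connect_failure_rate, is a hypothetical sketch, and its defaults are arbitrary assumptions. Note that the kernel retransmits SYN packets within the timeout, so light loss may show up as slower connects rather than outright failures; treat the result as a heuristic, not a packet-loss measurement.

```python
import socket
import time


def connect_failure_rate(host: str, port: int, attempts: int = 20,
                         timeout: float = 2.0, pause: float = 0.5) -> float:
    """Return the fraction of TCP connection attempts to host:port that failed.

    A value strictly between 0 and 1 hints at intermittent loss on the path;
    1.0 suggests the target is down or unreachable from this machine.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = 0
    for i in range(attempts):
        try:
            # Open and immediately close a connection; only success/failure matters here.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:  # covers timeouts, refusals and DNS errors alike
            failures += 1
        if i < attempts - 1:
            time.sleep(pause)  # space attempts out so one brief blip is not counted repeatedly
    return failures / attempts
```

Running it once from the app machine and once from another machine, then comparing the two rates, would show whether only some paths to the database host are affected.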

jerome · September 13, 2023, 12:00am

@binajmen I think things are better now? We had to reboot the server, after which your database would no longer start (we’re still investigating how it got into that state, but we were able to start it).

As far as I can tell, the host is now reachable from every other host. So your app should be able to communicate with the database.

binajmen · September 13, 2023, 8:18am

Thank you @jerome, I can confirm it is working again.

I will set up redundancy as you suggested. Is there a guide on how to do it properly? I suppose I should choose a different region?

Beaux · September 13, 2023, 9:30am

It seems to have been fixed on my side as well. Thanks!

@binajmen At first I didn’t understand how scaling would fix this issue, because I tried fly scale count 2 and it still failed to connect. However, I just found out you can scale your app across different regions like this:

fly scale count 3 --region yyz,ewr

That would’ve probably prevented this issue.

jerome · September 13, 2023, 12:09pm

You should choose nearby regions. Alternatively, within the same region, volumes are placed in different “zones” by default (you can also request this explicitly with fly volumes create --require-unique-zones).

system · September 20, 2023, 12:09pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
