Possible issue with database

Note that we’ve changed a lot about Postgres since that thread. If you’re running a relatively recent Fly Postgres cluster, you won’t get disconnected when Stolon has issues. We switched to haproxy from stolonproxy for this reason.

You can check your Postgres version with:

fly image show
2 Likes

Yeah the image I’m using is recent:

Deployment Status
  Registry   = registry-1.docker.io
  Repository = flyio/postgres
  Tag        = 14.1
  Version    = v0.0.14

I’ll make a dumb Node app and deploy it to see how it behaves.

I made this project and the error is happening when deployed on Fly too.

This is the repo with the Fastify app.

The error:

2022-03-02T02:51:43Z app[b42312ba] mia [info]PG REMOVE
2022-03-02T02:58:36Z app[b42312ba] mia [info]node:events:498
2022-03-02T02:58:36Z app[b42312ba] mia [info]      throw er; // Unhandled 'error' event
2022-03-02T02:58:36Z app[b42312ba] mia [info]      ^
2022-03-02T02:58:36Z app[b42312ba] mia [info]Error: Connection terminated unexpectedly
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at Connection.<anonymous> (/workspace/node_modules/pg/lib/client.js:132:73)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at Object.onceWrapper (node:events:639:28)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at Connection.emit (node:events:520:28)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at Socket.<anonymous> (/workspace/node_modules/pg/lib/connection.js:107:12)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at Socket.emit (node:events:532:35)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at endReadableNT (node:internal/streams/readable:1346:12)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at processTicksAndRejections (node:internal/process/task_queues:83:21)
2022-03-02T02:58:36Z app[b42312ba] mia [info]Emitted 'error' event on Client instance at:
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at Client._handleErrorEvent (/workspace/node_modules/pg/lib/client.js:319:10)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at Connection.<anonymous> (/workspace/node_modules/pg/lib/client.js:149:16)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at Object.onceWrapper (node:events:639:28)
2022-03-02T02:58:36Z app[b42312ba] mia [info]    [... lines matching original stack trace ...]
2022-03-02T02:58:36Z app[b42312ba] mia [info]    at processTicksAndRejections (node:internal/process/task_queues:83:21)

There’s a 4 year old issue on the pg repo which seems to suggest this happens on idle connections.

Some user in that issue suggested using a native driver with pg-native. I guess I’ll try that tomorrow.

1 Like

I think I’ve figured it out.

I’ve added a keepAlive setting to the pool and the connection has been rock solid (for now). Not sure why, but this is an undocumented feature of the pg driver.

This led me to find out that PG itself has some keep alive settings that can be configured so that a TCP packet is sent periodically. The settings are explained here:

@kurt I’m guessing you’ve not configured that on your PG image?

Edit:

That didn’t help. It just crashed again after 30 mins of inactivity…

But still maybe the issue is related the keep alive TCP settings of the PG server?

Ok, I think I have finally figured this out.

Turns out, as usual, I’m an idiot :smile:

When using a pg pool you don’t need to do pool.connect(). This method is intended to create a new client which you only need for persistent connection (eg: when listening to notifications).

So when doing await pool.connect() this was creating a leaky client which didn’t have an error handler, hence the problem when the client disconnected for whatever reason.

I’ve updated my repo. I’ve also deployed the app again on Fly just to confirm but I’m pretty certain this is it.

I still need to figure out how to create a persistent connection for listen/notify but this is another issue.

One thing I will mention is that somehow Heroku PG seems more stable than Fly PG as far as connections go. Even with my leaky setup, I left the app running all night connected to Heroku and the client never disconnected. Not sure if this is related to the keep alive PG settings I mentioned above or what.

Edit:

+2 hours and the app is holding on :tada:

@pier Ah … sounds like that may well be it.

Good spot on the keep-alive for PG. Hopefully Fly can enable that.

1 Like

Hi @pier

(I wrote this up yesterday but totally forgot to hit send so looks like you’ve already figured it out, but there’s an extra tidbit about the error handling so I’m replying anyway)

In your repo you are “checking out” a db client and then not doing anything with it (line 39).

What is happening is that the connection terminates somehow and the client is throwing an error. While the db client is “checked out” you are responsible for handling errors, not the pool.

If you instead store the “checked out” db client into a variable
const client = await pg.connect()

and attach an error handler to it
client.on('error', (err) => console.log(err))

then the connection terminated unexpectedly errors shouldn’t cause your app to crash.

1 Like

Thanks for your comment @charsleysa .

Yeah, the confusion for me was not understanding that pool.connect() doesn’t really connect to the db but instead creates a new connected client.

Clients are created and handled automatically when doing pool.query() but an error handler still needs to be added to the pool itself (in case any of the automatically managed clients receives an error). But as you pointed out, when checking out a client manually with pool.connect() the error handler also needs to be added manually.