PostgreSQL connection issues have returned

Last summer we were trialing Fly.io with a prototype app, and everything went great except that our app would, after several weeks of being idle, reliably drop its PostgreSQL connection. This would manifest as an error upon visiting the app for the first time after several weeks of not using it, and then the connection would recover after a few minutes, and everything was “fine” again.

Now, a year later, we’re considering using Fly.io in production and have deployed our beta app. Unfortunately, the PostgreSQL connection drops are back, but now they’re worse: they occur after just a few hours (update: minutes, actually; see below), and reloading the app in the browser does not reconnect the backend to the PostgreSQL instance. We just get constant “no connection to the server” errors until we restart the app by hand.

This is disconcerting.

I note that there have been some other recent threads about similar issues with PostgreSQL:

Two of these threads don’t have resolutions, and the one that possibly does (the last one in the list) indicates that perhaps it’s the responsibility of the app to use keepalives somehow? Is that the official response?

In any case, I would appreciate it if someone from Fly.io could look into this. Our app works fine and stays up for days when deployed locally to a PostgreSQL instance running in Docker, so I’m pretty certain the issue is not with our app.

One question: at the moment, our app does not accept connections on IPv6, only on IPv4. Could that be the problem? I know that Fly.io use a lot of IPv6 internally. On the other hand, the app does connect fine initially, so lack of IPv6 doesn’t seem to be an issue just after the app’s been restarted.

As an update, I relaunched our app about an hour ago and tested it for a few minutes (the connection was up). Just now I tried again, and the “no connection to the server” error is already manifesting. :\

Hey - are you connecting to a public IPv4 address on the Postgres app? Doing that will send all your traffic through the Fly proxy, which may not be ideal. It should be possible to connect to the database over IPv6. That doesn’t require your app to accept IPv6 connections, only to make outgoing connections to Postgres.

In our backend app (the bit that runs on Fly.io), we connect to the PostgreSQL database using the DATABASE_URL environment variable provided by Fly.io via the app’s environment. My understanding is that DATABASE_URL refers to an internal, app-specific hostname that resolves to an IP on the app’s Wireguard network, correct?
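One way to check what DATABASE_URL actually points at is to parse it and see whether the host is a name or a literal IPv4/IPv6 address. This is only a rough sketch: the function name `classify_db_host` and the example URLs are hypothetical, and it inspects the URL string only, without doing any DNS lookup.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_db_host(database_url):
    """Return (host, kind), where kind is 'ipv4', 'ipv6', or 'hostname'."""
    host = urlsplit(database_url).hostname
    if host is None:
        raise ValueError("no host found in URL")
    try:
        # ip_address() raises ValueError for anything that is not an IP literal.
        version = ipaddress.ip_address(host).version
    except ValueError:
        return host, "hostname"
    return host, f"ipv{version}"


# Hypothetical example value; in the app this would come from the
# DATABASE_URL environment variable.
print(classify_db_host("postgres://user:secret@my-db.internal:5432/app"))
```

If this reports a literal public IPv4 address rather than a hostname, that would suggest the traffic is going through the proxy path described above.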

@dhess1 Did you fix this, out of interest?

Sort of. This is indeed a timeout, caused by the HAProxy that Fly.io put in front of the PostgreSQL instance, which has a 30-minute timeout. (This was confirmed via email support with Fly.io.)

This means you will want to either a) make sure your PostgreSQL adapter can deal with dropped connections by reconnecting (ideally automatically), or b) run some kind of idle timer that issues a query at least once every 30 minutes to keep the connection alive.

We will probably implement both, but we haven’t had time to deal with it yet, so for the moment I just have a curl script that runs a simple query every 5 seconds :blush: That does the trick.

Also, note that per Fly.io via email, setting the keepalive parameters on the DATABASE_URL will not suffice, because those operate at the TCP level and HAProxy apparently won’t consider them when evaluating the upstream connection idle time.
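For anyone landing here later, here is a minimal sketch of what options a) and b) could look like together in Python. It is not what we run in production. The class name `KeepAliveConnection` and its parameters are made up for illustration, and it assumes any DB-API style connection factory (for example, a lambda that calls your PostgreSQL driver's connect function with DATABASE_URL). The ping is a real `SELECT 1` query, so it counts as application traffic rather than a TCP-level keepalive.

```python
import threading


class KeepAliveConnection:
    """Wrap a DB-API connection factory with reconnect-on-failure and a ping timer."""

    def __init__(self, connect, interval_seconds=300.0):
        # `connect` is a hypothetical zero-argument factory returning a
        # DB-API connection. 300 s is an assumed default, chosen to stay
        # well under the 30-minute proxy timeout.
        self._connect = connect
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._conn = connect()
        self.reconnects = 0
        self.pings = 0

    def _run(self, sql, params):
        cur = self._conn.cursor()
        cur.execute(sql, params)
        # description is None for statements that return no rows.
        rows = cur.fetchall() if cur.description else []
        self._conn.commit()
        return rows

    def execute(self, sql, params=()):
        """Run a query; if it fails, reconnect once and retry."""
        with self._lock:
            try:
                return self._run(sql, params)
            except Exception:
                # Assumption: treat any failure as a dropped connection.
                # A real app would catch only the driver's connection errors.
                self._conn = self._connect()
                self.reconnects += 1
                return self._run(sql, params)

    def start_keepalive(self):
        """Start a daemon thread that pings the database every interval."""
        def loop():
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(self._interval):
                try:
                    self.execute("SELECT 1")
                    self.pings += 1
                except Exception:
                    pass  # the next ping or real query will retry

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

Usage would be roughly `db = KeepAliveConnection(my_connect_function)` followed by `db.start_keepalive()`, with all queries going through `db.execute(...)`. Retrying blindly is only safe for idempotent statements, so treat the retry as a starting point rather than a complete solution.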