PostgreSQL "Connection is closed" error after a few minutes of activity

fvoron · April 18, 2022, 7:20am

Hello there!

I’ve successfully deployed my Python app with a PostgreSQL cluster. The app manages the DB connection using SQLAlchemy with the asyncpg driver.

However, I regularly get a server error because the connection between the app and PostgreSQL has been unexpectedly closed. It usually happens after a few minutes (~
15-30 minutes) of inactivity. Here is the error in details:

InterfaceError: connection is closed
  File "sqlalchemy/dialects/postgresql/asyncpg.py", line 709, in _start_transaction
    self._transaction = self._connection.transaction(
  File "asyncpg/connection.py", line 275, in transaction
    self._check_open()
  File "asyncpg/connection.py", line 1400, in _check_open
    raise exceptions.InterfaceError('connection is closed')

Under the hood, SQLAlchemy maintains a pool of connections and grab one of them when it needs to perform queries. So my guess is that Fly.io cuts the connection “hardly” after a moment, which provokes the error when we try to reuse it after.

Am I correct to assume this has to do with a “long-lived connection” problem?
Is there a documented time limit for a connection to remain open on Fly? If so, it would help me to configure SQLAlchemy so that it recycles connection after this delay.

My app is fief-server and my PostgreSQL cluster is fief-server-db.

Thank you for your help

Best!

shaun · April 18, 2022, 5:03pm

@fvoron Hey there,

I have not used SQLAlchemy before, but you could try setting keepalive_idle. Our proxy will kill idle client connections after 30 minutes or so, so that’s probably what you’re running into.

Documentation:

fvoron · April 19, 2022, 7:49am

Hi @shaun!

Thank you for the suggestion! Unfortunately, it seems that asyncpg doesn’t handle this parameter.

After digging up a bit, I found that there is a similar parameter server-side, tcp_keepalives_idle. If I’m looking at the value on the PostgreSQL server, it’s set at 7200 seconds (the default):

postgres=# SELECT name, setting, unit FROM pg_settings WHERE name='tcp_keepalives_idle';
        name         | setting | unit 
---------------------+---------+------
 tcp_keepalives_idle | 7200    | s

From what I understand, PostgreSQL allows the client to override some of its settings. So, this parameter can be set in the client code through the server_settings parameter. Like this:

engine = create_async_engine(
    database_url,
    connect_args={"server_settings": {"tcp_keepalives_idle": "600"}},
)

This is not very well documented in SQLAlchemy, here is the only reference that gave me an hint about this: PostgreSQL — SQLAlchemy 1.4 Documentation

I’ve set it to 10 minutes (600 seconds)… But this doesn’t solve the issue. I’m not 100% sure the parameter is actually taken into account by PostgreSQL.

I also tried to set it directly on the cluster configuration thanks to the guidance found in this topic, by using the stolonctl tool. Unfortunately, it doesn’t solve the issue neither

fvoron · April 19, 2022, 11:17am

I’ve tried something else: use the pool_recycle argument of SQLAlchemy. Basically, it discards connections that are older than a defined threshold.

engine = create_async_engine(
    database_url, pool_recycle=1800
)

I’ve set it at 30 minutes (1800 seconds) to match the limit of the proxy and it seems to solve the issue.

However, it sounds a bit sub-optimal as we may not be able to keep a connection live for more than 30 minutes. It should do it for now, but I would be curious to explore other alternatives if someone has ideas.

shaun · April 19, 2022, 12:24pm

I’ve set it to 10 minutes (600 seconds)… But this doesn’t solve the issue. I’m not 100% sure the parameter is actually taken into account by PostgreSQL.

Yeah, in this case, it’s Stolon who manages the PG configuration. Any PG settings that are set directly will be overwritten by Stolon. In any case, you can tweak any of the server-side settings using the following strategy:

Note: We are in the process of making this easier.

# SSH into your Postgres app.  
fly ssh console --app <app-name>

# Export the some stolonctl specific environment variables
export $(cat /data/.env | xargs)

# Update stolon spec
stolonctl update --patch '{"pgParameters": { "tcp_keepalives_idle": "<seconds>"}}'

fvoron · April 19, 2022, 1:02pm

Yes, that’s what I did ultimately. The parameter was correctly set, but it didn’t solve the issue. Maybe the OS doesn’t send the TCP message or the proxy ignores it, I’m not sure how I could debug that.

bbonamin · April 25, 2022, 7:45pm

Hi! Sorry to hijack this thread, but I believe I’m having the same issue, with a slightly different tech set up. I’m running the good_job for background processing on a Rails application, and I’m having a hard connection cut off from the server every 30 minutes too.

PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

I reported it to the gem issue tracker (ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor every 30 minutes aprox. · Issue #579 · bensheldon/good_job · GitHub) but it seems that this problem would be on the server side from what I’m reading here! We’re seeing this exactly every 30 minutes or so

ActiveRecord’s configuration will by default set an idle_timeout of 5 minutes("number of seconds that a connection will be kept unused in the pool before it is automatically disconnected (default 300 seconds). Set this to zero to keep connections forever." ActiveRecord::ConnectionAdapters::ConnectionPool), which is set in this app:

irb(main):006:0> ActiveRecord::Base.connection_pool.db_config.idle_timeout
=> 300.0

but I’m still getting this error message every 30 mins or so. Is there any other setting in postgres that I could tweak to fix this?

fvoron · April 26, 2022, 7:28am

In my case, the equivalent parameter, pool_recycle solved the issue by making the client proactively close the connection before the proxy does.

I’m not a Ruby expert but a first step could be to make sure that this parameter is correctly taken into account by your worker.

escapedcat · March 30, 2023, 1:53pm

Also stumbled across this issue. This seemed to have helped:

I’m only running a db command once a day.

Topic		Replies	Views
psycopg2 OperationalError trouble \| Python/Flask/Psycopg2 Questions / Help postgres	2	3846	November 1, 2022
Postgres dropping connections postgres	2	909	June 23, 2023
PostgreSQL connection getting closed Questions / Help postgres	1	1397	August 28, 2022
psycopg2.OperationalError: server closed the connection unexpectedly postgres	3	6122	December 28, 2022
Remix + Postgres "Server has closed the connection" JavaScript postgres	7	66	March 27, 2025

PostgreSQL "Connection is closed" error after a few minutes of activity

Related topics