I’ve successfully deployed my Python app with a PostgreSQL cluster. The app manages the DB connection using SQLAlchemy with the asyncpg driver.
However, I regularly get a server error because the connection between the app and PostgreSQL has been unexpectedly closed. It usually happens after a few minutes (~
15-30 minutes) of inactivity. Here is the error in details:
InterfaceError: connection is closed
File "sqlalchemy/dialects/postgresql/asyncpg.py", line 709, in _start_transaction
self._transaction = self._connection.transaction(
File "asyncpg/connection.py", line 275, in transaction
self._check_open()
File "asyncpg/connection.py", line 1400, in _check_open
raise exceptions.InterfaceError('connection is closed')
Under the hood, SQLAlchemy maintains a pool of connections and grab one of them when it needs to perform queries. So my guess is that Fly.io cuts the connection “hardly” after a moment, which provokes the error when we try to reuse it after.
Am I correct to assume this has to do with a “long-lived connection” problem?
Is there a documented time limit for a connection to remain open on Fly? If so, it would help me to configure SQLAlchemy so that it recycles connection after this delay.
My app is fief-server and my PostgreSQL cluster is fief-server-db.
I have not used SQLAlchemy before, but you could try setting keepalive_idle. Our proxy will kill idle client connections after 30 minutes or so, so that’s probably what you’re running into.
Thank you for the suggestion! Unfortunately, it seems that asyncpg doesn’t handle this parameter.
After digging up a bit, I found that there is a similar parameter server-side, tcp_keepalives_idle. If I’m looking at the value on the PostgreSQL server, it’s set at 7200 seconds (the default):
postgres=# SELECT name, setting, unit FROM pg_settings WHERE name='tcp_keepalives_idle';
name | setting | unit
---------------------+---------+------
tcp_keepalives_idle | 7200 | s
From what I understand, PostgreSQL allows the client to override some of its settings. So, this parameter can be set in the client code through the server_settings parameter. Like this:
I’ve set it to 10 minutes (600 seconds)… But this doesn’t solve the issue. I’m not 100% sure the parameter is actually taken into account by PostgreSQL.
I also tried to set it directly on the cluster configuration thanks to the guidance found in this topic, by using the stolonctl tool. Unfortunately, it doesn’t solve the issue neither
I’ve set it at 30 minutes (1800 seconds) to match the limit of the proxy and it seems to solve the issue.
However, it sounds a bit sub-optimal as we may not be able to keep a connection live for more than 30 minutes. It should do it for now, but I would be curious to explore other alternatives if someone has ideas.
I’ve set it to 10 minutes (600 seconds)… But this doesn’t solve the issue. I’m not 100% sure the parameter is actually taken into account by PostgreSQL.
Yeah, in this case, it’s Stolon who manages the PG configuration. Any PG settings that are set directly will be overwritten by Stolon. In any case, you can tweak any of the server-side settings using the following strategy:
Note: We are in the process of making this easier.
# SSH into your Postgres app.
fly ssh console --app <app-name>
# Export the some stolonctl specific environment variables
export $(cat /data/.env | xargs)
# Update stolon spec
stolonctl update --patch '{"pgParameters": { "tcp_keepalives_idle": "<seconds>"}}'
Yes, that’s what I did ultimately. The parameter was correctly set, but it didn’t solve the issue. Maybe the OS doesn’t send the TCP message or the proxy ignores it, I’m not sure how I could debug that.
Hi! Sorry to hijack this thread, but I believe I’m having the same issue, with a slightly different tech set up. I’m running the good_job for background processing on a Rails application, and I’m having a hard connection cut off from the server every 30 minutes too.
PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
ActiveRecord’s configuration will by default set an idle_timeout of 5 minutes("number of seconds that a connection will be kept unused in the pool before it is automatically disconnected (default 300 seconds). Set this to zero to keep connections forever."ActiveRecord::ConnectionAdapters::ConnectionPool), which is set in this app: