sqlite connection started hanging

jtauber · April 2, 2025, 9:20pm

After running without problems for a year, a site of mine with a read-only sqlite database stopped working. I narrowed down the problem to the connection to sqlite timing out.

Nothing has changed on my side so not sure what the root cause is.

Oddly, it starts working briefly if I ssh into each machine and fire up a Python repl and open a sqlite connection.

But then it stops working the moment that machine is stopped.

halfer · April 2, 2025, 9:55pm

Are you using LiteFS? I suppose with a r/o database, there’d be no point, but equally SQLite doesn’t have a listener in the traditional TCP sense, so I’d wonder what actually would time out.

If you SSH into the machine, does the web app start working, or does SQLite only respond via Repl?

jtauber · April 2, 2025, 10:07pm

As it’s read only, I’m just using a sqlite file in the container.

The web app starts working once I’ve successfully made the connection via the repl. One theory I have there is that the timeout is different for the web request versus the repl over ssh.

But I’m not sure why this suddenly became a problem after a year.

jtauber · April 2, 2025, 10:09pm

Literally it’s the sqlite3.connect function in my Flask app.

If I do the same sqlite3.connect in the repl (or in a standalone Python script run on the ssh console) it works.

halfer · April 2, 2025, 10:11pm

Well, that is most odd. Does it all work if you run it under Docker on your development machine?

I don’t do Python myself, but maybe if you put your connection code here, someone might spot something. Also, do Python db drivers generally have a debug or logging mode that can be tried?

Finally, is your database file sitting on a volume, or is it part of your Docker image?

jtauber · April 2, 2025, 10:37pm

The database file is part of the Docker image.

If I ssh into my fly machine, fire up a Python repl and do

import sqlite3
sqlite3.connect(“my_database.db”)

It takes a long time to respond the first time. Subsequent attempts are fast.

Unfortunately the moment the machine restarts, the problem starts all over again.

And as mentioned, this hasn’t been a problem for the last year and I hadn’t deployed for nine months.

halfer · April 2, 2025, 10:40pm

Righto. To clarify here, do you mean a restart that happens as a result of a (re)deployment?

This is when you’re shelled in, right?

jtauber · April 2, 2025, 11:02pm

Or just the machine stopping after inactivity.

As long as a connection to the database has been made, all subsequent attempts (until machine restart) are fast.

I don’t think the first connection can be successful via the web page since it times out (since today). But if I shell in and either run a standalone script that connects or connect via the repl then all subsequent db connections (including via the web app) are fast.

halfer · April 2, 2025, 11:05pm

OK, mystifying. Good detail on the question now though.

Can you make the project available on GitHub, or a cut-down version of the same? I am wondering if there’s something really small that would make a difference, like a minor upgrade to Python, and it might take someone tinkering to see what the issue is. Of course, it might be that there is a weirdness in Fly that upsets Python in particular.

jtauber · April 2, 2025, 11:09pm

Let me put the relevant files up as a gist.

Note: I just timed running my standalone script (that is just doing a sqlite3.connect) and it takes just over a minute to run the first time so I bet that delay is what’s timing out in the web app.

halfer · April 2, 2025, 11:10pm

Lovely, I might take a peek tomorrow evening

jtauber · April 2, 2025, 11:10pm

Here’s the gist: Dockerfile · GitHub

halfer · April 2, 2025, 11:13pm

Great stuff. Does that Gist need a copy of requirements.txt too?

jtauber · April 3, 2025, 12:51am

No, the requirements.txt file doesn’t have anything relevant to the test case.

jtauber · April 3, 2025, 2:38pm

As the site outage is a huge problem for users, I’ve temporarily turned off auto_stop_machines and set min_machines_running = 1. It will cost a lot more but will hopefully avoid the issue until the root cause is found.

system · April 10, 2025, 2:39pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Build timeout 10 minutes, large sqlite db Build debugging	3	436	May 11, 2022
Trying to setup SQLite with Python script Questions / Help	2	625	March 17, 2023
Do you need a server for your SQLite databases? sqlite , litefs	1	204	August 26, 2023
SQL lite db Questions / Help	6	616	July 12, 2022
SQlite problem while using builder:full Build debugging	11	799	January 2, 2023

sqlite connection started hanging

Related topics