SQLite async connection pool for high-performance

1. slaily ◴[11 Jul 25 10:26 UTC] No.44530519[source]▶

If you’re building Python async apps (FastAPI, background jobs, etc.) with SQLite, you’ll eventually hit two issues

- Opening/closing connections is fast, but not free—overhead adds up under load

- SQLite writes are globally locked

aiosqlitepool is a tiny library that adds connection pooling for any asyncio SQLite driver (like aiosqlite):

- It avoids repeated database connection setup (syscalls, memory allocation) and teardown (syscalls, deallocation) by reusing long-lived connections

- Long-lived connections keep SQLite's in-memory page cache "hot." This serves frequently requested data directly from memory, speeding up repetitive queries and reducing I/O operations

- Allows your application to process significantly more database queries per second under heavy load

Enjoy!

replies(5): >>44565059 #>>44565071 #>>44566400 #>>44570162 #>>44570945 #

2. slashdev ◴[14 Jul 25 20:47 UTC] No.44565059[source]▶

>>44530519 (TP) #

How does this help with the second issue, the write locks?

replies(1): >>44565309 #

3. mostlysimilar ◴[14 Jul 25 20:48 UTC] No.44565071[source]▶

>>44530519 (TP) #

Around what amount of load would you say the overhead of opening/closing becomes a problem?

replies(1): >>44570459 #

4. ncruces ◴[14 Jul 25 21:07 UTC] No.44565309[source]▶

>>44565059 #

No idea if it applies, but one way would be to direct all writes (including any transaction that may eventually write) to a single connection.

Then writers queue up, while readers are unimpeded.

replies(1): >>44567014 #

5. manmal ◴[14 Jul 25 23:17 UTC] No.44566400[source]▶

>>44530519 (TP) #

Doesn’t SQLite have its own in-memory cache? Is this about having more control re cache size?

replies(1): >>44567018 #

6. dathinab ◴[15 Jul 25 01:25 UTC] No.44567014{3}[source]▶

>>44565309 #

if you enable WAL mode with sqlite then readers are not blocked by writer so only writers queue up without needing any special case handling to archive it

(in general you _really_ should use WAL mode if using sqlite concurrently, you also should read the documentation about WAL mode tho)

replies(3): >>44568037 #>>44568286 #>>44570395 #

7. dathinab ◴[15 Jul 25 01:26 UTC] No.44567018[source]▶

>>44566400 #

yes, per "open connection", hence why not closing+reopening connections all the time helps the cache ;)

8. ncruces ◴[15 Jul 25 05:19 UTC] No.44568037{4}[source]▶

>>44567014 #

Writers won't queue up, rather they'll storm the place, taking turns at asking “can I go now” and sleeping for (tens, hundreds of) milliseconds at a time.

This only gets “worse” as computers get faster: imagine how many write transactions a serial writer could complete (WAL mode and normal synchronous mode) while all your writers are sleeping after the previous one left, because they didn't line up?

And, if you have a single limited pool, your readers will now be stuck waiting for an available connection too (because they're all taken by sleeping writers).

It's much fairer and more efficient for writers to line up with blocking application locks.

9. rich_sasha ◴[15 Jul 25 06:06 UTC] No.44568286{4}[source]▶

>>44567014 #

I was running into some horrendous issues with WAL, where the WAL file would grow boundlessly, eventually leading to veery slow reads and writes.

It's fixable by periodically forcing the WAL to be truncated, but it took me a lot of time and pain to figure it out.

replies(2): >>44568711 #>>44572720 #

10. normie3000 ◴[15 Jul 25 07:11 UTC] No.44568711{5}[source]▶

>>44568286 #

Interesting, were there any warning signs beyond general query slowdown?

replies(2): >>44569665 #>>44570591 #

11. rich_sasha ◴[15 Jul 25 10:15 UTC] No.44569665{6}[source]▶

>>44568711 #

No warning signs and very little about it on the Internet. Just performance slows to a grind. Also hard to replicate.

If I had a blog, I'd be writing about it.

replies(1): >>44592010 #

12. bootsmann ◴[15 Jul 25 11:47 UTC] No.44570162[source]▶

>>44530519 (TP) #

Is there a significant advantage of the sqlite in-memory page cache over the page cache that's already included with the operating system?

replies(1): >>44570373 #

13. jitl ◴[15 Jul 25 12:20 UTC] No.44570373[source]▶

>>44570162 #

Yes: SQLite needs to inspect the schema when it opens a new connection object and does some O(number of conns) lookups in global state during this process. It’s best to avoid re-doing this work.

14. le-mark ◴[15 Jul 25 12:23 UTC] No.44570395{4}[source]▶

>>44567014 #

WAL doesn’t cure concurrency issues for SQLite. WAL plus single writer, multiple reader threaded is required. It’s blazing fast though.

15. jitl ◴[15 Jul 25 12:30 UTC] No.44570459[source]▶

>>44565071 #

It depends hugely on how you decide to manage the connection objects. If you have a single thread / single core server that only even opens a single connection, then connection open overhead is never a problem even under infinite load.

The two main issues w opening a connection are:

1. There is fixed cost O(database schema) time spent building the connection stuff. Ideally SQLite could use a “zygote” connection that can refresh itself and then get cloned to create a new one, instead of doing this work from scratch every time.

2. There is O(number of connections) time spent looking at a list of file descriptors in global state under a global lock. This one is REALLY BAD if you have >10,000 connections so it was a major motivator for us to do connection pooling at Notion. Ideally SQLite could use a hash table instead of a O(n) linear search for this, or disable it entirely.

Both of these issues are reasons I’m excited about Turso’s SQLite rewrite in Rust - it’s so easy to fix both of these issues in Rust (like a good hash table is 2 LoC to adopt in Rust) whereas in the original C it’s much more involved to safely and correctly fix the issue in a fork.

Furthermore, it would be great to share more of the cache between connections as a kind of “L2 cache”; again tractable and safe to build in Rust but complicated to build in a fork of the original C.

Notion uses a SQLite-backed server for our “Database” product concept that I helped write, we ran in to a lot of these kinds of issues scaling reads. We implemented connection pooling over better-sqlite3 Node module to mitigate these issues. We also use Turso’s existing SQLite C fork “libsql” for some connections since it offers a true async option backed by thread pool under the hood in the node driver, which helps in cases where you can have a bottleneck serializing or deserializing data from “node” layout to “SQLite c” layout or many concurrent writes to different DBs from a single NodeJS process.

16. bawolff ◴[15 Jul 25 12:49 UTC] No.44570591{6}[source]▶

>>44568711 #

I think this is mentioned in the docs https://www.sqlite.org/wal.html

17. gwbas1c ◴[15 Jul 25 13:30 UTC] No.44570945[source]▶

>>44530519 (TP) #

Important word:

> Python

Your repo and the readme.md don't say "python." The title of this post doesn't say "python."

It took me a while to realize that this is for python, as opposed to a general-purpose cache for, say, libsqlite.

replies(2): >>44571253 #>>44571645 #

18. kstrauser ◴[15 Jul 25 14:00 UTC] No.44571253[source]▶

>>44570945 #

The tag at the top of the readme, under the title, shows which Python versions it supports. If it never mentioned Python at all, that would be the tipster.

19. sjsdaiuasgdia ◴[15 Jul 25 14:42 UTC] No.44571645[source]▶

>>44570945 #

Let's see...

There's tags showing what Python versions are supported.

The root dir of the repo contains a 'pyproject.toml' file.

The readme contains installation instructions for pip, poetry, and uv, all of which are Python package managers.

The readme contains example code, all of which is in Python.

The readme references asyncio, a Python module that is included in the standard library for Python 3.

The 'Languages' widget on the page shows 99.2% of the repo's code is in Python.

Every file not in the root dir has a .py extension.

Yeah, I can see why it was so hard to figure out...

replies(1): >>44573529 #

20. dathinab ◴[15 Jul 25 16:16 UTC] No.44572720{5}[source]▶

>>44568286 #

The is why I said read the WAL doc page in a different answer ;)

They do point out the risks here: https://sqlite.org/wal.html#avoiding_excessively_large_wal_f...

sqlites design makes a lot of SQL concurrency synchronization edge cases much simpler as you can rely on the single writer at a time limitation. And it has some grate hidden features for using it as client application state storage. But there are use-cases it's just not very good at and moving from sqlite to other DBs can be tricky (if you ever relied on the exclusive write transaction or the way cells are blobs which can mix data types, even it it was by accident)

replies(1): >>44573110 #

21. rich_sasha ◴[15 Jul 25 16:43 UTC] No.44573110{6}[source]▶

>>44572720 #

I did read it. For whatever reason, automatic checkpoints basically would stop from time to time, and the WAL file would start growing like crazy.

In the end I wrote an external process that forced a checkpoint a few times a day, which worked. I came across other exasperated people in various dark corners of the Internet with the same symptoms.

22. tracker1 ◴[15 Jul 25 17:17 UTC] No.44573529{3}[source]▶

>>44571645 #

I'm mostly with you.. it would still be nice if the title reflected the language limitation/feature.

23. normie3000 ◴[17 Jul 25 11:20 UTC] No.44592010{7}[source]▶

>>44569665 #

And how big was the WAL file getting compared to normal? As someone running SQLite in prod it would be comforting at least to have some heuristics to help detect this situation!