
197 points by slaily | 1 comment
d1l:
This is strange on so many levels.

SQLite does not even do network I/O.

How does sharing a connection (and transaction scope) in an asyncio environment even work? Won’t you still need a connection per asyncio context?

Does sqlite3_open really take that long compared to the inevitable contention for the write lock you'll see when you have many concurrent contexts?

Does sqlite3_open even register in comparison with the overhead of the Python interpreter?

What is an asyncio SQLite connection anyway? Isn't it just a regular one that gets hucked into a separate thread?

simonw:
If you're talking to a 100KB SQLite database file, this kind of thing is likely unnecessary; just opening and closing a connection for each query is probably fine.

If you're querying a multi-GB SQLite database, there are things like per-connection caches that may benefit from a connection pool.

> What is an asyncio SQLite connection anyway? Isn't it just a regular one that gets hucked into a separate thread?

Basically yes - aiosqlite works by opening each connection in a dedicated thread, then sending queries over to it and delivering each result back to the event loop via a Future.

https://github.com/omnilib/aiosqlite/blob/895fd9183b43cecce8...
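
Here's a stripped-down sketch of that pattern (my own simplification, not aiosqlite's actual code): a worker thread owns the sqlite3 connection, and coroutines hand it callables, then await a Future for each result.

  import asyncio
  import queue
  import sqlite3
  import threading

  class ThreadedConnection:
      """One worker thread owns the sqlite3 connection; coroutines await Futures."""

      def __init__(self, path):
          self._requests = queue.Queue()
          self._thread = threading.Thread(target=self._worker, args=(path,), daemon=True)
          self._thread.start()

      def _worker(self, path):
          conn = sqlite3.connect(path)  # created and used only in this thread
          while True:
              item = self._requests.get()
              if item is None:  # sentinel: shut down
                  conn.close()
                  return
              fn, future, loop = item
              try:
                  result = fn(conn)
                  loop.call_soon_threadsafe(future.set_result, result)
              except Exception as exc:
                  loop.call_soon_threadsafe(future.set_exception, exc)

      async def run(self, fn):
          # Run fn(conn) on the connection's thread, await the result here.
          loop = asyncio.get_running_loop()
          future = loop.create_future()
          self._requests.put((fn, future, loop))
          return await future

      async def execute(self, sql, params=()):
          return await self.run(lambda conn: conn.execute(sql, params).fetchall())

      def close(self):
          self._requests.put(None)

  async def main():
      db = ThreadedConnection(":memory:")
      print(await db.execute("SELECT 1 + 1"))  # [(2,)]
      db.close()

  asyncio.run(main())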

crazygringo:
> If you're querying a multi-GB SQLite database

In which case SQLite is probably the wrong tool for the job, and you should be using Postgres or MySQL, which are actually designed from the ground up for lots of concurrent connections.

SQLite is amazing. I love SQLite. But I love it for single-user single-machine scenarios. Not multi-user. Not over a network.

naasking:
> In which case SQLite is probably the wrong tool for the job

Why? If all it's missing is an async connection pool to make it a good tool for more jobs, what's the problem with just creating one?
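
A minimal pool is only a screenful of code anyway. A rough sketch (hypothetical names, built on aiosqlite and an asyncio.Queue, with a placeholder DB path):

  import asyncio
  import aiosqlite

  class SQLitePool:
      # Fixed set of aiosqlite connections handed out via an asyncio.Queue.
      def __init__(self, path, size=4):
          self._path = path
          self._size = size
          self._queue = asyncio.Queue()

      async def open(self):
          for _ in range(self._size):
              conn = await aiosqlite.connect(self._path)
              await self._queue.put(conn)

      async def fetchall(self, sql, params=()):
          conn = await self._queue.get()   # wait for a free connection
          try:
              cursor = await conn.execute(sql, params)
              rows = await cursor.fetchall()
              await cursor.close()
              return rows
          finally:
              await self._queue.put(conn)  # hand it back

      async def close(self):
          for _ in range(self._size):
              conn = await self._queue.get()
              await conn.close()

  async def main():
      pool = SQLitePool("app.db", size=4)  # placeholder path
      await pool.open()
      print(await pool.fetchall("SELECT sqlite_version()"))
      await pool.close()

  asyncio.run(main())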

nomel:
It's a bit of reinventing the wheel, since solving all the problems that come with network access is precisely why those databases exist, and they've already done that work.

asyncpg is a nice Python library for Postgres.

I think Postgres releasing a nice linkable, "serverless" library would be pretty amazing; it would make the need for abusing SQLite like this (I do it too) go away.

jitl:
Postgres has really not solved the problems that come with being a networked server, and it will collapse under concurrent connections far before you start to feel it with SQLite. 5000 concurrent connections will already start to deadlock your Postgres server: each new connection is a new Postgres process, and its state needs to be written to various internal tracking tables. That's a huge amount of overhead; connection pooling is required in Postgres, and the total system often ends up with a rather low fixed connection limit, compared to writing 200 lines of Python or whatever and getting orders of magnitude more connections out of a single machine.
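
With asyncpg, for example, the usual move is to cap a client-side pool well below that; a quick sketch (DSN and table are placeholders):

  import asyncio
  import asyncpg

  async def main():
      # Each server-side connection is a full Postgres backend process,
      # so keep the pool small and share it across the whole app.
      pool = await asyncpg.create_pool(
          dsn="postgresql://app@localhost/appdb",  # placeholder DSN
          min_size=5,
          max_size=50,
      )
      async with pool.acquire() as conn:
          row = await conn.fetchrow("SELECT count(*) FROM users")  # placeholder table
          print(row[0])
      await pool.close()

  asyncio.run(main())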
anarazel:
A connection definitely has overhead in PG, but "5000 concurrent connections will already start to deadlock your Postgres server" is bogus. People routinely run with more connections than that.

Check the throughput graphs in this blog post from 2020 about improvements I made to connection scalability:

https://techcommunity.microsoft.com/blog/adforpostgresql/imp...

That's for read-mostly work. If you write very intensively, you're going to see contention earlier. But it's way worse with SQLite, due to its single-writer model.

EDIT: Corrected year.

jitl:
Yeah, I think I'm conflating our fear of >5000 connections for our Postgres workload (read-write, and quite write-heavy) with our SQLite workload, which is 99.9% read.

The way our SQLite workload works is that we have a pool of hundreds of read connections per DB file, plus a single writer thread per DB file that keeps the DB up to date via CDC from Postgres; basically, we use SQLite as a secondary-index "scale out" over data primarily written to Postgres. Because we're piping a Postgres replication slot -> SQLite, we don't suffer any writer contention, and throughput has been fine to keep up with the change rate so far. Our biggest bottleneck is reading the replication slot on the Postgres side into Kafka with Debezium.
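
Roughly that shape, sketched with aiosqlite (illustrative only; names, sizes, and details here are made up, not our actual code):

  import asyncio
  import aiosqlite

  class ShardHandle:
      # Per DB file: many read-only connections, exactly one writer
      # that applies change events in order.
      def __init__(self, path, readers=100):
          self._path = path
          self._readers = readers
          self._read_pool = asyncio.Queue()
          self._cdc_events = asyncio.Queue()  # (sql, params) decoded from the replication feed

      async def start(self):
          for _ in range(self._readers):
              conn = await aiosqlite.connect(self._path)
              await conn.execute("PRAGMA query_only = ON")  # reads only
              await self._read_pool.put(conn)
          self._writer = await aiosqlite.connect(self._path)
          self._cdc_task = asyncio.create_task(self._apply_cdc())

      async def _apply_cdc(self):
          # Single writer: no write-lock contention among writers by construction.
          while True:
              sql, params = await self._cdc_events.get()
              await self._writer.execute(sql, params)
              await self._writer.commit()

      async def query(self, sql, params=()):
          conn = await self._read_pool.get()
          try:
              cursor = await conn.execute(sql, params)
              return await cursor.fetchall()
          finally:
              await self._read_pool.put(conn)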