353 points HunOL | 5 comments
dv35z ◴[] No.45781964[source]
Curious if anyone has strategies on how to perform parallel writes to an SQLite database using Python's `multiprocessing` Pool.

I am using it to loop through a database of 11,000 words, hitting an HTTP API (ChatGPT) for each to generate example sentences. I would love to launch these API calls asynchronously and have each one update its database row when ready, but I'm not sure how to handle the database getting hit by all these writes from (as I understand it) multiple instances of the same Python program/function.
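One way to sidestep the multiple-writers question entirely (not from the thread, just a sketch): fan the API calls out across threads, but funnel every write through one connection in the main thread. The `fetch_sentence` stub, the `words_async.db` filename, and the table schema are hypothetical stand-ins for the ChatGPT call and the real database.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_sentence(word):
    """Placeholder for the ChatGPT API call; runs concurrently in a thread."""
    return f"Example sentence for {word}."

# Hypothetical schema: words(word TEXT PRIMARY KEY, sentence TEXT)
conn = sqlite3.connect("words_async.db")
conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, sentence TEXT)")
words = ["apple", "banana", "cherry"]
conn.executemany("INSERT OR IGNORE INTO words (word) VALUES (?)", [(w,) for w in words])
conn.commit()

# API calls fan out across threads; every write goes through the single
# connection in the main thread, so SQLite never sees concurrent writers.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(fetch_sentence, w): w for w in words}
    for fut in as_completed(futures):
        conn.execute("UPDATE words SET sentence = ? WHERE word = ?",
                     (fut.result(), futures[fut]))
        conn.commit()
conn.close()
```

Since the slow part here is network I/O, threads get the concurrency without any cross-process write coordination.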

replies(4): >>45781983 #>>45782285 #>>45782358 #>>45791720 #
1. mickeyp ◴[] No.45781983[source]
Edit: disregard. I read it as he'd done it and had contention problems.

You can't - SQLite allows only a single writer at a time. It's one of the many reasons SQLite is terrible for serious work.

You'll need a multiprocessing Queue and a writer process that picks off sentences one by one and commits them.

replies(1): >>45782075 #
2. hruk ◴[] No.45782075[source]
This is just untrue - the naive implementation (make the API call, write a single row to the db) will work fine, as transactions are quite fast on modern hardware.

What do you consider "serious" work? We've served a SaaS product from SQLite (roughly 300-500 queries per second at peak) for several years without much pain. Plus, it's not like PG and MySQL are pain-free, either - they all have their quirks.

replies(1): >>45782552 #
3. mickeyp ◴[] No.45782552[source]
Edit: disregard. I read it as he'd done it and had contention problems.

I mean, it's not fine if he's got lock contention from BUSY signals, as he implies, now is it. Much of his trouble will stem from transactions blocking each other; maybe they are long-lived, maybe they are not. And those 300-500 queries --- are they writes or reads? Because reads are not a problem.

replies(1): >>45782763 #
4. hruk ◴[] No.45782763{3}[source]
Roughly 80/20 read to write. On the instance's gp3 EBS volume (which is pretty slow), we've pushed ~700 write transactions per second without much problem.
replies(1): >>45782828 #
5. mickeyp ◴[] No.45782828{4}[source]
For small OLTP workloads the locking is not going to be a problem. But anything that holds the write lock for even a measurable fraction of a second will gum things up real fast. Transactions that need it for many seconds? You'll quickly be dead in the water.
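That write-lock contention is easy to reproduce in a few lines: a minimal sketch (hypothetical table `t`) in which a second connection fails with `database is locked` while the first holds an open write transaction, then succeeds once it commits.

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

a = sqlite3.connect(db)
a.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table
a.commit()

b = sqlite3.connect(db, timeout=0.2)  # give up after 200 ms, not the 5 s default

a.execute("BEGIN IMMEDIATE")           # connection a takes the write lock
a.execute("INSERT INTO t VALUES (1)")

locked = False
try:
    b.execute("INSERT INTO t VALUES (2)")  # blocked while a's transaction is open
except sqlite3.OperationalError:           # raised as "database is locked"
    locked = True

a.commit()                             # write lock released
b.execute("INSERT INTO t VALUES (2)")  # now succeeds immediately
b.commit()
```

Scale the window between `BEGIN IMMEDIATE` and `commit()` up to seconds, multiply by many writers, and this is the "dead in the water" scenario.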