←back to thread

353 points HunOL | 2 comments | | HN request time: 0s | source
Show context
dv35z ◴[] No.45781964[source]
Curious if anyone has strategies on how to perform parallel writes to an SQLite database using Python's `multiprocessing` Pool.

I am using it to loop through a database of 11,000 words, hit an HTTP API for each (ChatGPT) and generate example sentences for the word. I would love to be able to asynchronously launch these API calls and have them come back and update the database row when ready, but not sure how to handle the database getting hit by all these writes from (as I understand it) multiple instances of the same Python program/function.

replies(4): >>45781983 #>>45782285 #>>45782358 #>>45791720 #
1. sethev ◴[] No.45782358[source]
Have you tried it?

What you're describing sounds like it would work fine to me. The blog post is misleading imho - it implies that SQLite doesn't handle concurrency at all. In reality, you can perform a bunch of writes in parallel and SQLite will handle running them one after the other internally. This works across applications and processes, you just need to use SQLite to interact with the database. The blog post is also misleading when it implies that the application has to manage access to the database file in some way.

Yes, it's correct that only one of those writes will execute at a time but it's not like you have to account for that in your code, especially in a batch-style process like you're describing. In your Python code, you'll just update a row and it will look like that happens concurrently with other updates.

I'll bet that your call to ChatGPT will take far longer than updating the row, even accounting for time when the write is waiting for its turn in SQLite.

Use WAL-mode for the best performance (and to reduce SQLITE_BUSY errors).

replies(1): >>45784701 #
2. dv35z ◴[] No.45784701[source]
I haven't tried it yet - async processing (and even using SQLite) is new to me, so I'm trying to figure out solution patterns which work for the now, and also I can continue to invest my knowledge in to solve future problems.

I will look into WAL mode. I am enjoying using SQLite (and aware that its not the solution for everything), and have several upcoming tasks which I'm planning to use async stuff - and yes, trying to find the balance between how to handle those async tasks (Networky HTTP calls being different than running `ffmpeg` locally).