SQLite concurrency and why you should care about it

1. mickeyp ◴[01 Nov 25 13:48 UTC] No.45781619[source]▶

SQLite is a cracking database -- I love it -- that is let down by its awful defaults in service of 'backwards compatibility.'

You need a brace of PRAGMAs to get it to behave reasonably sanely if you do anything serious with it.

replies(2): >>45781956 #>>45781998 #

2. tejinderss ◴[01 Nov 25 14:38 UTC] No.45781956[source]▶

>>45781619 (TP) #

Do you know any good default PRAGMAs that one should enable?

replies(3): >>45782017 #>>45782031 #>>45782813 #

3. mkoubaa ◴[01 Nov 25 14:44 UTC] No.45781998[source]▶

>>45781619 (TP) #

Seems like it's asking to be forked

replies(3): >>45782143 #>>45782341 #>>45783433 #

4. mickeyp ◴[01 Nov 25 14:46 UTC] No.45782017[source]▶

>>45781956 #

These are my PRAGMAs and not your PRAGMAs. Be very careful about blindly copying something that may or may not match your needs.

    PRAGMA foreign_keys=ON
    PRAGMA recursive_triggers=ON
    PRAGMA journal_mode=WAL
    PRAGMA busy_timeout=30000
    PRAGMA synchronous=NORMAL
    PRAGMA cache_size=10000
    PRAGMA temp_store=MEMORY
    PRAGMA wal_autocheckpoint=1000
    PRAGMA optimize <- run on tx start

Note that I do not use auto_vacuum: DELETEs are uncommon in my workflows, I am fine with the trade-off, and if I do need it I can always PRAGMA it.

defer_foreign_keys is useful if you understand the pros and cons of enabling it.
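
For anyone wondering how a list like this gets applied in practice, here is a minimal sketch, assuming Python's built-in `sqlite3` module. The `connect` helper and the `PRAGMAS` dict are hypothetical names, the values are simply the ones listed above, and `PRAGMA optimize` is left out because its timing is a separate question.

```python
import sqlite3

# Values copied from the list above; they are one person's choices,
# not universal recommendations.
PRAGMAS = {
    "foreign_keys": "ON",
    "recursive_triggers": "ON",
    "journal_mode": "WAL",
    "busy_timeout": 30000,
    "synchronous": "NORMAL",
    "cache_size": 10000,
    "temp_store": "MEMORY",
    "wal_autocheckpoint": 1000,
}


def connect(path):
    """Open a connection and apply every PRAGMA (hypothetical helper)."""
    conn = sqlite3.connect(path)
    for name, value in PRAGMAS.items():
        # PRAGMA arguments cannot be bound as parameters, so the statement
        # is built from the trusted constants above, never from user input.
        conn.execute(f"PRAGMA {name}={value}")
    return conn
```

Running the whole list on every new connection is the simple option: the settings that persist in the file are just set again, and the per-connection ones are never forgotten.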

replies(3): >>45782120 #>>45782365 #>>45782487 #

5. leetrout ◴[01 Nov 25 14:48 UTC] No.45782031[source]▶

>>45781956 #

Explanation of sqlite performance PRAGMAs

https://kerkour.com/sqlite-for-servers

6. adzm ◴[01 Nov 25 14:56 UTC] No.45782120{3}[source]▶

>>45782017 #

Really, no mmap?

replies(1): >>45782274 #

7. justin66 ◴[01 Nov 25 14:58 UTC] No.45782143[source]▶

>>45781998 #

It has been forked at least once:

https://docs.turso.tech/libsql

replies(1): >>45790067 #

8. metrix ◴[01 Nov 25 15:12 UTC] No.45782274{4}[source]▶

>>45782120 #

I'm curious what your suggested mmap pragma would be.

replies(1): >>45793234 #

9. kbolino ◴[01 Nov 25 15:19 UTC] No.45782341[source]▶

>>45781998 #

SQLite is fairly fork-resistant due to much of its test suite being proprietary: https://www.sqlite.org/testing.html

10. mikeocool ◴[01 Nov 25 15:22 UTC] No.45782365{3}[source]▶

>>45782017 #

Using strict tables is also a good thing to do, if you value your sanity.
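
A small sketch of what strict tables change, assuming Python's built-in `sqlite3` module linked against SQLite 3.37 or newer (the release that added `STRICT`). The function name and the table names `loose` and `tight` are made up for illustration.

```python
import sqlite3


def strict_rejects_bad_type():
    """Return True if a STRICT table refuses text in an INTEGER column."""
    if sqlite3.sqlite_version_info < (3, 37, 0):
        raise RuntimeError("STRICT tables need SQLite 3.37 or newer")

    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE loose (n INTEGER)")
        conn.execute("CREATE TABLE tight (n INTEGER) STRICT")

        # An ordinary table accepts this and stores the value as TEXT.
        conn.execute("INSERT INTO loose VALUES ('not a number')")

        try:
            conn.execute("INSERT INTO tight VALUES ('not a number')")
        except sqlite3.Error:
            # The STRICT table raises instead of storing the wrong type.
            return True
        return False
    finally:
        conn.close()
```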

11. porridgeraisin ◴[01 Nov 25 15:35 UTC] No.45782487{3}[source]▶

>>45782017 #

You should PRAGMA optimize before TX end, not at tx start.

Except for long lived connections where you do it periodically.

https://www.sqlite.org/lang_analyze.html#periodically_run_pr...
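
One way to wire that in, as a sketch only: it assumes Python's built-in `sqlite3` module and a short-lived connection, and it runs `PRAGMA optimize` just before the connection closes, which is how I read the linked documentation. The `optimized_connection` name is hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def optimized_connection(path):
    """Yield a connection and run PRAGMA optimize right before closing it."""
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        # At this point SQLite has seen the queries this connection ran
        # and can decide whether any statistics are worth refreshing.
        conn.execute("PRAGMA optimize")
        conn.close()
```

The caller is still responsible for committing its own work inside the `with` block; the helper only adds the optimize step and the close.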

replies(1): >>45782976 #

12. e2le ◴[01 Nov 25 16:11 UTC] No.45782813[source]▶

>>45781956 #

Although not what you asked for, the SQLite authors maintain a list of recommended compilation options that should be used where applicable.

https://sqlite.org/compile.html#recommended_compile_time_opt...

13. masklinn ◴[01 Nov 25 16:29 UTC] No.45782976{4}[source]▶

>>45782487 #

Also foreign_keys has to be set per connection but journal_mode is sticky (it changes the database itself).

replies(1): >>45783383 #

14. porridgeraisin ◴[01 Nov 25 17:14 UTC] No.45783383{5}[source]▶

>>45782976 #

Yes, if journal_mode was not sticky, a new process opening the db would not know to look for the wal and shm files and read the unflushed latest data from there. On the other hand, foreign key enforcement has nothing to do with the file itself, it's a transaction level thing.

In any case, there is no harm in setting sticky pragmas every connection.
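
The difference is easy to check. This is a minimal sketch assuming Python's built-in `sqlite3` module and an SQLite build where foreign key enforcement is off by default; the function name and the `demo.db` file name are made up.

```python
import os
import sqlite3
import tempfile


def pragma_state_after_reopen():
    """Set two PRAGMAs, reopen the file, and report what survived."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    first = sqlite3.connect(path)
    first.execute("PRAGMA journal_mode=WAL")  # recorded in the database file
    first.execute("PRAGMA foreign_keys=ON")   # state of this connection only
    first.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    first.commit()
    first.close()

    # A brand new connection that sets no PRAGMAs at all.
    second = sqlite3.connect(path)
    journal_mode = second.execute("PRAGMA journal_mode").fetchone()[0]
    foreign_keys = second.execute("PRAGMA foreign_keys").fetchone()[0]
    second.close()
    return journal_mode, foreign_keys
```

On such a build the second connection still reports `wal` for the journal mode, while `foreign_keys` is back to `0` until it is set again.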

15. pstuart ◴[01 Nov 25 17:20 UTC] No.45783433[source]▶

>>45781998 #

The real fork is DuckDB in a way, it has SQLite compatibility and so much more.

The SQLite team also has 2 branches that address concurrency that may someday merge to trunk, but by their very nature they are quite conservative and it may never happen unless they feel it passes muster.

https://www.sqlite.org/src/doc/begin-concurrent/doc/begin_co...

https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html

As to the problem that prompted the article, there's another way of addressing the problem that is kind of a kludge but is guaranteed to work in scenarios like theirs: Have each thread in the parallel scan write to its own temporary database and then bulk import them once the scan is done.

It's easy to get hung up on having "a database" but sharding to different files by use is trivial to do.
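
A rough sketch of that per-thread approach, assuming Python's built-in `sqlite3` module. The `files` table, its columns, and the `scan_shard` and `merge_shards` names are all invented for illustration and are not taken from the article.

```python
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS files (path TEXT, size INTEGER)"


def scan_shard(job):
    """Worker: write one shard's rows into its own database file."""
    shard_path, rows = job
    # The connection is created and used inside this worker only,
    # so no two threads ever contend for the same file.
    conn = sqlite3.connect(shard_path)
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
    conn.commit()
    conn.close()
    return shard_path


def merge_shards(main_path, shard_paths):
    """Bulk-import every shard into the main database, one at a time."""
    conn = sqlite3.connect(main_path)
    conn.execute(SCHEMA)
    for shard_path in shard_paths:
        conn.execute("ATTACH DATABASE ? AS shard", (shard_path,))
        conn.execute("INSERT INTO files SELECT * FROM shard.files")
        conn.commit()  # finish the transaction before detaching
        conn.execute("DETACH DATABASE shard")
    total = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return total
```

The workers can be driven by something like `concurrent.futures.ThreadPoolExecutor().map(scan_shard, jobs)`, and the merge then runs once on a single connection, so the main database only ever has one writer.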

Another thing to bear in mind with a lot of SQLite use cases is that the data is effectively read only save for occasional updates. Read only databases are a lot easier to deal with regarding locking.

replies(3): >>45783708 #>>45783826 #>>45790073 #

16. jitl ◴[01 Nov 25 17:51 UTC] No.45783708{3}[source]▶

>>45783433 #

DuckDB is similar as an in process SQL database, but lacking btree-style ordered indexes makes it a poor performer in key lookups and order-by / range scans if your table is any size larger than trivial.

It’s the classic OLAP (DuckDB) vs OLTP (SQLite) trade off between the two. DuckDB is very good at many things but most applications that need a traditional SQL DB will probably not perform well if you swap it over to DuckDB.

replies(2): >>45783833 #>>45784046 #

17. Kinrany ◴[01 Nov 25 18:02 UTC] No.45783826{3}[source]▶

>>45783433 #

> Read only databases are a lot easier to deal with regarding locking.

"A lot easier" sounds like an understatement. What's there to lock when the data is read only?

18. Kinrany ◴[01 Nov 25 18:03 UTC] No.45783833{4}[source]▶

>>45783708 #

That's surprising, surely OLAP use cases also need key lookups?

19. geysersam ◴[01 Nov 25 18:29 UTC] No.45784046{4}[source]▶

>>45783708 #

Duckdb has optional adaptive radix tree indexing (https://duckdb.org/docs/stable/sql/indexes.html)

replies(1): >>45786034 #

20. jitl ◴[01 Nov 25 22:33 UTC] No.45786034{5}[source]▶

>>45784046 #

Oops, I stand corrected!

What I remember is that our evaluation of DuckDB in 2024 concluded that (1) the major limitations were lack of range-scan and index-lookup performance (maybe w/ joins? or update where?), and (2) the DuckDB Node.js module segfaulted too much. Perhaps the engineers somehow missed the ART index. It could also be the restriction that data must fit in memory to create an index on it (our test dataset was about 50gb).

21. fulafel ◴[02 Nov 25 13:14 UTC] No.45790067{3}[source]▶

>>45782143 #

How are the defaults there?

replies(1): >>45792348 #

22. ◴[02 Nov 25 13:15 UTC] No.45790073{3}[source]▶

>>45783433 #

23. justin66 ◴[02 Nov 25 18:33 UTC] No.45792348{4}[source]▶

>>45790067 #

The default is, don't use it.

24. adzm ◴[02 Nov 25 20:40 UTC] No.45793234{5}[source]▶

>>45782274 #

PRAGMA mmap_size=268435456;

for example? I'm surprised by the downvotes. Using mmap significantly reduced my average read query time; queries took about 70% as long!
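
A small sketch of setting it from application code, assuming Python's built-in `sqlite3` module. The `set_mmap_size` helper is hypothetical. The PRAGMA reports back the limit actually in effect, which can be lower than requested (or zero) depending on how SQLite was compiled, so it is worth reading the result rather than assuming the request was honored.

```python
import sqlite3


def set_mmap_size(conn, size_bytes=268435456):
    """Request memory-mapped I/O and return the limit SQLite reports back."""
    row = conn.execute(f"PRAGMA mmap_size={int(size_bytes)}").fetchone()
    # Some configurations return no row at all; treat that as "unknown".
    return row[0] if row else None
```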