SQLite concurrency and why you should care about it

(jellyfin.org)

353 points HunOL | 5 comments | 01 Nov 25 12:59 UTC

mangecoeur [01 Nov 25 15:14 UTC] No.45782293

SQLite is a great bit of technology, but sometimes I read articles like this and think maybe they should have used Postgres. If you don’t specifically need the “one file portability” aspect of SQLite, or it’s not embedded (if it is embedded, you shouldn’t have concurrency issues in the first place), Postgres is easy to get running and solves these problems.
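For context on the concurrency issue itself, here is a minimal sketch using Python’s standard `sqlite3` module (the table, file, and function names are illustrative, not taken from the article). In WAL mode, readers keep seeing the last committed snapshot while one connection holds the write lock, but a second writer is refused with “database is locked” unless a busy timeout lets it wait.

```python
import os
import sqlite3
import tempfile


def count_rows(conn: sqlite3.Connection) -> int:
    # fetchall() steps the statement to completion, so no read transaction lingers.
    return conn.execute("SELECT COUNT(*) FROM items").fetchall()[0][0]


def wal_contention_demo(path: str) -> dict:
    """Show what a second writer sees while another connection holds the write lock."""
    results = {}
    # isolation_level=None: autocommit mode, so transactions are opened explicitly.
    # timeout=0: no busy handler, so lock conflicts fail immediately instead of waiting.
    writer = sqlite3.connect(path, timeout=0, isolation_level=None)
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        results["journal_mode"] = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        writer.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
        writer.execute("INSERT INTO items (name) VALUES ('a')")

        writer.execute("BEGIN IMMEDIATE")  # take the database's single write lock
        writer.execute("INSERT INTO items (name) VALUES ('b')")

        # WAL lets readers proceed; they see only the committed row 'a'.
        results["reader_count"] = count_rows(other)

        # A second writer cannot get the lock and, with no busy timeout, fails at once.
        try:
            other.execute("INSERT INTO items (name) VALUES ('c')")
            results["second_writer"] = "ok"
        except sqlite3.OperationalError as exc:
            results["second_writer"] = str(exc)

        writer.execute("COMMIT")
        other.execute("INSERT INTO items (name) VALUES ('c')")  # lock is free now
        results["final_count"] = count_rows(other)
    finally:
        writer.close()
        other.close()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(wal_contention_demo(os.path.join(d, "demo.db")))
```

Passing a non-zero `timeout` to `sqlite3.connect` makes the second writer wait for the lock instead of failing, which helps with short write bursts but does not remove the single-writer limit.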

thayne [01 Nov 25 16:24 UTC] No.45782932

>>45782293

Using postgres would make it significantly more complicated for Jellyfin users to install and set up Jellyfin. And then users would need to worry about migrating the databases when PostgreSQL has a major version upgrade. An embedded database like sqlite is a much better fit for something like Jellyfin.

throwaway894345 [01 Nov 25 16:36 UTC] No.45783048

>>45782932

As a Jellyfin user, this hasn’t been my experience. I needed to do a fair bit of work to make sure Jellyfin could access its database no matter which node it was scheduled onto and that no more than one instance ever accessed the database at the same time. Jellyfin by far required more work to set up maintainably than any of the other applications I run, and it is also easily the least reliable application. This isn’t all down to SQLite, but it’s all down to a similar set of assumptions (exactly one application instance interacting with state over a filesystem interface).
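One common way to enforce “no more than one instance at a time” on a single host is an exclusive advisory lock file. The sketch below assumes a POSIX system (`fcntl.flock`) and uses a hypothetical lock path; it is not how Jellyfin does it. `flock` semantics on network filesystems vary, which is part of why this gets harder once the database can be reached from several nodes.

```python
import fcntl  # POSIX only
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def single_instance(lock_path: str) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    Raises RuntimeError immediately if another holder already has the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB: fail instead of blocking if someone else holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise RuntimeError(f"another instance already holds {lock_path}") from None
        yield
    finally:
        os.close(fd)  # closing the only descriptor also releases the lock


if __name__ == "__main__":
    with single_instance("/tmp/media-server-demo.lock"):  # hypothetical path
        print("only one instance gets here at a time")
```

The lock is advisory: it only protects against other processes that also call `single_instance` on the same path.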

thayne [01 Nov 25 18:26 UTC] No.45784027

>>45783048

Is running multiple nodes a typical way to run Jellyfin, though? I would expect that most Jellyfin users only run a single instance at a time.

throwaway894345 [01 Nov 25 19:36 UTC] No.45784622

>>45784027

Yes, but you have to go out of your way when writing software to make it so the software can only run on one node at a time. Or rather, well-architected software should require minimal, isolated edits to run in a distributed configuration (for example, replacing SQLite with a distributed SQLite).

thayne [01 Nov 25 23:55 UTC] No.45786629

>>45784622 (TP)

That's just not true. Distributed software is much more complicated and difficult than non-distributed software. Distributed systems have many failure modes that you don't have to worry about in non-distributed systems.

Now maybe you could have an abstraction layer over your storage layer that supports multiple data stores, including a distributed one. But that comes with tradeoffs, like being limited to the least common denominator of features of the data stores, and having to implement the abstraction layer for multiple data stores.
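A minimal sketch of the kind of abstraction layer being discussed, assuming Python with `typing.Protocol`; `ItemStore`, `SqliteItemStore`, and `rename` are hypothetical names. Application code depends only on the interface, so a Postgres or dqlite backend would be an isolated addition, but even this tiny backend already leans on driver-specific details.

```python
import sqlite3
from typing import Optional, Protocol


class ItemStore(Protocol):
    """The only storage surface the application code is allowed to use."""

    def put(self, item_id: str, name: str) -> None: ...
    def get(self, item_id: str) -> Optional[str]: ...


class SqliteItemStore:
    """One backend; another backend would implement the same two methods."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
            )

    def put(self, item_id: str, name: str) -> None:
        with self._conn:  # commit on success, roll back on exception
            # Upsert syntax needs SQLite >= 3.24; the "?" placeholder is sqlite3-specific.
            self._conn.execute(
                "INSERT INTO items (id, name) VALUES (?, ?) "
                "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
                (item_id, name),
            )

    def get(self, item_id: str) -> Optional[str]:
        row = self._conn.execute("SELECT name FROM items WHERE id = ?", (item_id,)).fetchone()
        return row[0] if row else None


def rename(store: ItemStore, item_id: str, new_name: str) -> bool:
    """Application logic that depends only on the ItemStore interface."""
    if store.get(item_id) is None:
        return False
    store.put(item_id, new_name)
    return True
```

Python’s `sqlite3` uses `?` placeholders while psycopg uses `%s`, so even identical SQL needs per-backend handling: a small instance of the least-common-denominator cost described above.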

throwaway894345 [02 Nov 25 03:47 UTC] No.45787727

>>45786629

I’m a distributed systems architect. I design, build, and operate distributed systems.

> Distributed systems have many failure modes that you don't have to worry about in non-distributed systems.

Yes, but as previously mentioned, those failure modes are handled by abiding by a few simple principles. It’s also worth noting that multiprocess or multithreaded software has many of the same failure modes, including the one discussed in this post. Architecting systems as though they are distributed largely takes care of those failure modes as well, making even single-node software like Jellyfin more robust.
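To illustrate the multithreaded version of this failure mode, here is a sketch (Python `sqlite3` and `threading`; function names are hypothetical) of one mitigation: serializing every write in a process through a single lock so threads never contend for SQLite’s write lock. It is one strategy among several (a busy timeout plus retries is another), and it only coordinates threads inside one process.

```python
import os
import sqlite3
import tempfile
import threading
from typing import List, Tuple

# One lock shared by every writer thread in this process. It does nothing for
# other processes or other nodes, which need a different coordination mechanism.
WRITE_LOCK = threading.Lock()


def _writer(path: str, n: int, errors: List[str]) -> None:
    # Each thread gets its own connection; timeout=0 means any lock conflict
    # would surface immediately as "database is locked" instead of being waited out.
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        for i in range(n):
            with WRITE_LOCK:  # serialize writes so threads never contend inside SQLite
                conn.execute("BEGIN IMMEDIATE")
                conn.execute("INSERT INTO log (msg) VALUES (?)", (f"{threading.get_ident()}-{i}",))
                conn.execute("COMMIT")
    except sqlite3.OperationalError as exc:
        errors.append(str(exc))
    finally:
        conn.close()  # an unfinished transaction is rolled back on close


def run(path: str, threads: int = 4, per_thread: int = 50) -> Tuple[int, List[str]]:
    setup = sqlite3.connect(path)
    with setup:
        setup.execute("CREATE TABLE IF NOT EXISTS log (id INTEGER PRIMARY KEY, msg TEXT)")
    setup.close()

    errors: List[str] = []
    workers = [
        threading.Thread(target=_writer, args=(path, per_thread, errors)) for _ in range(threads)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    check = sqlite3.connect(path)
    (count,) = check.execute("SELECT COUNT(*) FROM log").fetchone()
    check.close()
    return count, errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run(os.path.join(d, "log.db")))
```

Removing the `with WRITE_LOCK:` line while keeping `timeout=0` can make some threads fail with “database is locked”, depending on scheduling.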

> Now maybe you could have an abstraction layer over your storage layer that supports multiple data stores, including a distributed one. But that comes with tradeoffs, like being limited to the least common denominator of features of the data stores, and having to implement the abstraction layer for multiple data stores.

Generally I just target storage interfaces that can be easily distributed—things like Postgres (or maybe dqlite?) for SQL databases or an object storage API instead of a filesystem API. If you build a system as if it could be distributed one day, you’ll end up with a simpler, more modular system even if you never scale to more than one node (maybe you just want to take advantage of parallelism on your single node, as was the case in this blog post).
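A sketch of targeting an object-storage-shaped interface instead of a raw filesystem API, assuming Python; `BlobStore` and `LocalBlobStore` are hypothetical and do not mirror any particular SDK. Callers only put and get whole objects by key, so a single-node directory backend and a networked object-store backend could be swapped without touching them.

```python
import contextlib
import os
import tempfile
from typing import Optional, Protocol


class BlobStore(Protocol):
    """Object-storage-shaped interface: whole objects addressed by opaque keys."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...


class LocalBlobStore:
    """Single-node backend on a local directory."""

    def __init__(self, root: str) -> None:
        self._root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        if not key:
            raise ValueError("key must be non-empty")
        # Hex-encode keys so characters like "/" or ".." can never escape root.
        return os.path.join(self._root, key.encode("utf-8").hex())

    def put(self, key: str, data: bytes) -> None:
        final = self._path(key)
        # Write a temp file, then rename it into place (atomic on POSIX filesystems):
        # readers see the old object or the new one, never a half-written file.
        # There is no fsync here, so this is not crash-durable.
        fd, tmp = tempfile.mkstemp(dir=self._root, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, final)
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp)
            raise

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None
```

Because temp files start with `.tmp-` and stored objects use hex names, the two can never collide in the same directory.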

thayne [02 Nov 25 17:37 UTC] No.45791978

>>45787727

> just target storage interfaces that can be easily distributed—things like Postgres

But as I mentioned above, that makes the system more complicated for people who don't need it to be distributed.

Setting up separate db software, configuring the connection, handling separate updates, etc. is a lot more work for most users than Jellyfin just using a local embedded sqlite database. And it would probably make the application code more complicated as well.

throwaway894345 [02 Nov 25 18:52 UTC] No.45792475

>>45791978

> But as I mentioned above, that makes the system more complicated for people who don't need it to be distributed. Setting up separate db software, configuring the connection, handling separate updates, etc. is a lot more work for most users than Jellyfin just using a local embedded sqlite database.

You can package a Postgres database with your app just like SQLite. Users should not have to know that they are using Postgres much less configuring connections, handling updates, etc.
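A rough sketch of what “packaging Postgres with the app” could look like: the app runs `initdb` once, then starts a private server with `pg_ctl` that listens only on a Unix socket inside its data directory. This assumes a Unix host and bundled PostgreSQL binaries in a `bin_dir` of your choosing; the function names, the `app` role, and the file layout are hypothetical.

```python
import os
import shlex
import shutil
import subprocess
from typing import List


def initdb_cmd(bin_dir: str, data_dir: str, user: str) -> List[str]:
    # "trust" auth is only tolerable because the server below accepts no TCP
    # connections and its socket lives in the data directory, which initdb
    # creates with owner-only permissions.
    return [os.path.join(bin_dir, "initdb"), "-D", data_dir, "-U", user, "-A", "trust"]


def pg_ctl_start_cmd(bin_dir: str, data_dir: str, log_file: str) -> List[str]:
    # On Unix, pg_ctl passes the -o string to the server via /bin/sh, hence the quoting.
    # -h '' : listen on no IP addresses; -k DIR : put the Unix socket in DIR
    # (socket paths are length-limited, so data_dir should be short).
    server_opts = f"-h '' -k {shlex.quote(data_dir)}"
    return [os.path.join(bin_dir, "pg_ctl"), "-D", data_dir, "-l", log_file,
            "-w", "-o", server_opts, "start"]


def pg_ctl_stop_cmd(bin_dir: str, data_dir: str) -> List[str]:
    return [os.path.join(bin_dir, "pg_ctl"), "-D", data_dir, "-m", "fast", "stop"]


def start_bundled_postgres(bin_dir: str, data_dir: str, user: str = "app") -> None:
    """Initialize the cluster on first run, then start it."""
    if shutil.which("pg_ctl", path=bin_dir) is None:
        raise FileNotFoundError(f"pg_ctl not found in {bin_dir}")
    if not os.path.exists(os.path.join(data_dir, "PG_VERSION")):  # first run
        subprocess.run(initdb_cmd(bin_dir, data_dir, user), check=True)
    log_file = os.path.join(data_dir, "server.log")
    subprocess.run(pg_ctl_start_cmd(bin_dir, data_dir, log_file), check=True)
```

This only covers first-run setup and startup. The major-version upgrade concern raised earlier in the thread still applies: a bundle that ships a new major version would need to run `pg_upgrade` or a dump and restore on the user’s behalf.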

> And it would probably make the application code more complicated as well.

Not at all; this is an article about the hoops the application has to jump through to make SQLite behave well with parallel access. Postgres is designed for parallel access by default. It’s strictly simpler from the perspective of the application.
