SQLite concurrency and why you should care about it

(jellyfin.org)

1. mangecoeur ◴[01 Nov 25 15:14 UTC] No.45782293[source]▶

Sqlite is a great bit of technology but sometimes I read articles like this and think, maybe they should have used postgres. If you don’t specifically need the “one file portability” aspect of sqlite, or it’s not embedded (in which case you shouldn’t have concurrency issues), Postgres is easy to get running and solves these problems.

2. eduction ◴[01 Nov 25 15:30 UTC] No.45782439[source]▶

>>45782293 (TP) #

100%. I specifically clicked for the “why you should care” and was disappointed I could not find it.

I certainly don’t mind if someone is pushing the limits of what SQLite is designed for but personally I’d just rather invest the (rather small) overhead of setting up a db server if I need a lot of concurrency.

3. abound ◴[01 Nov 25 16:12 UTC] No.45782829[source]▶

>>45782293 (TP) #

Jellyfin is a self-hostable media server. If they "used Postgres", that means anyone who runs it needs Postgres. I think SQLite is the better choice for this kind of application, if one is going to choose a single database instead of some pluggable layer.

4. bambax ◴[01 Nov 25 16:21 UTC] No.45782906[source]▶

>>45782293 (TP) #

Jellyfin is a media server app that gets installed on a great variety of platforms and while it would certainly be possible to add a postgres server to the install, the choice of sqlite is more than justified here IMHO.

5. throwaway894345 ◴[01 Nov 25 16:24 UTC] No.45782930[source]▶

>>45782293 (TP) #

As a user of Jellyfin, I’m very sad that it doesn’t just use Postgres. I basically have to run an NFS system just for Jellyfin so that its data can be available to it no matter which node it gets scheduled on and also that there are never multiple instances running at the same time, even during deployments (e.g., I need to take care that deployments completely stop the first Jellyfin instance before starting the subsequent instance). There are so many unnecessary single points of failure, and Postgres would make a pretty big one go away (never mind addressing the parallelism problems that plague the developers).

Jellyfin is by far the least reliable application I run, but it also seems to be best in class.

6. thayne ◴[01 Nov 25 16:24 UTC] No.45782932[source]▶

>>45782293 (TP) #

Using postgres would make it significantly more complicated for Jellyfin users to install and set up Jellyfin. And then users would need to worry about migrating the databases when PostgreSQL has a major version upgrade. An embedded database like sqlite is a much better fit for something like Jellyfin.

7. throwaway894345 ◴[01 Nov 25 16:36 UTC] No.45783048[source]▶

>>45782932 #

As a Jellyfin user, this hasn’t been my experience. I needed to do a fair bit of work to make sure Jellyfin could access its database no matter which node it was scheduled onto and that no more than one instance ever accessed the database at the same time. Jellyfin by far required more work to set up maintainably than any of the other applications I run, and it is also easily the least reliable application. This isn’t all down to SQLite, but it’s all down to a similar set of assumptions (exactly one application instance interacting with state over a filesystem interface).

8. morshu9001 ◴[01 Nov 25 17:06 UTC] No.45783314[source]▶

>>45782829 #

Exactly, there are use cases where SQLite makes sense but you also want to make it faster. I really don't get why there isn't a more portable Postgres.

9. amaccuish ◴[01 Nov 25 17:32 UTC] No.45783524[source]▶

>>45782293 (TP) #

Their whole recent rewrite of the DB code (to Entity Framework) is to allow the user choice of DB in future.

10. thayne ◴[01 Nov 25 18:26 UTC] No.45784027{3}[source]▶

>>45783048 #

Is running multiple nodes a typical way to run Jellyfin though? I would expect that most Jellyfin users only run a single instance at a time.

11. KingMob ◴[01 Nov 25 19:04 UTC] No.45784370[source]▶

>>45782930 #

I gave up on Jellyfin after media library updates kept hanging on certain video files, and switched to the original Emby it was forked from (iiuc).

Emby has a scarily-ancient install process, but it's been working just fine with less hassle.

12. stormbeard ◴[01 Nov 25 19:09 UTC] No.45784408{3}[source]▶

>>45783048 #

Jellyfin isn’t meant to be some highly available distributed system, so of course this happens when you try to operate it like one. The typical user is not someone trying to run it via K8s.

13. throwaway894345 ◴[01 Nov 25 19:35 UTC] No.45784611{4}[source]▶

>>45784408 #

Yeah, I agree, though making software that can run in a distributed configuration is a matter of following a few basic principles, and would be far less work than the effort the developers have spent trying to make SQLite work for their application.

The effort required to put an application on Kubernetes is a pretty good indicator of software quality. In other words, I can have a pretty good idea about how difficult a software is to maintain in a single-instance configuration by trying to port it to Kubernetes.

14. throwaway894345 ◴[01 Nov 25 19:36 UTC] No.45784622{4}[source]▶

>>45784027 #

Yes, but you have to go out of your way when writing software to make it so the software can only run on one node at a time. Or rather, well-architected software should require minimal, isolated edits to run in a distributed configuration (for example, replacing SQLite with a distributed SQLite).

15. petters ◴[01 Nov 25 19:53 UTC] No.45784757[source]▶

>>45782293 (TP) #

Jellyfin is mostly for a single household, right? Sqlite should be much more than sufficient for Jellyfin (if used correctly). Unfortunately, reading this article you get the impression that they are not using it optimally.

16. o11c ◴[01 Nov 25 20:11 UTC] No.45784918[source]▶

>>45782293 (TP) #

Even with postgres, you don't have to use the system instance; there's nothing stopping you from running the server as a child process.

You probably need to support this for your testsuite anyway.

17. FrinkleFrankle ◴[01 Nov 25 20:24 UTC] No.45785008{3}[source]▶

>>45783048 #

Care to share your setup?

18. hamandcheese ◴[01 Nov 25 20:31 UTC] No.45785060[source]▶

>>45784918 #

Maybe in theory. In practice, most people who need Postgres for their test suite will boot an instance in a docker container in CI, and maybe just assume a system version is available for local dev.

19. zie ◴[01 Nov 25 21:13 UTC] No.45785432{3}[source]▶

>>45783314 #

There is, you can even run PG under wasm if you are desperate. :)

SQLite is probably the better option here and in most places where you want portability though.

20. tombert ◴[01 Nov 25 22:01 UTC] No.45785817[source]▶

>>45782829 #

I share my Jellyfin with about a dozen people, and it's not weird to have several people streaming at the same time. I have a two gigabit connection so bandwidth isn't generally an issue, but I've had issues when three people all streaming a VC-1 encoded video to H264 in software.

This is something that I think I could fairly easily ameliorate if I could simply load-balance the application server by user, but historically (with Emby), I've not been able to do that due to SQLite locking not allowing me to run multiple instances pointing to the same config instance.

There are almost certainly ways to do this correctly with SQLite, but if they allowed for using almost literally any other database this would be a total non-issue.

ETA:

For clarification, if anyone is reading this, all this media was LEGALLY OBTAINED with PERMISSION FROM THE COPYRIGHT HOLDER(S).

21. nick_ ◴[01 Nov 25 22:29 UTC] No.45786010[source]▶

>>45784757 #

Agreed. How can a media file sharing app possibly saturate Sqlite's write limit? I would use an app-level global lock on all writes to Sqlite.
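
A minimal sketch of that app-level global write lock, written against Python's standard sqlite3 module rather than Jellyfin's C# code. The names WRITE_LOCK, init_db, write_items, scan_library and the items table are hypothetical; each worker thread opens its own connection, and every write transaction runs while holding one process-wide lock, so SQLite never sees two writers at once.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical process-wide lock: every write in the app goes through it,
# so at most one SQLite write transaction is open at any moment.
WRITE_LOCK = threading.Lock()


def init_db(path: str) -> None:
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, title TEXT)"
        )
        conn.commit()
    finally:
        conn.close()


def write_items(path: str, titles: list[str]) -> None:
    # Each worker thread opens its own connection; sqlite3 connections
    # are tied to the thread that created them by default.
    conn = sqlite3.connect(path)
    try:
        for title in titles:
            with WRITE_LOCK:  # serialize writes across all threads
                with conn:  # commits on success, rolls back on error
                    conn.execute("INSERT INTO items (title) VALUES (?)", (title,))
    finally:
        conn.close()


def scan_library(path: str, n_workers: int = 8, per_worker: int = 50) -> int:
    threads = [
        threading.Thread(
            target=write_items,
            args=(path, [f"worker{w}-item{i}" for i in range(per_worker)]),
        )
        for w in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    db = os.path.join(tempfile.mkdtemp(), "library.db")
    init_db(db)
    print(scan_library(db))  # 8 workers x 50 inserts each
```

Note that a threading.Lock only coordinates threads inside one process; it does nothing for two separate server instances pointed at the same database file, which is the multi-instance case raised elsewhere in this thread.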

22. reddalo ◴[01 Nov 25 22:54 UTC] No.45786203{3}[source]▶

>>45785817 #

Yeah, I'm sure those twelve people love watching your vacation clips all the time ;)

23. reddalo ◴[01 Nov 25 22:57 UTC] No.45786226[source]▶

>>45782829 #

They're actually planning on migrating to Postgres in a future release:

>[...] it also opens up new possibilities - not officially yet, but soon - for running Jellyfin backed by "real" database systems like PostgreSQL, providing new options for redundancy, load-balancing, and easier maintenance and administration. The future looks very bright!

https://jellyfin.org/posts/jellyfin-release-10.11.0/

24. thayne ◴[01 Nov 25 23:55 UTC] No.45786629{5}[source]▶

>>45784622 #

That's just not true. Distributed software is much more complicated and difficult than non-distributed software. Distributed systems have many failure modes that you don't have to worry about in non-distributed systems.

Now maybe you could have an abstraction layer over your storage layer that supports multiple data stores, including a distributed one. But that comes with tradeoffs, like being limited to the least common denominator of features of the data stores, and having to implement the abstraction layer for multiple data stores.
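
To make the abstraction-layer tradeoff concrete, here is a hypothetical Python sketch: application code depends only on a MediaStore protocol, and SQLite is one backend behind it. None of these names come from Jellyfin. The upsert uses INSERT ... ON CONFLICT ... DO UPDATE, syntax that both SQLite (3.24 and later) and PostgreSQL accept, as an example of sticking to features every backend shares.

```python
import sqlite3
from typing import Optional, Protocol


class MediaStore(Protocol):
    """Hypothetical storage interface; app code depends only on this."""

    def add_item(self, path: str, title: str) -> None: ...

    def get_title(self, path: str) -> Optional[str]: ...


class SqliteMediaStore:
    """One backend behind the interface; a Postgres one would be a sibling class."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items (path TEXT PRIMARY KEY, title TEXT NOT NULL)"
        )

    def add_item(self, path: str, title: str) -> None:
        with self._conn:
            # Upsert written in syntax that SQLite (3.24+) and PostgreSQL share.
            self._conn.execute(
                "INSERT INTO items (path, title) VALUES (?, ?) "
                "ON CONFLICT (path) DO UPDATE SET title = excluded.title",
                (path, title),
            )

    def get_title(self, path: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT title FROM items WHERE path = ?", (path,)
        ).fetchone()
        return row[0] if row else None


def index_file(store: MediaStore, path: str) -> None:
    # Application logic: knows nothing about which database is underneath.
    title = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    store.add_item(path, title)
```

Even with shared SQL, the backends are not drop-in copies: Python's sqlite3 uses ? placeholders while common Postgres drivers such as psycopg use %s, so each backend still needs its own implementation, which is the maintenance cost described above.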

25. ants_everywhere ◴[02 Nov 25 00:18 UTC] No.45786762[source]▶

>>45782930 #

I have the same experience. SQLite has been a source of most Jellyfin problems, and Jellyfin has more problems than the rest of the ~ 150 containers I run regularly.

A stateless design where a stateless jellyfin server talks to a postgres database would be simpler and more robust.

26. zeroq ◴[02 Nov 25 02:03 UTC] No.45787275[source]▶

>>45782293 (TP) #

Sqlite has so many small benefits for tiny projects it can't be easily replaced.

It's like saying "oh, you want to visit Austrian countryside next month and you're asking for advice for best tent? How about you build a cabin instead?".

27. throwaway894345 ◴[02 Nov 25 03:47 UTC] No.45787727{6}[source]▶

>>45786629 #

I’m a distributed systems architect. I design, build, and operate distributed systems.

> Distributed systems have many failure modes that you don't have to worry about in non-distributed systems.

Yes, but as previously mentioned, those failure modes are handled by abiding by a few simple principles. It’s also worth noting that multiprocess or multithreaded software has many of the same failure modes, including the one discussed in this post. Architecting systems as though they are distributed largely takes care of those failure modes as well, making even single-node software like Jellyfin more robust.

> Now maybe you could have an abstraction layer over your storage layer that supports multiple data stores, including a distributed one. But that comes with tradeoffs, like being limited to the least common denominator of features of the data stores, and having to implement the abstraction layer for multiple data stores.

Generally I just target storage interfaces that can be easily distributed—things like Postgres (or maybe dqlite?) for SQL databases or an object storage API instead of a filesystem API. If you build a system like it could be distributed one day, you’ll end up with a simpler, more modular system even if you never scale to more than one node (maybe you just want to take advantage of parallelism on your single node, as was the case in this blog post).

28. throwaway894345 ◴[02 Nov 25 03:52 UTC] No.45787739{3}[source]▶

>>45786762 #

Yeah, honestly I’m kind of thinking about a media server architecture that has a stateless media server that vends links to pre-transcoded media in object storage (which video players would source from), since pretty much anything can handle mp4/h264/aac video. Maybe in the future I could add on some on-the-fly transcoding (which would happen on a dedicated cluster, reading and writing to object storage), but that seems like a pretty big undertaking.

29. heavyset_go ◴[02 Nov 25 06:05 UTC] No.45788143[source]▶

>>45782293 (TP) #

I run Jellyfin in a multi-arch cluster because I hate myself, and this would force me to think about where Jellyfin/Postgres is deployed because Postgres databases aren't portable.

I already had to do that for my authoritative PG deployment, and my media manager shouldn't require a full RDBMS.

Using SQLite for Jellyfin has made running it wherever really, really easy, same thing with doing backups and lazy black box debugging.

30. heavyset_go ◴[02 Nov 25 06:15 UTC] No.45788178{3}[source]▶

>>45783048 #

Jellyfin isn't a Netflix replacement, it's a desktop application that's a web app by necessity. Treat it like a desktop app and you won't have these issues.

31. npodbielski ◴[02 Nov 25 08:00 UTC] No.45788638[source]▶

>>45782829 #

What is the problem with bundling the Postgres DB engine in the Docker image? If you want to install it from a package, they could offer a Postgres DB as an option, with a warning somewhere that it is 'recommended'. I am sure that if you are able to self-host stuff, you are able to install Postgres too.

32. npodbielski ◴[02 Nov 25 08:03 UTC] No.45788643{3}[source]▶

>>45786010 #

Probably during library scans? They read hundreds of files and, for each of them, look up metadata on the internet from sources like Discogs. So sure, if implemented as async in C#, you could run into this issue.

33. andersmurphy ◴[02 Nov 25 09:03 UTC] No.45788886[source]▶

>>45782293 (TP) #

Sqlite is fine, but you need to read the extensive documentation to get the most out of it. It also has terrible defaults.

I think the author of this article missed SQLITE_BUSY.

Once you have it set up correctly, are handling a single writer at the application level, and have Litestream set up, you're off to the races, assuming your app can scale on a single box (it most likely can).
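
A small Python reproduction of the SQLITE_BUSY case, plus the connection settings usually meant by fixing the defaults. Assumptions: Python's built-in sqlite3 module, throwaway files in a temp directory, and an illustrative 5-second busy timeout; busy_demo and open_tuned are hypothetical helper names. Python surfaces SQLITE_BUSY as sqlite3.OperationalError with the message "database is locked".

```python
import os
import sqlite3
import tempfile


def busy_demo(path: str) -> str:
    """Show the error a second writer gets when it refuses to wait."""
    # Connection A manages transactions itself (isolation_level=None).
    a = sqlite3.connect(path, isolation_level=None)
    a.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    a.execute("BEGIN IMMEDIATE")  # take the write lock now and hold it
    a.execute("INSERT INTO t VALUES (1)")
    # Connection B has a zero busy timeout: fail instead of retrying.
    b = sqlite3.connect(path, timeout=0)
    try:
        b.execute("INSERT INTO t VALUES (2)")
        return "no error"
    except sqlite3.OperationalError as exc:
        return str(exc)  # SQLITE_BUSY surfaces as "database is locked"
    finally:
        b.close()  # discard B's failed transaction before A commits
        a.execute("COMMIT")
        a.close()


def open_tuned(path: str, busy_ms: int = 5000) -> sqlite3.Connection:
    """Open a connection with two commonly recommended non-default settings."""
    conn = sqlite3.connect(path, timeout=busy_ms / 1000)
    # WAL: readers no longer block the writer (or vice versa),
    # but there is still only one writer at a time.
    conn.execute("PRAGMA journal_mode=WAL")
    # Retry for up to busy_ms on a locked database instead of failing at once.
    # (Python's timeout= already sets this; the pragma is the SQL-level knob.)
    conn.execute(f"PRAGMA busy_timeout={int(busy_ms)}")
    return conn


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    print(busy_demo(os.path.join(tmp, "busy.db")))
    conn = open_tuned(os.path.join(tmp, "tuned.db"))
    print(conn.execute("PRAGMA journal_mode").fetchone()[0])
    conn.close()
```

WAL plus a busy timeout removes most spurious lock errors, but because SQLite still allows only one writer, long write transactions will still make other writers wait, hence the advice to funnel writes through a single writer in the application.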

34. apitman ◴[02 Nov 25 09:40 UTC] No.45789054{3}[source]▶

>>45788638 #

Jellyfin is one of the very few selfhosted apps that can be run as a simple GUI app on Windows. As an advocate for making selfhosting accessible to less technical people, I'm glad they're using sqlite and also that they don't require docker.

35. apitman ◴[02 Nov 25 09:43 UTC] No.45789067{3}[source]▶

>>45785817 #

Why not encode to H264 or another codec more widely supported by clients? Storage is cheap.

36. apitman ◴[02 Nov 25 09:49 UTC] No.45789094{3}[source]▶

>>45786226 #

I hope they keep sqlite as a first class citizen.

37. apitman ◴[02 Nov 25 09:58 UTC] No.45789135[source]▶

>>45782930 #

You're from the current generation of selfhosters, which culturally is very similar to kit car builders. The next generation of selfhosters/indiehosters just want a car to get from point A to point B. Sqlite is better for those people.

38. xorcist ◴[02 Nov 25 12:34 UTC] No.45789852{3}[source]▶

>>45788638 #

A database is never hard to install, but it can be tricky to operate.

You have to have at least a slight idea about the specifics, from the different types of vacuum to how it behaves in low-memory conditions. The idea that docker has something to do with this is a misdirection at best.

And if you think sqlite has many knobs and special modes, wait until you hear about Postgres.

39. throwaway894345 ◴[02 Nov 25 13:56 UTC] No.45790397{4}[source]▶

>>45788178 #

They have clients for nearly every device; it’s clearly intended to be a streaming media server.

40. throwaway894345 ◴[02 Nov 25 14:01 UTC] No.45790425{3}[source]▶

>>45789135 #

That’s a bit of a strange argument considering all the hoops one needs to jump through to make Jellyfin work on account of Sqlite. I just want to run the software I use on the computers I have.

41. MayeulC ◴[02 Nov 25 14:11 UTC] No.45790482{3}[source]▶

>>45785817 #

> I've had issues when three people all streaming a VC-1 encoded video to H264 in software.

I don't quite get the "in software" part. I assume you mean that the video needs to be transcoded to h.264 on your server for their client to play it.

The way I mostly solved this is to ask people to install and use the native app (jellyfin-media-player or Android app) whenever possible, as it is compatible with more codecs.

You can also configure HW acceleration for transcoding, a decent GPU should have no trouble encoding a few h.264 streams in real time.

And lastly, you can play with distributed versions of ffmpeg, since Jellyfin calls ffmpeg. There are multiple options, such as https://hub.docker.com/r/bitwrk/jellyfin-rffmpeg (I never used it myself, though).

42. apitman ◴[02 Nov 25 15:13 UTC] No.45790876{4}[source]▶

>>45790425 #

You're having issues because you're trying to shoehorn it into your desired architecture. Most people just want to run an app on their Windows laptop and start streaming their videos.

43. heavyset_go ◴[02 Nov 25 15:16 UTC] No.45790903{5}[source]▶

>>45790397 #

It's a local media library manager in the same vein as media servers that came before it that were intended to run on desktops and serve up content to consoles and whatever on your LAN back when that was the thing to do.

My point is: treat it like software from that lineage and you won't have a problem. Trying to treat it like something it's not, like a distributed web app, will lead to issues.

44. npodbielski ◴[02 Nov 25 16:31 UTC] No.45791471{4}[source]▶

>>45789852 #

> And if you think sqlite has many knobs and special modes, wait until you hear about Postgres.

And why do you think I think that?

45. thayne ◴[02 Nov 25 17:37 UTC] No.45791978{7}[source]▶

>>45787727 #

> just target storage interfaces that can be easily distributed—things like Postgres

But as I mentioned above, that makes the system more complicated for people who don't need it to be distributed.

Setting up separate db software, configuring the connection, handling separate updates, etc. is a lot more work for most users than Jellyfin just using a local embedded sqlite database. And it would probably make the application code more complicated as well.

46. throwaway894345 ◴[02 Nov 25 18:45 UTC] No.45792426{6}[source]▶

>>45790903 #

It feels like we’re saying similar things. We both agree that its architecture makes it difficult to run with high availability, although I’ll point out that the issues documented in the article apply to single nodes and even on a single node it has pretty specific hardware requirements. I think we just disagree about whether “you have to hold it very carefully and it works just fine” is a good thing or not.

47. throwaway894345 ◴[02 Nov 25 18:52 UTC] No.45792475{8}[source]▶

>>45791978 #

> But as I mentioned above, that makes the system more complicated for people who don't need it to be distributed. Setting up separate db software, configuring the connection, handling separate updates, etc. is a lot more work for most users than Jellyfin just using a local embedded sqlite database.

You can package a Postgres database with your app just like SQLite. Users should not have to know that they are using Postgres much less configuring connections, handling updates, etc.

> And it would probably make the application code more complicated as well.

Not at all, this is an article about the hoops the application has to jump through to make SQLite behave well with parallel access. Postgres is designed for parallel access by default. It’s strictly simpler from the perspective of the application.

48. throwaway894345 ◴[02 Nov 25 18:55 UTC] No.45792506{5}[source]▶

>>45790876 #

Maybe that’s what most users want, but that’s not what the software was designed to target, judging from all of the documentation and marketing. But yes, clearly the software wasn’t designed to run in a distributed fashion, and that’s kind of the point of my criticism—they had to go out of their way to couple their application in a way that precludes distributed execution. Well designed server software is trivial to distribute, and even if you never run it in a distributed configuration it makes it easy to do the basic parallelism described in this article.

49. nick_ ◴[02 Nov 25 21:03 UTC] No.45793427{4}[source]▶

>>45788643 #

Are you hinting at the lack of an `AsyncLock` in .NET?
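
For context: .NET's base class library does not ship an AsyncLock type; a common idiom is a SemaphoreSlim(1, 1) awaited with WaitAsync() and released in a finally block. The same pattern, sketched in Python with asyncio.Lock: metadata lookups for many files run concurrently, while the actual SQLite writes are funneled through one lock. fetch_metadata, insert_item, scan_files and run_scan are hypothetical names, and the writes run on worker threads via asyncio.to_thread, which is what makes the lock necessary here.

```python
import asyncio
import sqlite3


def insert_item(conn: sqlite3.Connection, name: str, meta: str) -> None:
    # Blocking SQLite write; executed on a worker thread via asyncio.to_thread.
    with conn:
        conn.execute("INSERT INTO items (name, meta) VALUES (?, ?)", (name, meta))


async def fetch_metadata(name: str) -> str:
    # Hypothetical stand-in for an online metadata lookup.
    await asyncio.sleep(0.01)
    return f"meta:{name}"


async def scan_files(conn: sqlite3.Connection, files: list[str]) -> None:
    write_lock = asyncio.Lock()  # plays the role of SemaphoreSlim(1, 1)

    async def process(name: str) -> None:
        meta = await fetch_metadata(name)  # lookups overlap freely
        async with write_lock:  # but only one write is in flight at a time
            await asyncio.to_thread(insert_item, conn, name, meta)

    await asyncio.gather(*(process(f) for f in files))


def run_scan(files: list[str]) -> int:
    # check_same_thread=False because writes happen on worker threads;
    # the asyncio lock is what keeps them from overlapping.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE items (name TEXT, meta TEXT)")
    asyncio.run(scan_files(conn, files))
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print(run_scan([f"file{i}.mkv" for i in range(100)]))
```

The design choice is to keep slow network I/O outside the lock and hold it only for the short database write, so the scan stays parallel where it matters without ever presenting SQLite with concurrent writers.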

50. tombert ◴[02 Nov 25 21:59 UTC] No.45793797{4}[source]▶

>>45790482 #

I mean "in software" in that it's not hardware assisted. I have gotten VAAPI working but it's a bit flaky with some videos for some reason, so I disabled it and just do vanilla ffmpeg.

I'll look into the distributed ffmpeg.
