←back to thread

Sqlite3 WebAssembly

(sqlite.org)
567 points whatever3 | 1 comments | | HN request time: 0.196s | source
Show context
simonw ◴[] No.41851934[source]
Something that would be really fun would be to run SQLite in-memory in a browser but use the same tricks as Litestream and Cloudflare Durable Objects (https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-st...) to stream a copy of the WAL log to a server (maybe over a WebSocket, though intermittent fetch() POST would work too).

Then on subsequent visits use that server-side data to rehydrate the client-side database.

From https://sqlite.org/forum/info/50a4bfdb294333eec1ba4749661934... is looks like WAL mode is excluded from the default SQLite WASM build so you would have to go custom with that.

replies(10): >>41852040 #>>41852194 #>>41853089 #>>41854540 #>>41854586 #>>41854654 #>>41855596 #>>41856415 #>>41857000 #>>41858635 #
ncruces ◴[] No.41853089[source]
There are many layers of that's not how it works at play here.

In-memory SQLite databases don't use WAL. Wasm (and browser Wasm, in particular) doesn't support anything like the shared memory APIs SQLite wants for its WAL mode.

Litestream requires a very precise WAL setup to work (which just so happens to work with the default native SQLite setup, but is hard to replicate with Wasm).

Cloudflare Durable Objects may have been inspired by Litestream but works very differently (as do LiteFS, Turso, etc…)

The general idea of streaming changes from SQLite would work, but it's a lot of work, and the concurrency model of in-browser Wasm will make it challenging to implement.

(I wrote that forum post some time ago, and have WAL working in a server side Wasm build of SQLite, but none of the options to make it work would make much sense, or be possible, in browser)

replies(4): >>41853607 #>>41853645 #>>41853823 #>>41857460 #
digdugdirk ◴[] No.41853607[source]
As someone who uses sqlite fairly regularly, but doesn't understand what most of those paragraphs mean, do you have any recommendations for learning resources?

I'm gathering that I need to learn about: - WAL - Shared Memory APIs - Concurrency models - Durable Objects?

replies(2): >>41853788 #>>41855356 #
wyager ◴[] No.41853788[source]
WAL: Write ahead log, common strategy for DBs (sqlite, postgres, etc.) to improve commit performance. Instead of fsync()ing every change, you just fsync() a log file that contains all the changes and then you can fsync() the actual changes at your leisure

Shared memory API: If you want to share (mutable) data between multiple processes, you need some kind of procedure in place to manage that. How do you get a reference to the data to multiple processes, how do you make sure they don't trample each other's writes, etc.

Concurrency model: There are many different ways you can formalize concurrent processes and the way they interact (message passing, locking, memory ordering semantics, etc.). Different platforms will expose different concurrency primitives that may not work the same way as other platforms and may require different reasoning or code structure

Durable objects - I think this is some Cloudflare service where they host data that can be read or modified by your users

This is all from memory, but IME, GPT is pretty good for asking about concepts at this level of abstraction

replies(1): >>41854152 #
digdugdirk ◴[] No.41854152[source]
Thank you!

And side note on your last point - I've been burned too many times by confident hallucinations to trust my foundational learning to GPT. I hope someday that will improve, but for now ChatGPT is as trustworthy as an evening chat with someone at the bar.

... Someone who has been drinking since happy hour.

replies(3): >>41855004 #>>41856177 #>>41857095 #
globular-toast ◴[] No.41856177[source]
If you'd like a trustworthy overview, the book Designing Data-Intensive Applications by Martin Kleppmann is a classic. I really hope we get an updated version, but the fundamentals all still hold anyway.
replies(2): >>41857486 #>>41858824 #
rahoulb ◴[] No.41857486[source]
Upvote for that book.

I read it a few months ago and was really impressed with how easy it was to read.

It starts out with simple stuff, like serialising data as JSON vs XML. But it moves into complex areas - like how replication and WALs work, including different ways of handling consensus when using leader-leader replication and how Spanner needs atomic clocks to handle it.

But even the complex stuff was explained in a way that I understood, which is an immense achievement.

replies(1): >>41857842 #
1. globular-toast ◴[] No.41857842[source]
Yep, this is my number 1 "I wish I'd read this X years ago" book.

I'm someone who has been doing this stuff for almost two decades without really knowing this is what I'm doing. I used to think what I was going to do was systems level programming like operating systems and maybe the database systems themselves (e.g. postgres, datomic etc.). But for whatever reason my entire career (so far, but I don't see it changing) has been building data systems for businesses and users.

I read the book from cover to cover and half of it was like "ohh... that's how that works, that's what I'm doing wrong" and the other half was "shit, this is something I kinda knew after trying and failing for years, and someone has just written it down in a way I never could".