Most active commenters
  • ncruces(10)
  • jauntywundrkind(6)
  • adhamsalama(4)

←back to thread

Sqlite3 WebAssembly

(sqlite.org)
647 points whatever3 | 54 comments | | HN request time: 3.347s | source | bottom
1. simonw ◴[] No.41851934[source]
Something that would be really fun would be to run SQLite in-memory in a browser but use the same tricks as Litestream and Cloudflare Durable Objects (https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-st...) to stream a copy of the WAL log to a server (maybe over a WebSocket, though intermittent fetch() POST would work too).

Then on subsequent visits use that server-side data to rehydrate the client-side database.

From https://sqlite.org/forum/info/50a4bfdb294333eec1ba4749661934... is looks like WAL mode is excluded from the default SQLite WASM build so you would have to go custom with that.

replies(10): >>41852040 #>>41852194 #>>41853089 #>>41854540 #>>41854586 #>>41854654 #>>41855596 #>>41856415 #>>41857000 #>>41858635 #
2. dustinchilson ◴[] No.41852040[source]
Are you thinking something like https://electric-sql.com/
replies(1): >>41853301 #
3. fcanesin ◴[] No.41852194[source]
Isn't a more advanced/production version of this what the combo of couchdb and pouchdb do since several years ago?
replies(1): >>41852240 #
4. simonw ◴[] No.41852240[source]
Yeah this kind of thing has certainly been tried before, I feel like SQLite WASM plus WAL might be an interesting twist on the idea.
5. ncruces ◴[] No.41853089[source]
There are many layers of that's not how it works at play here.

In-memory SQLite databases don't use WAL. Wasm (and browser Wasm, in particular) doesn't support anything like the shared memory APIs SQLite wants for its WAL mode.

Litestream requires a very precise WAL setup to work (which just so happens to work with the default native SQLite setup, but is hard to replicate with Wasm).

Cloudflare Durable Objects may have been inspired by Litestream but works very differently (as do LiteFS, Turso, etc…)

The general idea of streaming changes from SQLite would work, but it's a lot of work, and the concurrency model of in-browser Wasm will make it challenging to implement.

(I wrote that forum post some time ago, and have WAL working in a server side Wasm build of SQLite, but none of the options to make it work would make much sense, or be possible, in browser)

replies(5): >>41853607 #>>41853645 #>>41853823 #>>41857460 #>>41860639 #
6. PUSH_AX ◴[] No.41853301[source]
What’s the catch with this thing?
replies(1): >>41853915 #
7. digdugdirk ◴[] No.41853607[source]
As someone who uses sqlite fairly regularly, but doesn't understand what most of those paragraphs mean, do you have any recommendations for learning resources?

I'm gathering that I need to learn about: - WAL - Shared Memory APIs - Concurrency models - Durable Objects?

replies(2): >>41853788 #>>41855356 #
8. jauntywundrkind ◴[] No.41853645[source]
Then don't use in-memory sqlite? Use file backed sqlite but have your wasm implementation of those "system calls" just be to memory?

I dunno, feels like you're coming down too hard.

replies(1): >>41854002 #
9. wyager ◴[] No.41853788{3}[source]
WAL: Write ahead log, common strategy for DBs (sqlite, postgres, etc.) to improve commit performance. Instead of fsync()ing every change, you just fsync() a log file that contains all the changes and then you can fsync() the actual changes at your leisure

Shared memory API: If you want to share (mutable) data between multiple processes, you need some kind of procedure in place to manage that. How do you get a reference to the data to multiple processes, how do you make sure they don't trample each other's writes, etc.

Concurrency model: There are many different ways you can formalize concurrent processes and the way they interact (message passing, locking, memory ordering semantics, etc.). Different platforms will expose different concurrency primitives that may not work the same way as other platforms and may require different reasoning or code structure

Durable objects - I think this is some Cloudflare service where they host data that can be read or modified by your users

This is all from memory, but IME, GPT is pretty good for asking about concepts at this level of abstraction

replies(1): >>41854152 #
10. jchanimal ◴[] No.41853823[source]
If you like, solving these sort of problems, we are tackling them at Fireproof.

Our database API is modeled on CouchDB and MongoDB, but our storage abstractions are along the lines of what you need to build the multi writer WAL you describe.

More details here https://jsr.io/@fireproof/encrypted-blockstore

11. T-Winsnes ◴[] No.41853915{3}[source]
The security model is challenging, as it relies on Postgres users for iam. Your users essentially log directly into your db
replies(2): >>41854135 #>>41855021 #
12. ncruces ◴[] No.41854002{3}[source]
File backed SQLite in a browser? Do you mean like OPFS?

https://sqlite.org/wasm/doc/trunk/persistence.md#opfs-wal

Again, just because the all the Lego pieces sound like they should all just fit together, doesn't mean that they will.

The VFS mechanism was primarily designed to make SQLite easy to port to multiple OSes. WAL mode is hard to port everytime you step away from a more traditional OS.

“We have SQLite in the browser, let's just stream the WAL like the Litestream hack” does not add up.

It's not impossible, but it surely took a lot of effort at Cloudflare (or at Fly, or Turso) to get there. And neither of them opened it up sufficiently to help us do the same. In a browser.

replies(2): >>41854206 #>>41857305 #
13. dumbo-octopus ◴[] No.41854135{4}[source]
Isn’t Postgres a fairly capable IAM provider, all things considered? I’d their access control mechanisms at least as much as a run of the mill external backend’s.
replies(1): >>41854423 #
14. digdugdirk ◴[] No.41854152{4}[source]
Thank you!

And side note on your last point - I've been burned too many times by confident hallucinations to trust my foundational learning to GPT. I hope someday that will improve, but for now ChatGPT is as trustworthy as an evening chat with someone at the bar.

... Someone who has been drinking since happy hour.

replies(3): >>41855004 #>>41856177 #>>41857095 #
15. andrewf ◴[] No.41854206{4}[source]
I haven't looked but I bet a lot of the WAL complexity comes down to supporting consistency and durability guarantees, neither of which you necessarily need for your in-browser use case.
replies(1): >>41856431 #
16. T-Winsnes ◴[] No.41854423{5}[source]
For basic auth it works well, but the challenge comes when you need to integrate with oidc, need to enforce mfa, enable sso etc. session invalidation is also quite complicated.

You need an identity middle man in front of the Postgres identity to tackle these and validate that the session is still active. Last time I looked at electric it was a big challenge to integrate such a service. This might have improved since then however

17. Fabricio20 ◴[] No.41854540[source]
I think OrbitDB [1] is the closest we have to a functional version of this right now? Unsure, but the concept is really cool!

[1]: https://github.com/orbitdb/orbitdb/blob/main/docs/GETTING_ST...

18. billywhizz ◴[] No.41854586[source]
hi simon. i direct messaged you on twitter about a PoC i did of this in aug 2022, but never heard back - i thought you might have been interested. my twitter handle is justjs14.

i have some code i would have to dig out that did this very thing - it allows you to open a SQLite db in browser using sqlite (with a VFS) compiled to wasm (not the official WASM build), make changes and both push and pull WALs to and from a server (or indeed browser to browser would be possible both manually or over WebRTC). it even works with github pages if you give the browser client a github token to work with.

if you are interested, feel free to ping me and i can see if i can get this up and running from scratch again. i did a ton of experiments with this approach around then and i think it could be useful for a subset of applications at least.

there's also a working demo of the pull functionality only here: https://just.billywhizz.io/sqlite/demo/#https://just.billywh...

replies(1): >>41856980 #
19. conradev ◴[] No.41854654[source]
LiteVFS can be compiled in a browser and sync with LiteFS cloud: https://github.com/superfly/litevfs

It needs to be run from a worker, though

20. infogulch ◴[] No.41855004{5}[source]
I was curious and Grok 2 seemed to do pretty good: https://x.com/i/grok/share/c2qCdF2wwIx7AHz0U1f2u8dTO
21. infogulch ◴[] No.41855021{4}[source]
You can see what this means specifically from the docs: https://electric-sql.com/docs/guides/auth
22. throwaway2037 ◴[] No.41855356{3}[source]
WAL - Shared Memory APIs: https://sqlite.org/wal.html

concurrency model of in-browser Wasm: https://medium.com/@quincarter/wasm-is-awsm-and-multi-thread...

Cloudflare Durable Objects: https://www.cloudflare.com/developer-platform/durable-object...

23. hiccuphippo ◴[] No.41855596[source]
This sounds like what meteor was 10+ years ago but with sqlite instead of mongodb.
replies(1): >>41855664 #
24. nesarkvechnep ◴[] No.41855664[source]
Nothing like it, actually.
25. globular-toast ◴[] No.41856177{5}[source]
If you'd like a trustworthy overview, the book Designing Data-Intensive Applications by Martin Kleppmann is a classic. I really hope we get an updated version, but the fundamentals all still hold anyway.
replies(2): >>41857486 #>>41858824 #
26. matharmin ◴[] No.41856415[source]
Unlike other opinions here I do think it is technically feasible to stream a copy of the WAL - it just has to be implemented in the VFS. "Shared memory" could be a SharedArrayBuffer, or just a normal buffer if you only have one database connection open at a time (for example in a SharedWorker, which is already common). It may not be simple to implement, but definitely possible.

The biggest actual issue is that it will capture block-level changes, not row-level changes. This means it can work to replicate a complete database, but partial sync (e.g. sharing some rows with other users) won't be feasible.

To get row-level changes for partial sync, you need to use something like triggers or the SQLite session extension [1]. For PowerSync we just use triggers on the client side. I find that works really well, and I haven't found any real downsides to that except perhaps for the work of maintaining the triggers.

[1]: https://sqlite.org/sessionintro.html

27. ncruces ◴[] No.41856431{5}[source]
Not really, or it depends. That complexity is dealt with by SQLite.

The complexity for you comes from trying to reuse their (battle tested) implementation when (as I wrote above) the primitives they depend upon were not meant to make porting to a browser sandbox easy.

And the problems there are the specific concurrency model they depend upon: communicate by sharing memory.

Then, you're either working at the wrong abstraction level (and it shows), or you're patching and gutting SQLite.

SQLite is meant to work with files, file locks, shared memory, fsync and mmap.

It also doesn't work out great if you try to persist to an object store, to a KV store, or…

I'm repeating myself, but yeah. You can make it work. Others have made it work. But it's still a lot of work, and you're throwing away a lot of what makes SQLite… SQLite.

28. adhamsalama ◴[] No.41856980[source]
> (or indeed browser to browser would be possible both manually or over WebRTC)

I have just done something similar in the past week, but without the WAL.

It's pretty much an alternative to online spreadsheets for me.

http://github.com/adhamsalama/sqlite-wasm-webrtc

replies(2): >>41857382 #>>41858678 #
29. adhamsalama ◴[] No.41857000[source]
I used SQLite compiled to WebAssembly in the browser and replicated it (without the WAL) using WebRTC to create a way to collaborate on databases in the browser instead of using apps like Google Sheets. There's no server required (other than the WebRTC signaling server), so it's private and secure too.

http://github.com/adhamsalama/sqlite-wasm-webrtc

30. chx ◴[] No.41857095{5}[source]
Of course. The only thing LLMs are good for is...

https://hachyderm.io/@inthehands/112006855076082650

> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.

> Alas, that does not remotely resemble how people are pitching this technology.

31. jauntywundrkind ◴[] No.41857305{4}[source]
Emscripten's default file system provider is memfs, in memory. Maybe there would be some challenges, some spec limitations using that, but I strongly expect it closer to a weekend or two of hacking to get some special weird mystic quirkiness that WAL relies on than some long ordeal that keeps going on endlessly (to get sqlite running with WAL). https://emscripten.org/docs/api_reference/Filesystem-API.htm...

OPFS is interesting tech but again a red herring misdirecting from what had been raised, using an in-memory filesystem like the default thing that emscripten (the default toolchain) does.

replies(1): >>41857628 #
32. billywhizz ◴[] No.41857382{3}[source]
cool. will check this out. i think it's an interesting approach and allows all sorts of very low rent interactivity as long as you don't need super high throughput or expect lots of contention.
replies(1): >>41860008 #
33. rockwotj ◴[] No.41857460[source]
I believe that streaming changes from SQLite is what https://sqlsync.dev/ is
replies(1): >>41857829 #
34. rahoulb ◴[] No.41857486{6}[source]
Upvote for that book.

I read it a few months ago and was really impressed with how easy it was to read.

It starts out with simple stuff, like serialising data as JSON vs XML. But it moves into complex areas - like how replication and WALs work, including different ways of handling consensus when using leader-leader replication and how Spanner needs atomic clocks to handle it.

But even the complex stuff was explained in a way that I understood, which is an immense achievement.

replies(1): >>41857842 #
35. ncruces ◴[] No.41857628{5}[source]
I… really don't get this.

The people on SQLite, employed to work on this full time for over a year, have this to say (on the link I posted above):

“Because the WASM build does not have shared memory APIs, activating WAL requires that a client specifically activate exclusive-locking mode for a db handle immediately after opening it, before doing anything else with it…

“WAL mode does not provide any concurrency benefits in this environment. On the contrary, the requirement for exclusive locking eliminates all concurrency support…”

I personally worked to implement shared memory WAL for a server side Wasm port of SQLite. But random internet poster decides to “strongly expect it closer to a weekend or two of hacking.”

Please, do me a favor and do spend that weekend or two for the benefit of the rest of us all. It'll sincerely be much appreciated.

PS: it was a random internet poster¹ (who's been posting in this thread) who helped me figure out how to implement shared memory WAL for my port. It still took way more than “a weekend or two.” So if you do figure out how to crack this, I'm sure that people who've been trying for the past year² will definitely appreciate it.

1: https://github.com/ncruces/go-sqlite3/discussions/69

2: https://github.com/rhashimoto/wa-sqlite

replies(1): >>41858190 #
36. ncruces ◴[] No.41857829{3}[source]
Yep, that doesn't use WAL mode at all, and it's a decent amount of work.

It uses a custom VFS, memory journal mode, and bypasses SQLite for optimistic concurrency handling. I dunno how it handles crash safety, or if it even uses the same on disk file format.

My point here is not that this is impossible, it's that SQlite's WAL implementation isn't meant to be pluggable, and Litestream is a very clever hack, that requires a very specific setup to work well.

It takes a fair amount of work to replicate that in other environments.

37. globular-toast ◴[] No.41857842{7}[source]
Yep, this is my number 1 "I wish I'd read this X years ago" book.

I'm someone who has been doing this stuff for almost two decades without really knowing this is what I'm doing. I used to think what I was going to do was systems level programming like operating systems and maybe the database systems themselves (e.g. postgres, datomic etc.). But for whatever reason my entire career (so far, but I don't see it changing) has been building data systems for businesses and users.

I read the book from cover to cover and half of it was like "ohh... that's how that works, that's what I'm doing wrong" and the other half was "shit, this is something I kinda knew after trying and failing for years, and someone has just written it down in a way I never could".

38. jauntywundrkind ◴[] No.41858190{6}[source]
> WAL mode does not provide any concurrency benefits in this environment.

Except we aren't interested in concurrency or performance benefits of WAL; we want it for something else entirely (replication).

replies(1): >>41858933 #
39. j12a ◴[] No.41858635[source]
I didn't fully understand this idea of hydration but I also got recently interested in leveraging Wasm, in context of running parts of backend logic using it (eg. regular templating and some Htmx endpoints) to allow certain offline features with regular server-side web frameworks that could use similar Sqlite Wasm datastore as OP.

- Full Django example https://github.com/m-butterfield/django_webassembly - Minimal Rust example with client-server separation using macros https://github.com/richardanaya/wasm-service - Clojure Electric using more macros for something similar (without wasm) https://github.com/hyperfiddle/electric - Recent M$ web tooling starting with letter B?

In last few years, surprisingly many parts of Python ecosystem to have gained Wasm support that can leverage this also. Obvious ML related ones and even game libraries like Pygame and Pyxels. Kivy support (multi-touch UI library) should not be too far out either now that SDL2 library has added support.

Seems like there's many new possibilities for running code in any computer, optionally sandboxed in web browser to avoid platform bureaucracy or taxes.

40. j12a ◴[] No.41858678{3}[source]
This is a very interesting demo.
replies(1): >>41860013 #
41. chucksmash ◴[] No.41858824{6}[source]
O'Reilly shows a 2nd edition slated for December 2025 but it seems like you can access it early with Safari Books.

https://www.oreilly.com/library/view/designing-data-intensiv...

42. ncruces ◴[] No.41858933{7}[source]
Then you don't need WAL. I didn't say replication isn't possible, I said "that's not how it works."

A bunch of different people solved this already (though none in a couple of weekends); common to all of them: they don't use WAL mode.

replies(1): >>41859123 #
43. jauntywundrkind ◴[] No.41859123{8}[source]
> The shadow WAL is a directory next to your SQLite database where WAL files are effectively recreated as a sequence. . . These WAL files contain the original WAL frames & checksums to ensure consistency.

https://litestream.io/how-it-works/

> (though none in a couple of weekends)

As I said, getting sqlite in wasm to run in WAL mode such that we could start to implement replication, like for example how litestream does. Not this strawman you build, of implementing replication.

replies(1): >>41860897 #
44. adhamsalama ◴[] No.41860008{4}[source]
Thanks!

I purposefully replicate data and queries only, so to change the shown data for everyone you have to run a query, so after the import process (which can be slow for large files), queries are replicated fast enough IMO because it's just a small string sent over WebRTC, so in theory it should be faster to run a query over a big dataset and see its result appear for all peers than a centralized app like Google Sheets, but I haven't benchmarked it yet.

45. adhamsalama ◴[] No.41860013{4}[source]
Thanks!
46. lfmunoz4 ◴[] No.41860639[source]
"The general idea of streaming changes from SQLite would work, but it's a lot of work"

Seems easy to me, just store/stream all update sql statements. That should have all information needed and don't have to mess with WAL data format etc.

replies(2): >>41861089 #>>41861462 #
47. ncruces ◴[] No.41860897{9}[source]
Seriously?

Litestream works out-of-process. It needs shared memory WAL and file locking to work exactly like on "desktop" SQLite for it to even function and produce this "shadow WAL." No ifs, no buts, no inbetweens.

You just don't build Litestream without shared memory WAL. That's my entire point. These pieces you seem to think can be made to simply work together can't, in fact, work together at all, much less simply.

Exclusive locking mode WAL doesn't work with Litestream, and can't be made to work with Litestream, even with a little elbow grease.

replies(1): >>41864943 #
48. ncruces ◴[] No.41861089{3}[source]
Yeah, I wonder why no one thought of that.
49. LukeLambert ◴[] No.41861462{3}[source]
The big problem with statement-based replication is that many queries are non-deterministic. e.g. Inserting a row with current_timestamp or random()
50. jauntywundrkind ◴[] No.41864943{10}[source]
Litestreams how-we-did-this to says they take a lock on the whole db during checkpoints to work, so I don't see why shared would be needed.

Seriously bro, take a massive chill pill. You have been way overcommitted to the "it's all impossible!!! How stupid to imagine! If only you were smart like me you'd see everything is impossible!" bit way too hard. Take a breather from your possessed absolute self certainty & derogatory belittling of others.

There's also lots of different fs impls for wasm! Just because some limitations apply to emscriptens impl doesn't mean another impl has the same limitations, or that we can't improve or work around. Use some of your hard working loves-to-tax-itself big brain to find avenues of possibility, rather than just pissing on every possibility.

replies(1): >>41867073 #
51. ncruces ◴[] No.41867073{11}[source]
I'm chill bro. Still waiting for your productive couple of weekends.

But that'll require more than looking at docs reading the words "lock" and "shared" and assuming you know what those mean.

replies(1): >>41871483 #
52. jauntywundrkind ◴[] No.41871483{12}[source]
You've been wrong about how some of these replications work, you've been wrong about why the WAL doesn't work, and you still insist on being a pompous rude brat here? You have not been chill.
replies(2): >>41871956 #>>41903568 #
53. ncruces ◴[] No.41871956{13}[source]
Litestream is out of process, requires shared memory WAL, doesn't work with exclusive mode WAL, wouldn't work work on Wasm.

LiteFS didn't support WAL mode initially, now does. The Wasm version doesn't support WAL mode at all.

mvSQLite doesn't support shared memory WAL. SQLSync doesn't support WAL at all.

Browser Wasm builds either don't have WAL enabled, or don't support shared memory WAL.

These replications are VFSes or FUSE drivers, and for those, it's much easier to work with rollback mode than WAL, especially if you don't want concurrency (or intend to provide concurrency through other means, like optimistic concurrency).

In open source, only Turso actually did a VFS to replicate the WAL. They had to break SQLite so much they call it libSQL. It also requires shared memory as is, would be hard (if possible) to port to Wasm.

Please, do take the challenge and build this. It will be much appreciated.

I've been asking the Turso guys to document their thing for a year, with little success. They have the interface (which just opens up undocumented internal SQLite interfaces), with no open source samples of how to use it (except a logger), and zero additional docs.

54. CRConrad ◴[] No.41903568{13}[source]
TBF, to this bystander ncruces isn't the one of the two of you who is coming off as more of "a pompous rude brat".