For an in-browser app, that seems a bit much but of course wasm runs in other places these days where it might make more sense.
For an in-browser app, that seems a bit much but of course wasm runs in other places these days where it might make more sense.
> These components were initially released for public beta with version 3.40 and will tentatively be made API-stable with the 3.41 release, pending community feedback.
But the most recent release of SQLite is 3.46.1 (from 2024-08-13)
Presumably they are now "API-stable" but the page hasn't been updated yet.
It would be great if the SQLite team published an official npm package bundling the WASM version, could be a neat distribution mechanism for them. (UPDATE: They do, see replies to this post.)
My favourite version of SQLite-in-WASM remains the Pyodide variant, which has been around since long before the official SQLite implementation. If you use Pyodide you get a WASM SQLite for free as part of the Python standard library - I use that for https://lite.datasette.io/ and you can also try it out on https://pyodide.org/en/stable/console.html
import sqlite3
print(sqlite3.connect(':memory:').execute(
'select sqlite_version()'
).fetchall())
That returns 3.39.0 from 2022-06-25 so Pyodide could do with a version bump. Looks like it inherits that version from emscripten: https://github.com/emscripten-core/emscripten/blob/main/tool...Meanwhile for “local-only” it would be great to use sqlite in the browser + native file system API so that the db could be stored on the user’s file system and we wouldn’t have to worry about browser storage eviction. i think that could really open up a whole world of privacy preserving offline software delivered through the browser
There's a number of WASM platforms/tools: Wasmer, wasmCloud, a few others that escape my memory.
sqlite-wasm loads much faster than Pyodide, so if you don't need Python, then the former is a better choice.
I think they've been doing that for a while, in JS script you can already do this:
import sqlite3InitModule from "https://cdn.jsdelivr.net/npm/@sqlite.org/sqlite-wasm/sqlite-wasm/jswasm/sqlite3-bundler-friendly.mjs";
const sqlite3 = await sqlite3InitModule({
locateFile(file: string) {
return "https://cdn.jsdelivr.net/npm/@sqlite.org/sqlite-wasm/sqlite-wasm/jswasm/sqlite3.wasm";
},
});
// SQLite's C API
const capi = sqlite3.capi;
console.log("sqlite3 version", capi.sqlite3_libversion(), capi.sqlite3_sourceid());
// OO API example below oo1 docs https://sqlite.org/wasm/doc/tip/api-oo1.md
const oo = sqlite3.oo1;
const db = new oo.DB();
const createPersonTableSql = `
CREATE TABLE IF NOT EXISTS person (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
age INTEGER NOT NULL
);
`;
db.exec([createPersonTableSql]);
It works in regular old script tag with type=module, or Deno. I have example HTML here:https://github.com/Ciantic/experimenting-sqlite-wasm/blob/ma...
For the read path it hooks into Postgres logical replication or MongoDB change streams (and MySQL binlog soon). It supports partial syncing using declarative rules. For the write path, it allows writing to the local SQLite database and also places writes into an upload queue, and then uses a developer-defined function to upload writes to the backend API.
We did a deep dive on current options for SQLite on the web, and are currently using an IndexedDB-based VFS, and looking to move to OPFS: https://www.powersync.com/blog/sqlite-persistence-on-the-web
We recently released an integration with TanStack Query to allow leveraging some of its features in conjunction with PowerSync: https://docs.powersync.com/client-sdk-references/js-web/java...
> Meanwhile for “local-only” it would be great to use sqlite in the browser + native file system API so that the db could be stored on the user’s file system and we wouldn’t have to worry about browser storage eviction. i think that could really open up a whole world of privacy preserving offline software delivered through the browser
Agreed. This is a limitation of IndexedDB and OPFS as persistent browser storage currently
Also worth considering parsing of wasm is significantly faster than JS (unfortunately couldn't find the source for this claim, there is at lease one great article on the topic)
https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_A...
Okay that's listed here: https://sqlite.org/wasm/doc/trunk/persistence.md
EDIT: Self answered.
[1] https://tetr.app
If you’re just storing user preferences, obviously don’t download SQLite for your web app just to do that… but if you’re doing something that benefits from a full database, don’t fret so much about 1MB that you go try to reinvent the wheel for no reason.
If the other comment is correct, then it won’t even be 1.3MB on the network anyways.
I played around with warm sqlite too. That was really nice but I decided against it due to the fact that it was totally unsupported.
With the help of Claude, though it incorrectly hallucinated some of the details despite me pasting in documentation: https://gist.github.com/simonw/677c3794051c4dfeac94e514a8e5b...
Also, WASI is very far from answer (so far). The SQLite amalgamation builds fine for WASI but concurrency is an unsolved issue.
I had to build a VFS from scratch to get my Wasm based SQLite driver into a usable shape.
https://github.com/ncruces/go-sqlite3/blob/main/vfs/README.m...
You may benefit from perusing the FAQ on that page.
Then on subsequent visits use that server-side data to rehydrate the client-side database.
From https://sqlite.org/forum/info/50a4bfdb294333eec1ba4749661934... is looks like WAL mode is excluded from the default SQLite WASM build so you would have to go custom with that.
just my intuition when I read the headline of this post - something like the interplay between PouchDB and CouchDB for offline first apps
https://medium.com/offline-camp/couchdb-pouchdb-and-hoodie-a...
(Previously discussed 7 months ago: https://news.ycombinator.com/item?id=39477457)
wow I didn't know this was a thing. thanks for filling me in!
It's more of an experiment than an attempt to make something production ready, though I could see it being useful to bring dependency-less sqlite tooling to the JVM ecosystem.
In a regular compiler/linker scenario it would just be a static compilation. Here we have a JS app and WASM library.
More specifically, would this be able to be a "replacement" for indexedDB? Does the data persist, or do I need to keep the sqlite file in the filesytemAPI (or indexedDB/localstorage) myself?
Though it seems to be somewhat limited. I couldn't even check what version it has, since sqlite_version() was missing. Version in the repository [1] is 3.38.1, which is from quite a ways ago.
At the moment DuckDB web shell can't load SQLite extension, since that hasn't been released for yesterday's 1.1.2. Earlier version does work using recently updated WASM edition. That can be extended with spatial including GDAL, vector search etc [2]. Making your own "SQL web shell" wasn't too hard, though docs weren't quite complete enough for me.
[1] https://github.com/duckdb/sqlite_scanner/blob/main/src/sqlit... [2] https://github.com/duckdb/duckdb-wasm/releases/tag/v1.29.0
disclaimer: I work there
Is there any hope of that happening? Instead of abstracting and over specifying sqlite, can the spec just specify a version of the SQLite API browsers should support and roll the version periodically?
> Specific Goals of this Project
> Insofar as possible, support persistent client-side storage using available JS APIs. As of this writing, that includes the Origin-Private FileSystem (OPFS) and (very limited) storage via the window.localStorage and window.sessionStorage backend.
From the latest spec [1]:
> The specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path.
[1]: https://www.w3.org/TR/webdatabase/
This won't be a problem for wasm SQLite because it isn't a standard being shipped by browsers, just another dependency.
I can even do Sqlite's full text search without downloading the entire FTS database. Just most of it, if the search term is short enough.
Adding 400 for such a high quality piece of DB actually borders reasonability.
And makes me think: what the hell are frontend devs thinking!? Multiple MB's in JS for a news website. Hundreds of KB's for HTML. It's totally unreasonable.
Even if the spec just listed occasional version and the webpage could choose which one; that means a potentially tricky maintenance burden on browser to support old versions of a potentially no longer supported sqlite, and each version is another megabyte.
Why not then just choose this solution, and let each website pick its own poison?
If the concern is the repeated downloads of common resources, well, we've accepted that for other CDN's too, and a solution for shared caching of common dependencies would in any case be more valuable than merely for sqlite.
The current approach seems better than a browser-provided version.
What I'm asking is if I need to manage the sqlite file, as I would on an OS's file system, or if accessing the sqlite library will automatically persist that data to those web-native storages, like the way indexedDB doesn't require me to load an "idb" file and then "save" or "commit" that save. I just access it and write.
To be clear: I'm not asking academically. I wrote a whole library for managing data in indexedDB for local-first apps, and while it works well enough for what I need, it's iDB so it's subject to data deletion (not common, but allowed in the spec if necessary), and it's a pain to work with just because of its nature and API. So I've been waiting to move to sqlite for a while with the only holdbacks being "is it too heavy?", and "how much has to change?". With WASM, I think we're about as lightweight as its going to get. So I'm just curious if this aims to be a drop-in replacement, or if it still expects you to use it like sqlite on a native platform.
Now we get the most recent version of SQLite when we need it as a 410KB compressd WASM blob, as opposed to being stuck on browser-mandated versions of SQLite that might even be a decade old at this point.
https://github.com/rhashimoto/wa-sqlite/discussions/154
This essentially requires that we import the sqlite emscripten build via an extern C header in wasm bindgen, and then we need to re-implement the VFS in rust while compiling it in multi-threaded mode to allow for shared array buffer access. After that is all done, we will be able to access SQLite rows as raw wasm bytes. That gives us the ability to implement a rust-sqlite style wrapper or integration. There would still not be some of the niceties such as connection pooling, but in wasm you likely want to use the db in exclusive mode.
That's correct. i'll try my best to remember to update that reference the next time i'm back on the computer.
> It would be great if the SQLite team published an official npm package
Not a chance. We publish only vanilla JS and adamantly refuse to go down the rabit hole of supporting out-of-language tools (none of which any of our project members use). We support an "officially sanctioned" npm build, maintained by Thomas Steiner, but do not actively develop for any JS frameworks.
Direct support for any given framework (npm included) would give the impression that we endorse that framework, and endorsement of third-party projects is something we actively avoid.
In-memory SQLite databases don't use WAL. Wasm (and browser Wasm, in particular) doesn't support anything like the shared memory APIs SQLite wants for its WAL mode.
Litestream requires a very precise WAL setup to work (which just so happens to work with the default native SQLite setup, but is hard to replicate with Wasm).
Cloudflare Durable Objects may have been inspired by Litestream but works very differently (as do LiteFS, Turso, etc…)
The general idea of streaming changes from SQLite would work, but it's a lot of work, and the concurrency model of in-browser Wasm will make it challenging to implement.
(I wrote that forum post some time ago, and have WAL working in a server side Wasm build of SQLite, but none of the options to make it work would make much sense, or be possible, in browser)
A bit confused at this, NPM is just a package manager / distribution mechanism, not a framework. Totally fair if you don't want to publish for all the package managers, though for Javascript there's only a few that are relevant. NPM has been around for a decade.
I'm gearing up for a 0.2 release this week which should iron out a few kinks, but otherwise you can use the diesel ORM as if its native
It's an out-of-language packaging/distribution framework (and it's not the only one). It's not part of the JS standards.
My comments above have been edited to reframe our stance on npm and frameworks in general.
I know what you mean here, but I think we're very limited in what we tend to run. Polyglot programming still isn't really a thing, and with things like WASI standardized (someday soon I hope), I could imagine it becoming a lot nicer.
When I did this experiment a few months ago, what we could accomplish was pretty limited. I could load and query databases, but not write to them. However the Chicory wasip1 implementation is advancing.
BTW, we've borrowed a few ideas from wazero so thanks for your work there :)
Context: Currently using go-duckdb and while it's working for us, getting rid of cgo would be a huge help. Would be quite interested myself to attempt this.
If the goal was pure Java SQLite¹, a VFS from scratch would be better.
I think since I started my Go/wazero effort, WASI+SQLite improved a bunch. I had to start with the demo VFS; the Unix VFS now builds. But custom VFS is still the way to go, IMO.
And thanks! My contributions to wazero were tiny. Best of luck with Chicory!
1: strong NestedVM vibes here; 11 years ago… gosh, I feel old now. https://stackoverflow.com/questions/18186507/pure-java-vs-na...
agreed, though this was more an experiment to test Chicory once we built initial wasi support. I'd love to see it picked up and improved. I think that's the direction I'd go if i want some kind of production ready library.
Wasm is fine for compute (though concurrency is still a somewhat open question).
To have Wasm talk to the outside world, you need “host calls” where the guest calls the host.
On a browser that's Wasm calling JavaScript. On my Go driver, it's Wasm calling Go.
For server side, there's also a standard set of “host calls” modeled around POSIX/Linux syscalls called WASI.
I could've build my project around WASI, but WASI is rather limited (and SQLite support for WASI was more limited even, it's improved a bit since). DuckDB might work out-of-the-box this way.
I, instead, took advantage of SQLite's architecture and replaced its VFS layer with one in Go: https://sqlite.org/vfs.html
So SQLite in Wasm is just doing compute, and I do all the OS level stuff in Go. No need for Wasm concurrency, cause I can load multiple instances of my Wasm which act like independent OS processes that communicate through the filesystem (SQLite excels at this).
As I said, I dunno how well all those decisions would map to DuckDB.
The standardization issues around SQL already exist, are already widely known, and where common workarounds are already in practice. It's also an open source project that could have _easily_ incorporated compatibility code for this specific use case anyways.
They made blind fealty to process more important than the outcome to end users.
What a waste.
I'm gathering that I need to learn about: - WAL - Shared Memory APIs - Concurrency models - Durable Objects?
I dunno, feels like you're coming down too hard.
Shared memory API: If you want to share (mutable) data between multiple processes, you need some kind of procedure in place to manage that. How do you get a reference to the data to multiple processes, how do you make sure they don't trample each other's writes, etc.
Concurrency model: There are many different ways you can formalize concurrent processes and the way they interact (message passing, locking, memory ordering semantics, etc.). Different platforms will expose different concurrency primitives that may not work the same way as other platforms and may require different reasoning or code structure
Durable objects - I think this is some Cloudflare service where they host data that can be read or modified by your users
This is all from memory, but IME, GPT is pretty good for asking about concepts at this level of abstraction
Our database API is modeled on CouchDB and MongoDB, but our storage abstractions are along the lines of what you need to build the multi writer WAL you describe.
More details here https://jsr.io/@fireproof/encrypted-blockstore
Interesting. So when I am running concurrent readers using your package, it is just loading multiple instances of the wasm code? (I bottleneck to a single writer in the application)
If you changed the word "framework" to "toolchain" in your post I think it would make a lot more sense to people.
They're thinking, "adding [some fraction of existing total payload] for such a high quality [feature] actually borders reasonability". Wash. Rinse. Repeat.
https://sqlite.org/wasm/doc/trunk/persistence.md#opfs-wal
Again, just because the all the Lego pieces sound like they should all just fit together, doesn't mean that they will.
The VFS mechanism was primarily designed to make SQLite easy to port to multiple OSes. WAL mode is hard to port everytime you step away from a more traditional OS.
“We have SQLite in the browser, let's just stream the WAL like the Litestream hack” does not add up.
It's not impossible, but it surely took a lot of effort at Cloudflare (or at Fly, or Turso) to get there. And neither of them opened it up sufficiently to help us do the same. In a browser.
And side note on your last point - I've been burned too many times by confident hallucinations to trust my foundational learning to GPT. I hope someday that will improve, but for now ChatGPT is as trustworthy as an evening chat with someone at the bar.
... Someone who has been drinking since happy hour.
I’m happy that we are actually be able to use SQL in the browser now (although I would rather skip the MBs of the bundle bloat). But I feel like the standards committee will now have even less of a reason to fix the very broken state of IndexedDB.
Another issue I have with a lot of the new local first products is that they tend to lock you into a particular database type on the backend, so this is refreshing.
reliable so far, being dogfooded in production as we speak
You need an identity middle man in front of the Postgres identity to tackle these and validate that the session is still active. Last time I looked at electric it was a big challenge to integrate such a service. This might have improved since then however
[1]: https://github.com/orbitdb/orbitdb/blob/main/docs/GETTING_ST...
i have some code i would have to dig out that did this very thing - it allows you to open a SQLite db in browser using sqlite (with a VFS) compiled to wasm (not the official WASM build), make changes and both push and pull WALs to and from a server (or indeed browser to browser would be possible both manually or over WebRTC). it even works with github pages if you give the browser client a github token to work with.
if you are interested, feel free to ping me and i can see if i can get this up and running from scratch again. i did a ton of experiments with this approach around then and i think it could be useful for a subset of applications at least.
there's also a working demo of the pull functionality only here: https://just.billywhizz.io/sqlite/demo/#https://just.billywh...
There is no way to use Sqlite3 off-thread without memory sharing? Couldn't postMessage work to pass data to the sqlite thread by using the third Transfer argument?
Would postMessage transfer allow memory to be stored in a sqlite wasm database running a worker off-thread?
Refering to this implementation's docs: https://github.com/sqlite/sqlite-wasm
It needs to be run from a worker, though
Fair point but the edit window has passed ;). For the sake of clarity for those still following along: "framework," in the context of my above comments, includes any non-formally-standardized tools or APIs which are built atop the standardized core.
There is good news there, some people plan to get Emscripten and Rust to work well together in Wasm:
https://github.com/rustwasm/wasm-bindgen/pull/4014#issuecomm...
Docs for it: https://www.triplit.dev/docs/frameworks/tanstack-router#exam...
It by default uses IndexedDB but can also use SQLite but does real time, relational querying and (optionally) syncs with you server.
Re: developer ergonomics, this is our primary focus so I don't love to get your feedback on!
> I think they've been doing that for a while,
Kinda: <https://sqlite.org/wasm/doc/trunk/npm.md>
We in the sqlite project neither use nor require npm in any capacity whatsoever, so it would be kinda silly for us to attempt to support it. We instead leave that level of code/tools to folks who use and/or care about them.
There _is_ an "officially sanctioned/blessed" npm repo, and we actively support its maintainer (e.g. we participate the issue tracker and make patches in the core distribution where they're strictly needed), but we otherwise keep a "hands off" policy when it comes to non-standardized APIs and toolchains.
We _like_ to see people to plug the sources into their tools of choice, but we cannot feasibly take on the burden of doing that plugging-in for them, especially given how fluid the JavaScript ecosystem is when it comes to frameworks and tools.
Sidebar: we rely heavily on Emscripten because there is, for all practical purposes, it has no substitute, but we also actively go out of our way to ensure that the sources can be easily plugged in to an alternative should one ever appear.
We have no CORS restrictions but one specific (and optional) VFS requires COOP/COEP for SharedArrayBuffer. If SharedArrayBuffer isn't available, that VFS won't load, but the rest of the library will plod along just fine:
We (the sqlite project, where the "vague" description comes from) do not define the use cases. Similarly, in the docs for the C library you won't find any more than passing references to specific use cases, and those are typically contrived for the sake of example. (One notable exception: <https://sqlite.org/appfileformat.html>)
> What I'm asking is if I need to manage the sqlite file, as I would on an OS's file system, or if accessing the sqlite library will automatically persist that data to those web-native storages, like the way indexedDB doesn't require me to load an "idb" file and then "save" or "commit" that save. I just access it and write.
That's all covered in the docs (of which there are well more than 100 lovingly-hand-written pages), but the short answer is "it just works." You have the _option_ of importing and exporting databases from the browser-native storage, but you don't have to.
For starters, see: <https://sqlite.org/wasm/doc/tip/persistence.md>
The pending 3.47 release has some build-side tweaks which enable a user to strip it down to "just the basics," but we've not yet been able to get it smaller than about 25-30% less than it otherwise is:
cd ext/wasm
make barebones=1 ; # requires GNU Make and the Emscripten SDK
Doing that requires building it yourself - there are no plans to publish deliverables built that way.The build process also supports including one's own C code, which could hypothetically be used to embed an application and the wasm part of the library (as distinct from the JS part) into a single wasm file. Its primary intended usage is to add SQLite extensions which are not part of the standard amalgamation build.
> Or is there a way to prune it having the used API surface?
Not with the provided JS pieces. Those have to expose essentially the whole C library, so they will not be pruned from the wasm file.
However, you could provide your own JS bindings which only use a small subset of the API, and Emscripten is supposedly pretty good about stripping out C-side code which neither explicitly exported nor referenced anywhere. You'd be on your own - that's not something we'll integrate into the canonical build process - but we could provide high-level support, via the project's forum, for folks taking that route.
The core issue was not the SQL, but the underlying storage engine.
Microsoft's office doc formats aside, standards bodies, as a general rule, require that a standard be built around multiple independent, conformant implementations. WebSQL did not measure up there because all vendors implementing it chose the only viable option they had for the underlying storage: SQLite.
Which if you standardized this generic interface you could have multiple independent database engines available not just ones that are SQL based, although, SQL in particular was readily available through SQLite, and would clearly be exceptionally popular.
Huge miss when organizations stick the the routine rather then take an opportunity to explore and examine new ways forward. Meanwhile everyone is being held hostage by V2 to V3 manifest changes by _one_ vendor.
concurrency model of in-browser Wasm: https://medium.com/@quincarter/wasm-is-awsm-and-multi-thread...
Cloudflare Durable Objects: https://www.cloudflare.com/developer-platform/durable-object...
Dexie gives an ORM-like experience but their syncing solution is not FOSS.
I genuinely don't mean to sound rude, and maybe I misunderstand, but how do you build software if you're not doing it with use cases in mind?
The biggest actual issue is that it will capture block-level changes, not row-level changes. This means it can work to replicate a complete database, but partial sync (e.g. sharing some rows with other users) won't be feasible.
To get row-level changes for partial sync, you need to use something like triggers or the SQLite session extension [1]. For PowerSync we just use triggers on the client side. I find that works really well, and I haven't found any real downsides to that except perhaps for the work of maintaining the triggers.
The complexity for you comes from trying to reuse their (battle tested) implementation when (as I wrote above) the primitives they depend upon were not meant to make porting to a browser sandbox easy.
And the problems there are the specific concurrency model they depend upon: communicate by sharing memory.
Then, you're either working at the wrong abstraction level (and it shows), or you're patching and gutting SQLite.
SQLite is meant to work with files, file locks, shared memory, fsync and mmap.
It also doesn't work out great if you try to persist to an object store, to a KV store, or…
I'm repeating myself, but yeah. You can make it work. Others have made it work. But it's still a lot of work, and you're throwing away a lot of what makes SQLite… SQLite.
The current approach is not portable to Windows, but it works fine on Linux, macOS, BSD and illumos. In general, portability is hindered more by file locking (I hate POSIX locks) than mmap.
The currently open GitHub issue is more bad default configuration than anything else. Configuring connections to use less memory by default should fix it.
I already have a PR ready for the next release that also opens this up for 32-bit platforms.
Also, GoToSocial (a self-hostable Mastodon alternative) moved to it (from modernc) for its first beta release.
Context makes all the difference here. If you're considering a big chunk of size for a relational database engine, you need to ask: are you making a complex application, or a normal web page? If it's the latter, then it's not reasonable at all.
And anything that makes the HTML itself that big is almost certainly bloat, not "high quality", and shouldn't be used in any context.
Each connection lives in its own isolated sandbox, and only communicates with other connections through the “file system” (which is a virtual abstraction, actually).
WAL mode is the “exception”: a few pages of the sandbox's memory are mapped to a file, and shared by all connections to the same database.
Each sandbox is single threaded, and mostly lock free, does all its business in the calling goroutine, and regularly checks back with the Go runtime to play nice with the Go scheduler.
It's a bit like an OS running multiple processes, with the VFS layer handling all syscalls.
I have just done something similar in the past week, but without the WAL.
It's pretty much an alternative to online spreadsheets for me.
https://hachyderm.io/@inthehands/112006855076082650
> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.
> Alas, that does not remotely resemble how people are pitching this technology.
OPFS is interesting tech but again a red herring misdirecting from what had been raised, using an in-memory filesystem like the default thing that emscripten (the default toolchain) does.
I read it a few months ago and was really impressed with how easy it was to read.
It starts out with simple stuff, like serialising data as JSON vs XML. But it moves into complex areas - like how replication and WALs work, including different ways of handling consensus when using leader-leader replication and how Spanner needs atomic clocks to handle it.
But even the complex stuff was explained in a way that I understood, which is an immense achievement.
The people on SQLite, employed to work on this full time for over a year, have this to say (on the link I posted above):
“Because the WASM build does not have shared memory APIs, activating WAL requires that a client specifically activate exclusive-locking mode for a db handle immediately after opening it, before doing anything else with it…
“WAL mode does not provide any concurrency benefits in this environment. On the contrary, the requirement for exclusive locking eliminates all concurrency support…”
I personally worked to implement shared memory WAL for a server side Wasm port of SQLite. But random internet poster decides to “strongly expect it closer to a weekend or two of hacking.”
Please, do me a favor and do spend that weekend or two for the benefit of the rest of us all. It'll sincerely be much appreciated.
PS: it was a random internet poster¹ (who's been posting in this thread) who helped me figure out how to implement shared memory WAL for my port. It still took way more than “a weekend or two.” So if you do figure out how to crack this, I'm sure that people who've been trying for the past year² will definitely appreciate it.
It uses a custom VFS, memory journal mode, and bypasses SQLite for optimistic concurrency handling. I dunno how it handles crash safety, or if it even uses the same on disk file format.
My point here is not that this is impossible, it's that SQlite's WAL implementation isn't meant to be pluggable, and Litestream is a very clever hack, that requires a very specific setup to work well.
It takes a fair amount of work to replicate that in other environments.
I'm someone who has been doing this stuff for almost two decades without really knowing this is what I'm doing. I used to think what I was going to do was systems level programming like operating systems and maybe the database systems themselves (e.g. postgres, datomic etc.). But for whatever reason my entire career (so far, but I don't see it changing) has been building data systems for businesses and users.
I read the book from cover to cover and half of it was like "ohh... that's how that works, that's what I'm doing wrong" and the other half was "shit, this is something I kinda knew after trying and failing for years, and someone has just written it down in a way I never could".
That's a fair question in the general case, but that aspect doesn't much apply to SQLite's continued evolution. Perhaps it's an uncommon case in that regard.
SQLite initially grew out of a single, highly-specific use case which Richard wanted to solve. The solution, however, was highly generic, suitable for solving many, many specific problems. In the mean time, the set of use cases has evolved to, essentially, "just about anything for which you need to save data locally":
<https://sqlite.org/whentouse.html>
There are literally millions of concrete uses of SQLite in the wild, the majority of which were never conceived when the library first took shape but (and people are still coming up with exotic uses for it).
Individual features are sometimes added to help support concrete use cases, but the library is general-purpose enough that concrete use cases don't play a considerable role in its day-to-day development.
Except we aren't interested in concurrency or performance benefits of WAL; we want it for something else entirely (replication).
High quality software is something I rarely see nowadays when browsing "modern" websites.
- Full Django example https://github.com/m-butterfield/django_webassembly - Minimal Rust example with client-server separation using macros https://github.com/richardanaya/wasm-service - Clojure Electric using more macros for something similar (without wasm) https://github.com/hyperfiddle/electric - Recent M$ web tooling starting with letter B?
In last few years, surprisingly many parts of Python ecosystem to have gained Wasm support that can leverage this also. Obvious ML related ones and even game libraries like Pygame and Pyxels. Kivy support (multi-touch UI library) should not be too far out either now that SDL2 library has added support.
Seems like there's many new possibilities for running code in any computer, optionally sandboxed in web browser to avoid platform bureaucracy or taxes.
https://www.oreilly.com/library/view/designing-data-intensiv...
A bunch of different people solved this already (though none in a couple of weekends); common to all of them: they don't use WAL mode.
https://litestream.io/how-it-works/
> (though none in a couple of weekends)
As I said, getting sqlite in wasm to run in WAL mode such that we could start to implement replication, like for example how litestream does. Not this strawman you build, of implementing replication.
I purposefully replicate data and queries only, so to change the shown data for everyone you have to run a query, so after the import process (which can be slow for large files), queries are replicated fast enough IMO because it's just a small string sent over WebRTC, so in theory it should be faster to run a query over a big dataset and see its result appear for all peers than a centralized app like Google Sheets, but I haven't benchmarked it yet.
https://news.ycombinator.com/item?id=15670808
I thought the reasons given for not moving forward with standardizing WebSQL and using a SQLite implementation were (and undoubtedly still are) very, very stupid so I'm not the right person to represent them here.
As I pointed out in a separate comment:
Richard Hipp committed to creating and maintaining a flag in SQLite that would force SQLite to use whatever subset of SQL the WebSQL people settled on for their API. That would have of course worked independently of the SQLite version. (He also offered to write the SQL part of the spec.)
https://news.ycombinator.com/item?id=41860067
It was such a lost opportunity. I'm sure the SQLite guys would still be happy to do the work, but there certainly isn't any momentum on the browser side anymore.
Seems easy to me, just store/stream all update sql statements. That should have all information needed and don't have to mess with WAL data format etc.
Litestream works out-of-process. It needs shared memory WAL and file locking to work exactly like on "desktop" SQLite for it to even function and produce this "shadow WAL." No ifs, no buts, no inbetweens.
You just don't build Litestream without shared memory WAL. That's my entire point. These pieces you seem to think can be made to simply work together can't, in fact, work together at all, much less simply.
Exclusive locking mode WAL doesn't work with Litestream, and can't be made to work with Litestream, even with a little elbow grease.
What are your thoughts about search over the store? Hydrate a separate lunr index?
And finally has anybody tried using your syncing over webrtc?
Seriously bro, take a massive chill pill. You have been way overcommitted to the "it's all impossible!!! How stupid to imagine! If only you were smart like me you'd see everything is impossible!" bit way too hard. Take a breather from your possessed absolute self certainty & derogatory belittling of others.
There's also lots of different fs impls for wasm! Just because some limitations apply to emscriptens impl doesn't mean another impl has the same limitations, or that we can't improve or work around. Use some of your hard working loves-to-tax-itself big brain to find avenues of possibility, rather than just pissing on every possibility.
We have two OPFS-based VFSes. One requires COOP/COEP and one does not. Each makes feature trade-offs, though, so they're not equivalent.
We also offer persistence via localStorage and sessionStorage, so OPFS is not (for small databases) required for persistence.
I am struggling to think of a single standardized language whose standard defines the package manager
LiteFS didn't support WAL mode initially, now does. The Wasm version doesn't support WAL mode at all.
mvSQLite doesn't support shared memory WAL. SQLSync doesn't support WAL at all.
Browser Wasm builds either don't have WAL enabled, or don't support shared memory WAL.
These replications are VFSes or FUSE drivers, and for those, it's much easier to work with rollback mode than WAL, especially if you don't want concurrency (or intend to provide concurrency through other means, like optimistic concurrency).
In open source, only Turso actually did a VFS to replicate the WAL. They had to break SQLite so much they call it libSQL. It also requires shared memory as is, would be hard (if possible) to port to Wasm.
Please, do take the challenge and build this. It will be much appreciated.
I've been asking the Turso guys to document their thing for a year, with little success. They have the interface (which just opens up undocumented internal SQLite interfaces), with no open source samples of how to use it (except a logger), and zero additional docs.
Package managers are an anti-pattern. When the standard package manager is present, the packaging problem is regarded as solved ("just use the package manager"), and so the actual mechanisms underlying program construction are then out of the negative feedback loop and will rot.