It wasn’t json, but it probably could’ve been.
Or is there no practical difference?
We can return just the most recently updated documents by maintaining a lastUpdated timestamp on client and server docs. But this would be at the document level, so whole documents get resent; sending only the fields that have changed would require tracking timestamps per field.
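A minimal Node.js sketch of the document-level scheme, with hypothetical document shapes (id, lastUpdated, body are assumptions for illustration):

// Server side: return only documents updated since the client's last sync.
function changedSince(docs, clientLastSync) {
  return docs.filter((doc) => doc.lastUpdated > clientLastSync);
}

const docs = [
  { id: 'a', lastUpdated: 100, body: 'unchanged' },
  { id: 'b', lastUpdated: 250, body: 'edited recently' },
];
console.log(changedSince(docs, 200)); // only doc 'b' goes over the wire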
https://github.com/pksunkara/nosqlite
I could add a section
A reason you don't see much in the "tree of simple files as my DB" space is that the whole point of it is to be trivially simple to understand, and thus, to implement.
Now, with file DBs, it's actually quite important to use atomic writes, so that a crash or concurrent operations don't produce errors. For example, the package posted above can corrupt data if there are concurrent writers to the same data file, because all the writes are non-atomic fs.writeFileSync() calls. What this database should do instead is write to a tempfile, and then rename the tempfile to replace the destination. That way, you get simple last-write-wins semantics with no possibility of creating invalid JSON files.
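Something like this minimal Node.js sketch (writeJsonAtomic is a hypothetical name, not from the package above):

const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

function writeJsonAtomic(file, data) {
  // Write to a tempfile in the same directory, so the rename below
  // stays on one filesystem and is therefore atomic on POSIX.
  const tmp = path.join(
    path.dirname(file),
    `.${path.basename(file)}.${crypto.randomBytes(6).toString('hex')}.tmp`
  );
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  // Atomically replace the destination: readers see either the old
  // file or the new one, never a half-written JSON.
  fs.renameSync(tmp, file);
}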
And in fact, there is a sqldiff tool for diffing SQLite databases: https://sqlite.org/sqldiff.html
But, it would still be a bummer to store them in git because git won't efficiently track changes between sqlite databases, it'll just add the whole binary blob on every commit. oof.
I have vague recollections of a tool that was similarly supposed to make JSON files diffable. IIRC it procedurally built up the object, so {"foo": {"bar": "baz", "quux": [1, 2, 3]}} would be:
foo = {}
foo.bar = "baz"
foo.quux = []
foo.quux[0] = 1
...
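That transformation is easy to sketch in Node.js; this is just an illustration of the idea, not the tool being half-remembered above:

function flatten(value, prefix, out = []) {
  if (Array.isArray(value)) {
    out.push(`${prefix} = []`);
    value.forEach((item, i) => flatten(item, `${prefix}[${i}]`, out));
  } else if (value !== null && typeof value === 'object') {
    out.push(`${prefix} = {}`);
    for (const [key, child] of Object.entries(value)) {
      flatten(child, `${prefix}.${key}`, out);
    }
  } else {
    out.push(`${prefix} = ${JSON.stringify(value)}`);
  }
  return out;
}

// Each path gets its own line, so line-oriented diffs stay small.
console.log(flatten({ bar: 'baz', quux: [1, 2, 3] }, 'foo').join('\n'));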
sqlite3 yourdb.sqlite .dump
If you really want to track the binary e.g. inside git, but still see textual diffs, you would put this into your .gitconfig:

[diff "sqlite3"]
    textconv = sqlite3 $1 .dump
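One detail worth noting (from git's own docs, not specific to this setup): a custom diff driver only applies to paths mapped to it, so you'd also need a line like this in .gitattributes, assuming your database files end in .sqlite:

*.sqlite diff=sqlite3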
I have a demo of it running against my blog's database here: https://github.com/simonw/simonwillisonblog-backup
For my use case, speed was not the most important thing; instead it was using common tools on the DB, being human-readable JSON, and being git diff-able. Also, I wanted small resource usage and a small library size.
$ augtool -r . --noautoload --transform "Json.lns incl /sirdb-example.json" <<EOF
> print /files/sirdb-example.json/
> EOF
/files/sirdb-example.json
/files/sirdb-example.json/dict
/files/sirdb-example.json/dict/entry[1] = "_owner"
/files/sirdb-example.json/dict/entry[1]/string = "nouser"
/files/sirdb-example.json/dict/entry[2] = "username"
/files/sirdb-example.json/dict/entry[2]/string = "nouser"
/files/sirdb-example.json/dict/entry[3] = "email"
.... etc ...
[edit] just did some research and it looks like Git will store the delta of a binary file in its packfiles, just like a text file. The question is just how delta-able sqlite binary files are.
Our internal database is deeply hierarchical and also has a concept of a "folder" record, which users can create and nest. Users have complained for years that there was no change tracking. I surveyed some of the tools mentioned here (and other commercial ones), but they all just flattened the database into JSON, XML, or raw SQL files, which made it very difficult for users to see the changes.
We also thought about getting rid of the database and running off of json files, but that would have required completely rewriting our already brittle internal ORM, which uses raw SQL in too many places for performance, and would have been a breaking change. I have a heavy dislike for rewriting or throwing away whole modules, and also dislike breaking changes, having been personally bitten by them many times in the past. And due to other constraints on my team this was unacceptable. It's still on the table, just not for now.
Instead I wrote a layer which translates back and forth between the ORM (or the raw database) and the json files in the file system. This is an addon to the application and is written in such a way that it's mostly transparent to the users. Each change to a record in the database is translated into the correct json file on disk. Users can commit and push at their leisure. Users can also switch branches / reset to an earlier version, and request that the database be updated from the json files.
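The core of the database-to-disk direction is simple to sketch in Node.js (the names here, like mirrorRecordToDisk and folderPath, are illustrative, not our real code):

const fs = require('fs');
const path = require('path');

// After-save hook: mirror one record into a JSON file whose path
// reflects the user-visible folder hierarchy, so git diffs stay local.
function mirrorRecordToDisk(repoRoot, record) {
  const dir = path.join(repoRoot, ...record.folderPath);
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${record.id}.json`);
  fs.writeFileSync(file, JSON.stringify(record, null, 2) + '\n');
}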
I had thought many times about open sourcing the basic part of this addon. But I was never sure how common this use-case is. Also, it's actually pretty trivial to write the base part. The hard parts were gluing it to the existing application/ORM, and working around the many edge cases and bugs in our existing application.
So, I wonder, how common is this use case (version control of an existing hierarchical SQL database in a user friendly way) in other organizations?
Although you already got my upvote for the use of yer... (I use it all the time and rarely see anyone using it, yagetme).
Like the OP, it stores data locally in .json files and uses a simple MongoDB-inspired API
What's up with requiring "--noautoload"? What does "-r ." do? And the transform invocation starts with the seemingly magical "Json.lns" (where does it come from?) and appears to refer to a file at the root of the filesystem hierarchy?
I'd probably use augtool more often myself, but its UI is seriously inscrutable.
You can get this behavior with a SQL database like MySQL that has a text dump format. I've used this quite effectively to keep relatively small (but important) databases in a git repo.
I can definitely appreciate the advantages of an export format that is explicitly designed to support this (e.g. outputting each field on its own line, perhaps using an explicit label in front of it) but even the out-of-the-box format can often be quite effective.
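For example, with MySQL's mysqldump, a couple of flags make the out-of-the-box dump noticeably more diff-friendly (mydb is a placeholder database name):

# one INSERT per row, and no "Dump completed on ..." timestamp polluting every diff
mysqldump --skip-extended-insert --skip-dump-date mydb > mydb.sql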
With SSDs at 2 GB/s and over, that's on par with early server RAM, and there were companies who were able to serve pretty significant loads with 2 GB/s of RAM.
I'm writing a little program to make this whole thing a lot easier.
Answers: by default, augtool loads all the lenses and scans all the default config files (mostly under /etc); `--noautoload` prevents that.
The -r is `--root`; it specifies the root as the current directory rather than /. And Json.lns is the generic JSON lens that ships with Augeas; the --transform option maps that lens onto the given file, which is why the file then shows up under /files/.