Most active commenters
  • hinkley(5)
  • jchw(3)

←back to thread

SQLite File Format Viewer

(sqlite-internal.pages.dev)
272 points ilumanty | 16 comments | | HN request time: 0.769s | source | bottom
1. jchw ◴[] No.43684392[source]
This really would've come in handy when I was debugging my own SQLite parser a couple weeks ago.

One thing that initially confused me was how exactly the pages worked w.r.t. the first page on disk... I misunderstood the SQLite documentation in different ways, but it's really rather simple: the very first page is just treated as containing the file header in it, and it pushes down the rest of the data, making the page shorter than the other pages. You can see that illustrated clearly if you click into the first page of a database using this tool: the database header comes first, then the page header.

This tool will undoubtedly come in handy for anyone who has a reason to be dealing with SQLite data structures directly for whatever reason, especially since the SQLite documentation is a bit terse at times.

replies(3): >>43684613 #>>43688207 #>>43692316 #
2. hinkley ◴[] No.43684613[source]
I really want a data format that is effectively binary JSON. What is the subset of all of the features of SQLite that makes either a read-only or an updatable data set that is compact. But better searchability than a streaming parser.
replies(6): >>43684710 #>>43685626 #>>43685952 #>>43687507 #>>43688023 #>>43690668 #
3. jchw ◴[] No.43684710[source]
If you want to maintain the properties that SQLite has for read use cases, you'll need to replicate a couple of features. At the very least, you'll probably want the format to still be page-based with a BTree structure. You really could get away with just using the SQLite format if you didn't mind the weirdness; a functional SQLite parser that can read tables would not be a significant amount of code. I think, though, that if you want to read the schema as SQLite understands it, you'd need to interpret the CREATE TABLE syntax, which would make it a bit more complex for sure. Otherwise, you can read tables and columns themselves relatively easily, and the values are all stringified.
replies(1): >>43685643 #
4. Retr0id ◴[] No.43685626[source]
sqlite itself supports a binary encoding of JSON: https://sqlite.org/jsonb.html
replies(1): >>43685896 #
5. hinkley ◴[] No.43685643{3}[source]
Yeah if I wasn’t clear I’m talking about a minimal file that SQLite can still open read only without errors, not a third party implementation. Though there might be a few tweaks that would allow SQLite to be a bit more lenient. For instance missing metadata that can be assumed. Maybe b tree nodes exceeding the usual load factor.
6. hinkley ◴[] No.43685896{3}[source]
When I said binary JSON I didn’t mean literal JSON. I meant “common denominator interchange format”. It’s too chatty by far and has dismal performance for queries. So you’re better off asking a specific question and getting a larger document that could answer many questions that you do t yet have. For CDNs things like this matter a lot.
replies(1): >>43689093 #
7. cwmma ◴[] No.43685952[source]
Parquet or some other column oriented data format is probably closest to what you want without getting into indexing your flat files or similar
8. w10-1 ◴[] No.43687507[source]
MongoDB's BSON?
replies(1): >>43697578 #
9. 79a6ed87 ◴[] No.43688023[source]
Have you tried MessagePack[0]?

0: https://msgpack.org/index.html

replies(1): >>43697567 #
10. burntcaramel ◴[] No.43688207[source]
Yes I’m also working on a SQLite parser, mine is in raw WebAssembly. Is yours open source too? This tool will be so useful. I have basic page reading and parsing of the CREATE TABLE schema: https://github.com/RoyalIcing/SilverOrb/blob/9dacad0ce521b0d...

My plan is to create a miniature .wasm module to read .sqlite files that works in the browser. It will be in the tens of kilobytes rather than the 1 megabyte that the official fantastic sqlite.wasm is. The reduced download means even on 3G you ought to be able to load within a few seconds. You can use SQLite files as your network payloads then, and perhaps even as the working mutable state synced between server and clients.

replies(1): >>43688346 #
11. jchw ◴[] No.43688346[source]
Mine is in TypeScript and for the purpose of parsing a file that happens to use sqlite, so I don't think I'll bother parsing the CREATE TABLE schema unless I have to. It's not currently posted anywhere but will be open source. It works, but the code isn't particularly great :)
12. ◴[] No.43689093{4}[source]
13. ◴[] No.43690668[source]
14. invisal ◴[] No.43692316[source]
Glad you like it.
15. hinkley ◴[] No.43697567{3}[source]
I would probably just use bson or gRPC. As o clarified elsewhere, I means JSON as an analogy. I want something that can be scanned and queried cheaply.
16. hinkley ◴[] No.43697578{3}[source]
Mongo sits on a throne of lies and I will never condone anyone using it for any purpose except to make a joke.