Maybe it's just me, but I see the presentation functionality as one of the less used aspects of the OpenOffice family.
DevDocs does something similar, but there you explicitly choose what to download, and the docs remain browsable online without downloading all of it. The data is also split conveniently (by programming language/library), so you can download individual parts. The UI also remains available offline, which is pretty cool.
*: without adding an index of your own, at which point it isn't really XML anymore, it's some kind of homebrew XML-based archive format.
Recently, the DuckDB team raised a similar question about data lake catalog formats: why not just use a SQL database for that? It's simpler and more efficient as well.
More industrious people have apparently wrapped this up on NPM: https://www.npmjs.com/package/sqlite-wasm-http
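As I understand it, readers like this work because SQLite reads its file in fixed-size pages, so a virtual file system layer can turn each page read into an HTTP Range request. I haven't checked sqlite-wasm-http's internals. This Python sketch only shows the arithmetic such a layer needs: read the page size from the 100-byte header, then compute the byte range for page N. The function names are mine, and a local file stands in for the remote one.

```python
import os
import sqlite3
import struct
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"

def page_size_from_header(header: bytes) -> int:
    """Read the page size from the 100-byte SQLite file header."""
    if header[:16] != SQLITE_MAGIC:
        raise ValueError("not a SQLite database")
    # Bytes 16-17: big-endian page size; the value 1 stands for 65536.
    (raw,) = struct.unpack(">H", header[16:18])
    return 65536 if raw == 1 else raw

def range_header_for_page(page_number: int, page_size: int) -> str:
    """Range header a hypothetical HTTP-backed reader would send for page N."""
    start = (page_number - 1) * page_size  # pages are numbered from 1
    end = start + page_size - 1            # HTTP byte ranges are inclusive
    return f"bytes={start}-{end}"

# Build a small local database to stand in for the remote file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA page_size = 4096")  # must be set before the first write
conn.execute("CREATE TABLE t (x)")
conn.commit()
conn.close()

with open(path, "rb") as f:
    size = page_size_from_header(f.read(100))
print(size, range_header_for_page(3, size))
```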
1. Plaintext format (JSON or similar) or SQLite dump files versioned by git
2. Some sort of modern local-first CRDT thing (Turso, libsql, Electric SQL)
3. Server/Client architecture that can also be run locally
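For option 1, Python's standard `sqlite3` module can already produce a SQL text dump via `Connection.iterdump()` (similar to the CLI's `.dump`), which diffs reasonably in git. A minimal sketch; the function names and paths are placeholders, and merging two divergent dumps is still left to you.

```python
import sqlite3

def dump_to_sql(db_path: str, sql_path: str) -> None:
    """Write the schema and all rows as SQL text, suitable for committing to git."""
    conn = sqlite3.connect(db_path)
    try:
        with open(sql_path, "w", encoding="utf-8") as out:
            # iterdump() yields statements that recreate the database.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()

def restore_from_sql(sql_path: str, db_path: str) -> None:
    """Rebuild a database (e.g. after a git checkout) from a dump."""
    conn = sqlite3.connect(db_path)
    try:
        with open(sql_path, encoding="utf-8") as f:
            conn.executescript(f.read())
    finally:
        conn.close()
```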
Has anyone had any success in this department?
That's wild!
https://sqlite.org/sessionintro.html
That provides a C level API. If you know Python and want to do some prototyping and exploration then you may find my SQLite wrapper useful as it supports the session extension. This is the example giving a feel for what it is like to use:
The BLOB type is limited to 2 GiB (its length is a signed 32-bit int). Depending on your use case, that might seem like plenty, or not.
People would argue that if you store that much binary data in a SQLite database, SQLite isn't really the right tool. But application file formats usually need to bundle large binary data into one nice file, rather than many files that you need to copy together to make it work.
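The usual workaround is to split large payloads across rows so no single BLOB gets near the limit. A minimal sketch, assuming a hypothetical `attachment_chunks` table and a 64 MiB chunk size (both my choices, not anything standard):

```python
import sqlite3
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024 * 1024  # assumption: well under the per-BLOB limit

SCHEMA = """
CREATE TABLE IF NOT EXISTS attachment_chunks (
    attachment_id INTEGER NOT NULL,
    seq           INTEGER NOT NULL,  -- 0-based chunk index
    data          BLOB NOT NULL,
    PRIMARY KEY (attachment_id, seq)
)
"""

def store_attachment(conn: sqlite3.Connection, attachment_id: int,
                     src: BinaryIO, chunk_size: int = CHUNK_SIZE) -> None:
    conn.execute(SCHEMA)
    with conn:  # one transaction for the whole attachment
        seq = 0
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            conn.execute("INSERT INTO attachment_chunks VALUES (?, ?, ?)",
                         (attachment_id, seq, block))
            seq += 1

def load_attachment(conn: sqlite3.Connection, attachment_id: int, dst: BinaryIO) -> None:
    rows = conn.execute("SELECT data FROM attachment_chunks "
                        "WHERE attachment_id = ? ORDER BY seq", (attachment_id,))
    for (block,) in rows:
        dst.write(block)
```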
I actually thought it was kind of cool, because I was able to play with it easily with some SQLite explorer tool (I forget which one) and I could easily look at how the save files actually worked.
I haven't really used SQLite for anything serious [2], but always found the idea of it kind of charming. Maybe I should dust it off and try it again.
[1] https://en.wikipedia.org/wiki/Illumination_Software_Creator by Bryan Lunduke before I realized how much of a pseudo-intellectual dimwit that he is.
[2] At least outside of the "included" database in a few web frameworks.
It really is. One experiment we're currently running, to make bug reporting from Android devices easier (and, to an extent, reduce user frustration and fatigue), is to store app logs (unstructured) in an in-memory SQLite table. It lends itself very well to on-device LLMs (like Gemma 3n or Qwen2.5 0.5B), as users can ask questions to find out just what the app is doing and why it won't work the way they want it to. On-device LLMs are limited (context length and/or embeddings), and the many writes (in batches of 1000 rows) to the in-memory SQLite table (surprisingly) eat up battery like no tomorrow, so this "chat to know what the app is doing" feature isn't rolled out to everyone yet.
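For anyone trying something similar, the batching pattern looks roughly like this in Python: buffer log lines and flush each batch in a single transaction with `executemany`, rather than one implicit transaction per row. The table layout and class name are hypothetical, and this makes no claim about battery behavior on Android.

```python
import sqlite3
import time

def open_log_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_log (ts REAL NOT NULL, level TEXT NOT NULL, "
                 "message TEXT NOT NULL)")
    return conn

class BatchedLogger:
    """Buffers log lines and writes them to SQLite in batches."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 1000):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def log(self, level: str, message: str) -> None:
        self.pending.append((time.time(), level, message))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        with self.conn:  # one transaction per batch instead of one per row
            self.conn.executemany("INSERT INTO app_log VALUES (?, ?, ?)", self.pending)
        self.pending.clear()
```

The LLM side can then pull just the relevant slice, e.g. `SELECT message FROM app_log WHERE level = 'ERROR' ORDER BY ts DESC LIMIT 50`, which helps with the context-length problem.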
1. Enable the secure_delete pragma <https://antonz.org/sqlite-secure-delete/> so that when your user deletes something, the data is actually erased. Otherwise, when a user shares one of your application's files with someone else, the recipient could recover information that the sender thought they had deleted.
2. Enable the options described at <https://www.sqlite.org/security.html#untrusted_sqlite_databa...> under "Untrusted SQLite Database Files" to make it safer to open files from untrusted sources. No one wants to get pwned when they open an email attachment.
3. Be aware that when it comes to handling security vulnerabilities, the SQLite developers consider this use case to be niche ("few real-world applications" open SQLite database files from untrusted sources, they say) and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed. https://www.sqlite.org/cves.html
They fail to mention any of this on their marketing pages about how you should use SQLite as an application file format.
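For reference, here is roughly what points 1 and 2 look like with Python's stdlib `sqlite3`. The pragmas are the documented ones; `setconfig` (Python 3.12+) and `setlimit` (Python 3.11+) are guarded because older Pythons lack them. The 10 MB length limit and the helper names are my own assumptions, and sqlite.org/security.html has the complete list.

```python
import sqlite3

def harden_connection(conn: sqlite3.Connection) -> None:
    # Point 1: overwrite deleted content so it can't be recovered from a
    # shared file. This is a per-connection setting, not stored in the file.
    conn.execute("PRAGMA secure_delete = ON")
    # Point 2: settings suggested for untrusted database files.
    conn.execute("PRAGMA trusted_schema = OFF")  # ignored by SQLite < 3.31
    conn.execute("PRAGMA cell_size_check = ON")
    conn.execute("PRAGMA mmap_size = 0")
    if hasattr(conn, "setconfig"):  # Python >= 3.12
        conn.setconfig(sqlite3.SQLITE_DBCONFIG_DEFENSIVE, True)
    if hasattr(conn, "setlimit"):   # Python >= 3.11
        # Assumption: no single string/BLOB in this format exceeds 10 MB.
        conn.setlimit(sqlite3.SQLITE_LIMIT_LENGTH, 10_000_000)

def open_untrusted(path: str) -> sqlite3.Connection:
    """Open an untrusted document read-only and reject structurally broken files."""
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    harden_connection(conn)
    result = conn.execute("PRAGMA quick_check").fetchone()[0]
    if result != "ok":
        conn.close()
        raise ValueError(f"database failed quick_check: {result}")
    return conn
```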
As a document _exchange_/_interchange_ format, what I prefer for durability is a non-binary format (e.g. XML based).
For local use, I agree SQLite might be much faster than ZIP, and of course the ability to query based on SQL has its own flexibility merits.
$ file concessions-stand-menu-template.pages
concessions-stand-menu-template.pages: Zip archive data, at least v2.0 to extract, compression method=store
$ unzip -l concessions-stand-menu-template.pages
Archive: concessions-stand-menu-template.pages
  Length      Date    Time    Name
---------  ---------- -----   ----
    58727  05-09-2022 13:27   Data/Artboard 2-26.png
    26993  05-09-2022 13:27   Data/Artboard 2-small-27.png
    11550  05-10-2022 08:13   Index/Document.iwa
      720  05-10-2022 08:13   Index/ViewState.iwa
      536  05-09-2022 12:41   Index/CalculationEngine-1686619.iwa
       23  07-02-2021 17:48   Index/AnnotationAuthorStorage-1686618.iwa
    43891  05-09-2022 12:41   Index/DocumentStylesheet.iwa
      229  05-09-2022 13:28   Index/DocumentMetadata.iwa
    17895  05-10-2022 08:13   Index/Metadata.iwa
      379  05-10-2022 08:13   Metadata/Properties.plist
       36  05-09-2022 12:41   Metadata/DocumentIdentifier
      268  04-29-2022 22:18   Metadata/BuildVersionHistory.plist
   135503  05-10-2022 08:13   preview.jpg
     1666  05-10-2022 08:13   preview-micro.jpg
    11057  05-10-2022 08:13   preview-web.jpg
---------                     -------
   309473                     15 files
$ unzip concessions-stand-menu-template.pages Index/Document.iwa
extracting: Index/Document.iwa
$ file Index/Document.iwa
Index/Document.iwa: data
$ xxd -l 128 Index/Document.iwa
00000000: 001a 2d00 bcae 0170 6408 0112 6008 904e ..-....pd...`..N
00000010: 1203 0100 0518 c90c 2209 0a03 0a01 3010 ........".....0.
00000020: 0118 0122 0701 0b08 2e18 0109 1400 2f05 ..."........../.
00000030: 14f4 a801 0b0a 050a 030f 0111 1003 1800 ................
00000040: 2a27 daf8 66db f866 dcf8 66e6 f768 ddf8 *'..f..f..f..h..
00000050: 66df ef66 def8 66d1 f666 fdf5 66d5 f566 f..f..f..f..f..f
00000060: 8ff8 66df f866 86f9 6612 0408 def8 661a ..f..f..f.....f.
00000070: 0408 fdf5 6622 0408 8ff8 6632 0408 dfef ....f"....f2....
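For what it's worth, the dump matches the third-party reverse-engineering notes on .iwa files I've seen: a sequence of chunks, each with a 1-byte type (0x00), a 3-byte little-endian length, and then Snappy-compressed protobuf data. Here `00 1a 2d 00` gives a length of 0x002d1a = 11546 bytes, and 11546 plus the 4-byte header is exactly the 11550 bytes unzip reported. Apple doesn't document this, so treat the layout as an assumption. A minimal sketch that only walks the chunk headers; decompressing needs a Snappy library and Apple's protobuf schemas, neither of which is shown.

```python
import struct
from typing import Iterator, Tuple

def iwa_chunks(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (chunk_type, payload) pairs from a .iwa file's raw bytes.

    Assumed layout (reverse-engineered, not Apple-documented):
    1-byte type, 3-byte little-endian payload length, payload.
    """
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated chunk header")
        chunk_type = data[offset]
        # Pad the 3 length bytes to 4 so struct can read a little-endian uint32.
        (length,) = struct.unpack("<I", data[offset + 1:offset + 4] + b"\x00")
        start = offset + 4
        end = start + length
        if end > len(data):
            raise ValueError("chunk length runs past end of file")
        yield chunk_type, data[start:end]
        offset = end
```

Usage: `for kind, payload in iwa_chunks(open("Index/Document.iwa", "rb").read()): print(kind, len(payload))`.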
[1] https://github.com/lifthrasiir/angel/commit/50a15e703ef2c1af...
That said, creating a format that can convey rich untrusted data is a hard problem.
I think that's an unfair reading. Sqlite runs fuzzers itself and quickly addresses bugs found by fuzzers externally. There's an entire section in their documentation about their own fuzzers and thanking third party fuzzers, including credit to individual engineers.
https://www.sqlite.org/testing.html
The tone of the CVE docs is because people freak out about CVEs flagged by automated tools when the CVEs are for issues that have no security impact for typical usage of SQLite, or have prerequisites that would already have resulted in some form of compromise.
It’s a common trap to fall into. See also: Ben Carson. Both of them are obviously intelligent and highly skilled in their professional fields. And both have let that convince themselves that they know everything about everything.
Spreadsheets might be a little easier because you can separate out by sheet or even down to a row/column level?
Part of me wants to try it now…
SQLite advises against using a network file system to avoid potential issues, but you can successfully do it.
As an application format, you don't generally expect people to be editing an ODF file at the same time though, so network locking doesn't really disqualify it for use as a document format.
He advocates breaking the XML into smaller pieces in SQLite. I suppose making each slide a new XML record could make sense. Moving over to spreadsheets, I don't know how ODF does it now, but making each sheet a separate XML could make sense.
Thinking about Write documents, I wonder what a good smaller unit would be. I think one XML per page would be too fine a granularity. You could consider one record per chapter. I doubt one record per paragraph would make sense, but it could be fun to try different ideas.
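A minimal sketch of the "one record per slide" idea, with a hypothetical schema (the table and column names are mine, not ODF's). Each slide's XML lives in its own row, so editing one slide rewrites roughly the pages holding that row instead of the whole content.xml. The REAL `position` column lets you insert a slide between two others without renumbering.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS slide (
    id       INTEGER PRIMARY KEY,
    position REAL NOT NULL,   -- fractional, so slides can be inserted between others
    xml      TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS slide_by_position ON slide(position);
"""

def open_presentation(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_slide(conn, xml, after_position=None):
    with conn:
        if after_position is None:  # append at the end
            (last,) = conn.execute("SELECT COALESCE(MAX(position), 0) FROM slide").fetchone()
            position = last + 1
        else:  # place halfway between after_position and the next slide
            (nxt,) = conn.execute("SELECT MIN(position) FROM slide WHERE position > ?",
                                  (after_position,)).fetchone()
            position = after_position + 1 if nxt is None else (after_position + nxt) / 2
        cur = conn.execute("INSERT INTO slide (position, xml) VALUES (?, ?)", (position, xml))
        return cur.lastrowid

def update_slide(conn, slide_id, xml):
    # Only this row changes; the other slides' XML is not rewritten.
    with conn:
        conn.execute("UPDATE slide SET xml = ? WHERE id = ?", (xml, slide_id))

def slides_in_order(conn):
    return [xml for (xml,) in conn.execute("SELECT xml FROM slide ORDER BY position")]
```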
I used to worry a lot about this but it has never once actually come up for me. 50 megabytes is a pretty extreme example, but even so if you edit this document fewer than several million times it won't matter.
Serializing the object graph all over again can be way faster than mapping into a tabular model. There are JSON serializers that can push multiple gigabytes per second per core. It might even be the case that, once you factor in the SSD controller quirks, the tabular updates could cause more blocks to be written than just dumping a big fat json stream all at once.
Oh hell yes you do. Excel spreadsheets are notorious for people wanting to collaborate on them, and PowerPoint decks come in a close second. It used to be an absolute PITA, but at least Office 365 makes the pains bearable.
I've been using chunk sizes of 128 megabytes for my media archive. This seems to be a reasonable tradeoff between range retrieval delay and per object overhead (e.g. s3 put/get cost).
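The range-retrieval side of that tradeoff is simple arithmetic: map a byte range of the original file onto chunk indices plus offsets within each chunk, then issue one (possibly partial) GET per chunk. A small sketch; the function name is mine, and I'm reading "128 megabytes" as 128 MiB.

```python
from typing import List, Tuple

CHUNK_SIZE = 128 * 1024 * 1024  # 128 MiB

def chunk_reads(offset: int, length: int,
                chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Map a byte range onto fixed-size chunks.

    Returns (chunk_index, start, end) triples, where start/end are
    offsets inside that chunk (end exclusive).
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    reads = []
    pos, end = offset, offset + length
    while pos < end:
        index = pos // chunk_size
        chunk_start = index * chunk_size
        start = pos - chunk_start
        stop = min(end - chunk_start, chunk_size)
        reads.append((index, start, stop))
        pos = chunk_start + stop
    return reads
```

Bigger chunks mean fewer objects (and fewer per-request costs) but more wasted bytes when a range only grazes a chunk, unless the store supports partial reads within an object.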
But I do see a problem if you really need to use a sqlite that's compiled with particular non-default options.
Say I design a file format and implement it, and my implementation uses an sqlite library that's compiled with all the right options. Then I evangelize my file format, telling everyone that it's really just an sqlite database and sooo easy to work with.
First thing that happens is that someone writes a neat little utility for working with the files, written in language X, which comes with a handy sqlite3 library. But that library is not compiled with the right options, and boom, you have a vulnerable utility.
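One partial mitigation: have every tool check what the linked SQLite was built with, via `PRAGMA compile_options` and `sqlite3.sqlite_version_info`, and refuse to run otherwise. Many of the hardening knobs are runtime pragmas/configs rather than compile options, so this only covers part of the problem. The default minimum version (3.31.0, which I believe is the release that added `PRAGMA trusted_schema`) and the function name are assumptions.

```python
import sqlite3

def check_sqlite_build(conn, required_options=frozenset(), min_version=(3, 31, 0)):
    """Return a list of problems with the linked SQLite; empty means OK."""
    problems = []
    if sqlite3.sqlite_version_info < min_version:
        wanted = ".".join(map(str, min_version))
        problems.append(f"SQLite {sqlite3.sqlite_version} is older than {wanted}")
    # PRAGMA compile_options returns one row per option, without the SQLITE_
    # prefix and with any value attached (e.g. "THREADSAFE=1"), so match exactly.
    built_with = {row[0] for row in conn.execute("PRAGMA compile_options")}
    for option in sorted(set(required_options) - built_with):
        problems.append(f"missing compile option {option}")
    return problems
```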
There are parts of the SQL engine that are exposed to malicious file manipulation (the schema is stored as SQL DDL text) but that's not arbitrary SQL input.
If you want to highlight an inconsistency, this is way more worrying:
> “All historical vulnerabilities reported against SQLite require at least one of these preconditions: (…) 2. The attacker can submit a maliciously crafted database file to the application that the application will then open and query. Few real-world applications meet either of these preconditions…”
However, most of the rest of the page is speaking of arbitrary SQL input, not purposely broken database files.
A binding can expose those settings. It's not a given a third party utility will use them, but they can.
Should one make a massive transaction that is only committed when saving? Is it possible to commit such a transaction to a different file when using Save As?
Or maybe for editing one would need to copy the file to a separate temporary location, constantly commit to that file, and when saving move the temporary file over the original file (this way we aren't losing the resilience against corruption SQLite offers).
Or is there a better way to do this? I don't like storing pending changes into the original file since it kinda goes against how users expect files to work (and could cause them to accidentally leak data).
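The temp-copy approach maps well onto SQLite's online backup API, which Python exposes as `Connection.backup()`. A minimal sketch with a hypothetical `Document` class: edits commit to a private working copy, and Save copies it to a temp file next to the target and renames it into place with `os.replace` (atomic when both are on the same filesystem). Save As is the same call with a different target. It doesn't fsync the directory, which a production version probably should.

```python
import os
import sqlite3
import tempfile

class Document:
    """Edits go to a private working copy; the user's file only changes on save."""

    def __init__(self, path: str):
        self.path = path
        fd, self.work_path = tempfile.mkstemp(suffix=".sqlite")
        os.close(fd)
        self.work = sqlite3.connect(self.work_path)
        if os.path.exists(path):
            src = sqlite3.connect(path)
            try:
                src.backup(self.work)  # copy the original into the working copy
            finally:
                src.close()

    def save(self, target: str = None) -> None:
        target = target or self.path  # "Save As" just passes a new target
        self.work.commit()
        directory = os.path.dirname(os.path.abspath(target))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        os.close(fd)
        try:
            dst = sqlite3.connect(tmp)
            try:
                self.work.backup(dst)
            finally:
                dst.close()
            os.replace(tmp, target)  # swap in the complete file in one step
        except BaseException:
            os.remove(tmp)
            raise
        self.path = target

    def close(self) -> None:
        self.work.close()
        os.remove(self.work_path)
```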
To this: "Unless you work for Google or FaceBook, just do what works, not what your database professor said you ought to do."
Views and triggers can contain arbitrary SQL and can be defined by a malicious database file, though these can be disabled as described on the "Defense Against The Dark Arts" page.
That leaves default column values and indexes on expressions, which can execute a limited subset of SQL. I'd be worried about certain arbitrary SQL input vulnerabilities being reachable this way.
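If you want to see what SQL a file carries before trusting it, the schema table and the index pragmas will tell you. A minimal sketch that flags views, triggers, and indexes on expressions (in `pragma_index_xinfo`, `cid = -2` marks an expression column). The function name is mine, it doesn't inspect default column values, and it's a supplement to `PRAGMA trusted_schema = OFF`, not a replacement.

```python
import sqlite3

def schema_report(conn: sqlite3.Connection):
    """List schema objects that can carry executable SQL."""
    findings = []
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master WHERE type IN ('view', 'trigger', 'index')"
    ).fetchall()
    for obj_type, name in rows:
        if obj_type in ("view", "trigger"):
            findings.append((obj_type, name))
        else:
            # cid = -2 marks an index column that is an expression.
            (expr_cols,) = conn.execute(
                "SELECT COUNT(*) FROM pragma_index_xinfo(?) WHERE cid = -2", (name,)
            ).fetchone()
            if expr_cols:
                findings.append(("expression index", name))
    return findings
```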
XML was meant for documents, so in most cases the sequence of elements is given. But technically, if I compose the XML myself, I can lay it out the way I want and thus have it sorted too. This means it will be directly searchable without an index: read a bit in the middle, find an element name, see where we are, choose head or tail, repeat.
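That search can be sketched like this, under a simplifying assumption of mine: the root element's opening and closing tags sit on the first and last lines, and each `<entry key="...">` sits on its own line, sorted by key as raw bytes, so a newline marks an element boundary. Each probe seeks to a byte offset, skips the partial line, and reads the next whole element, so only O(log n) seeks are needed.

```python
import io
import re
from typing import BinaryIO, Optional

ENTRY_KEY = re.compile(rb'<entry key="([^"]*)"')

def _line_at_or_after(f: BinaryIO, pos: int) -> bytes:
    """Return the first complete line that starts at byte offset >= pos."""
    if pos == 0:
        f.seek(0)
    else:
        f.seek(pos - 1)
        f.readline()  # finish the line straddling pos (or just its '\n')
    return f.readline()

def _key(line: bytes) -> Optional[bytes]:
    m = ENTRY_KEY.match(line)
    return m.group(1) if m else None

def find_entry(f: BinaryIO, key: bytes) -> Optional[bytes]:
    """Binary-search a file of key-sorted <entry> lines without an index."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(0)
    f.readline()  # skip the root element's opening line
    lo, hi = f.tell(), size
    while lo < hi:
        mid = (lo + hi) // 2
        k = _key(_line_at_or_after(f, mid))
        if k is not None and k < key:
            lo = mid + 1  # the target starts after this line
        else:
            hi = mid      # closing tag, EOF, or a key >= target
    line = _line_at_or_after(f, lo)
    return line.rstrip(b"\n") if _key(line) == key else None
```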
This would also work as a really crude undo tree
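If the idea is to keep edits uncommitted until save, nested SAVEPOINTs give you a linear undo stack (not a tree, and no redo) more or less for free. A minimal sketch with a hypothetical `UndoStack` wrapper; the connection uses `isolation_level=None` so Python's sqlite3 doesn't issue its own BEGIN/COMMIT. Note that this holds a write transaction open for the whole editing session, which blocks other writers.

```python
import sqlite3

class UndoStack:
    """Each edit runs inside its own SAVEPOINT; undo rolls back the newest one."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.depth = 0

    def edit(self, sql: str, params=()) -> None:
        self.depth += 1
        self.conn.execute(f"SAVEPOINT edit_{self.depth}")
        self.conn.execute(sql, params)

    def undo(self) -> bool:
        if self.depth == 0:
            return False
        name = f"edit_{self.depth}"
        # ROLLBACK TO undoes the changes but leaves the savepoint open,
        # so RELEASE it afterwards to pop it off the stack.
        self.conn.execute(f"ROLLBACK TO {name}")
        self.conn.execute(f"RELEASE {name}")
        self.depth -= 1
        return True

    def commit(self) -> None:
        # Releasing the outermost savepoint commits everything ("Save").
        if self.depth:
            self.conn.execute("RELEASE edit_1")
            self.depth = 0
```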
I don't really know if it actually goes against users' expectations; Office kinda "saves" stuff for you and stores them as temporary versions anyway, to be presented in case you forgot to save.
I disliked him before he went super conservative, but now his YouTube channel boils down to “OMG GUYS LOOK AT HOW WOKE EVERYTHING IS WOKE WOKE WOKE WOKE WOKE PEOPLE ARE HATERS ON ME BECAUSE I SAID SOMETHING THEY DONT LIKE WOKE WOKE!”
It’s typical low effort grifter stuff.
Your better will be measured against different criteria, etc.
Base64-encoding the images into strings, like one could do with HTML, would probably not be ideal for compression. As a matter of fact, text files as such would not be ideal compression-wise.
So I suppose if a binary format can't be avoided, SQLite would be as good as any other compression format. But without built-in collaboration protocol support, like CRDTs with history truncation (diverged histories can always fall back to diff), I don't think it'd be good enough to justify the migration.
Also, SQLite does provide good support for reading/writing BLOBs in a streaming fashion, see: https://www.sqlite.org/c3ref/blob_read.html
So the limitation is really a structural issue that Dr. Hipp at some point might resolve (or not), but pretty much has to be resolved by SQLite core team, not outside contributors (of course you can resolve it by forking, but...).
The CVE docs:
> The attacker can submit a maliciously crafted database file to the application that the application will then open and query
This is exactly the normal use case GP talks about with application file formats.
Can anyone expand on this? Why would it be better than a binary format?
I was watching a talk Andrew Kelley gave about a simple binary format he’s using in Zig: https://www.hytradboi.com/2025/05c72e39-c07e-41bc-ac40-85e83...
Having to map between SQLite and the application language seems like it’d add lots of complexity, but I don’t have any experience with custom file formats so would love some advice.
I do get a little tired of the woke stuff, but a Youtuber has to follow a specific pattern to get traffic. It's an important message. I'm sure he takes it at least a little personally that he is banned from forums, conferences, talking to various companies about their activities, has his technical achievements (see: the top comment I replied to here, and his awful treatment by OpenSUSE folks), ignored due to irrelevant (and popular) political views, antagonized for being Jewish, etc. He wants to be a tech journalist but he is persecuted over politics. So if he complains about it a lot, I expect that and appreciate him taking the heat for saying what we all think.
Yes. This is why I called it a low effort grift.
The anti-woke stuff was overplayed in 2016, and it's even more tiring and stupid now. You're free to think it's "important", but it's not. It's just lazy shit he does instead of actual "journalism" (which I suppose is what he calls it).
> I'm sure he takes it at least a little personally that he is banned from forums, conferences, talking to various companies about their activities, has his technical achievements
> He wants to be a tech journalist but he is persecuted over politics.
He's not "persecuted" over politics. He's putting his opinions out there specifically to get a reaction, and then he pretends to be surprised that people actually react to his opinion. You could say it's persecution, but it's really not: everyone draws a line on this stuff.
For example: if someone was super public about lowering the age of consent to three years old then you probably wouldn't be super upset when he's no longer invited to conferences. That could technically be considered a "political opinion" and I'm sure that he would claim he's being persecuted and we would collectively roll our eyes.
Obviously Lunduke isn't that bad, at least as far as I know, but my point is that he's making provocative statements and unless he's the biggest moron on the planet then he has to know that.
It's something that bothers me; people like Lunduke will write shit specifically to be provocative (like writing a completely braindead thing about trans people not existing) and get a reaction. That is his goal. Then he acts surprised that people react negatively to the thing that he wanted and expected people to act negatively to. It's low-effort attention-seeking behavior.
I have said lots of provocative stuff throughout my life, it can be fun to make people uncomfortable. Some of it I am a bit embarrassed by, but I haven't made an entire career out of making people upset and pretending to not understand why they're upset.
> ignored due to irrelevant (and popular) political views
A viewpoint being "popular" has no bearing on whether or not it's harmful, so I have no idea why you brought it up.
If Lunduke has posted a very public negative opinion about a group of people that are active in a community (e.g. trans people in the FOSS world), then it's not "irrelevant" for people to not want to affiliate with him.
> I expect that and appreciate him taking the heat for saying what we all think.
We don't "all" think that. Pretending to be upset over a trans person working on software or purposefully misgendering people is not something I have ever really wanted to say, and even if I did I would just fucking say it instead of parasocially bonding with some wish.com wannabe demagogue.
Also, I'm not even completely convinced by his "achievements". I'm sure he worked at Microsoft and OpenSUSE, but that's not saying much. I used to work for Apple for several years. I didn't work there but I did at one point get an offer to work at Canonical. I don't want to give too much correlation data about myself but I have also worked at an extremely popular social media website. A lot of people on this forum can make similar claims. It doesn't make me or him particularly special.
Big tech companies hire a lot of very stupid people. They hire a lot of very smart people too, but having worked at Microsoft, even in the '90s, isn't an indication of intelligence or major achievements, and frankly I kind of get the impression that he embellishes his achievements to try to make himself seem more credible, though I have no evidence of that.
Also, wasn't he basically just a spokesperson for OpenSUSE? I didn't think he was doing anything technical there.
ETA:
This is all to say, it's not like Lunduke has been cut out of conferences and the like just for voting republican or anything. I've met plenty of people in tech who are conservative, don't try to hide that fact, and they're not shunned or anything, so I don't buy the conspiracy theory that conservative voices are "persecuted" in tech spaces.
Lunduke goes a step further by being outwardly hostile towards LGBTQ groups, and then pretends he's not doing that. This is why he's been considered so unbelievably insufferable in the tech world: his entire way of speaking is dishonest.
This is exactly the OTHER way around. Most usages of SQLite are as an application file format. Firefox stores bookmarks, history, and cookies in SQLite files in the profile folder. Messaging apps (WhatsApp, Signal, etc.) use SQLite for chat history. macOS and Windows use SQLite in various subsystems, e.g. Spotlight metadata and application caches. Mobile apps use SQLite heavily. And there are probably ten thousand other cases of it as a file format if I bothered to look up more.
When I think application file format, I think of something like .txt, .pdf, or .doc, where it's expected that you'll receive untrusted input passed around. In that case it makes a lot more sense to restrict which features of SQLite are accessible, and even then I'd worry about using it widely: there's so much surface area, plus the user confusion of -shm and -wal files.
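On the -shm/-wal confusion: WAL is a persistent per-file setting, so a document format can simply stay in (or switch back to) the default rollback-journal mode. Then there are no -wal/-shm files, only a transient -journal during writes. A minimal sketch; whether giving up WAL's concurrency is acceptable depends on the app, and the function name is mine.

```python
import sqlite3

def open_single_file_document(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Leaving WAL mode checkpoints the WAL into the main database file;
    # this needs exclusive access, so it fails if other connections are open.
    mode = conn.execute("PRAGMA journal_mode = DELETE").fetchone()[0]
    if mode.lower() != "delete":
        conn.close()
        raise RuntimeError(f"could not leave {mode} mode (another connection open?)")
    return conn
```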
The woke stuff directly affects my job prospects and quality of life. That makes it important. I've been suffering because of it ever since 2012. It escalated between 2016 and 2024, and only now is the pendulum swinging the other way.
>We don't "all" think that. Pretending to be upset over a trans person working on software or purposefully misgendering people is not something I have ever really wanted to say, and even if I did I would just fucking say it instead of parasocially bonding with some wish.com wannabe demagogue.
He finds much more lurid stories to report than "mere" misgendering nonsense. The pronoun thing is just a litmus test to see whether someone is in a woke cult or not. He reports on actual interesting stories, like lawsuits, new software, outsourcing, layoffs, drama in various communities, etc. (some of which occasionally involves pronouns, yes, but it's no exaggeration to say that these woke people want you banished or even dead if you refuse to go with their delusions).
>Big tech companies hire a lot of very stupid people.
It happens. I don't think Lunduke is stupid though. He is in PR more than pure tech. That doesn't make him stupid. Neither does him being conservative.
>I don't buy the conspiracy theory that conservative voices are "persecuted" in tech spaces.
It's not a theory. He regularly reports on awful treatment of conservatives. You'd be surprised at how malicious some of these woke people are. People have been banned from conferences for being seen on Twitter wearing a MAGA hat. They have been fired for being lukewarm about woke shit. I don't blame you for being out of touch, since only Lunduke seems to be willing to report the stuff, and you refuse to watch. But closing yourself off to all evidence against your views and saying "No you guys are just imagining it!" is the actually dishonest take.
>Lunduke goes a step further by being outwardly hostile towards LGBTQ groups, and then pretends he's not doing that. This is why he's been considered so unbelievably insufferable in the tech world: his entire way of speaking is dishonest.
He is not speaking to them or about them in the way they demand, you mean. People have a right to simply refuse to engage in the constant celebration of certain lifestyles and worldviews. Going to work should not require being lectured about how awesome it is for people to engage in abnormal sexual behaviors, or celebration and advancement of people based on their race or sex alone. Liberals have no problem demanding such things on a constant basis, ostracizing and seeking to banish anyone who disagrees even 5%, and that is exactly why we need people like Lunduke to bravely issue scathing critiques of these practices. Besides that, his tech news is kind of interesting and unique, and he has a good sense of humor.