Show HN: Fireproof – local-first database with Git-like encrypted sync

(fireproof.storage)

99 points jchanimal | 2 comments | 19 Nov 24 15:19 UTC

Hi, HN! As a cofounder of Couchbase, I pioneered mobile sync, and I’ve always wanted to bring the speed and reliability of local-first data to the web, incubating PouchDB among other efforts. I learned the constraints of real-world financial applications at McKinsey & Company FinLab, and Merkle integrity research at Protocol Labs taught me smart contract data structures. As part of the JavaScript community (and an early hosting provider for NPM), I’ve been waiting, and now with the availability of APIs like Passkeys and Origin Private Filesystem, I’m happy to say the browser is ready to support embedded databases.

Front-ends are a lot easier to write when your database handles live sync for you, but the existing solutions rely on heavyweight cloud APIs instead of putting the smarts at the edge, where it belongs. I started from a different set of constraints, and arrived at a lightweight embedded database that uses a git-like data model to offer cryptographic causal consistency across browsers, edge functions, and anywhere TypeScript runs.

It’s designed to make building full-featured apps as simple as calling `db.put({ hello: "world" })` and syncing them as easy as calling `connect(db, remote)`. People are using Fireproof for AI character chat[1], personal finance[2], and hedge funds[3], and we aim to be simple enough for novice coders to build enterprise-critical apps. Fireproof makes product owners dangerous, because just a little bit of code can define an application’s workflow and data model. See the code sample below.

The reactive APIs[4] are designed for live collaboration, so your user interfaces update automatically, making them an easy way to add query collaboration to legacy dashboards or to write new interactive tools for your team. Merkle CRDTs[5] provide multi-writer safety while maintaining tamperproof data provenance, conflict tracking, and deterministic merges. The storage engine writes content-addressed encrypted files that can be synced via commodity backends like S3 or Cloudflare, without sacrificing data integrity.
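The sketch below makes the storage description above concrete: a minimal TypeScript content-addressed, encrypted block store. It is not Fireproof's implementation. The `BlockStore` class, the choice of AES-256-GCM, the iv|tag|ciphertext layout, and addressing each block by the SHA-256 hex digest of its ciphertext are all assumptions made for illustration. The sketch shows why a commodity backend can hold the data without weakening integrity. The backend only ever sees ciphertext, and a reader that re-hashes what it fetched will detect any modification.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Hypothetical block store: each block is addressed by the SHA-256 digest of its
// encrypted bytes. The Map stands in for a dumb object store such as S3 or R2.
class BlockStore {
  private blocks = new Map<string, Buffer>();
  private readonly key: Buffer; // 32-byte AES-256 key, held only by clients

  constructor(key: Buffer) {
    this.key = key;
  }

  // Encrypt a JSON document and store it under the hash of the resulting block.
  put(doc: unknown): string {
    const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
    const cipher = createCipheriv("aes-256-gcm", this.key, iv);
    const body = Buffer.concat([cipher.update(JSON.stringify(doc), "utf8"), cipher.final()]);
    const block = Buffer.concat([iv, cipher.getAuthTag(), body]); // iv | tag | ciphertext
    const cid = createHash("sha256").update(block).digest("hex");
    this.blocks.set(cid, block);
    return cid;
  }

  // Fetch a block, check that it still matches its address, then decrypt it.
  get(cid: string): unknown {
    const block = this.blocks.get(cid);
    if (!block) throw new Error(`missing block ${cid}`);
    if (createHash("sha256").update(block).digest("hex") !== cid) {
      throw new Error(`block ${cid} failed integrity check`);
    }
    const decipher = createDecipheriv("aes-256-gcm", this.key, block.subarray(0, 12));
    decipher.setAuthTag(block.subarray(12, 28));
    const plain = Buffer.concat([decipher.update(block.subarray(28)), decipher.final()]);
    return JSON.parse(plain.toString("utf8"));
  }

  // Exposed only so the example can simulate a misbehaving backend.
  raw(cid: string): Buffer | undefined {
    return this.blocks.get(cid);
  }
}

const store = new BlockStore(randomBytes(32));
const cid = store.put({ hello: "world" });
console.log(store.get(cid)); // { hello: 'world' }

// Flip one stored byte "on the server": the content no longer matches its address.
store.raw(cid)![40] ^= 0xff;
try {
  store.get(cid);
} catch (err) {
  console.log((err as Error).message); // prints the integrity-check error
}
```

Any store that can save and return bytes by key could replace the `Map` here, because integrity checking and decryption both happen on the client.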

Our contributors include legends like Damien Katz, Meno Abels, Mikeal Rogers, and Alan Shaw. Fireproof is open source (Apache/MIT) and we know there are rough edges, so we hope this post stirs up collaborators![6] Please `npm install @fireproof/core` and give us feedback[7]. We are on the stable side of beta, so it’s a great time for the adventurous to join. I’m excited to see all the apps people write now that it’s easy!

[1] https://github.com/fireproof-storage/catbot/tree/main

[2] https://fireproof.storage/posts/quickcheck:-print-checks-at-...

[3] https://fireproof.storage/posts/contributor-spotlight:-danie...

[4] https://use-fireproof.com/docs/react-tutorial

[5] https://fireproof.storage/posts/remote-access-crdt-wrapped-m...

[6] https://github.com/fireproof-storage/fireproof/issues

[7] https://discord.gg/DbSXGqvxFc

bosky101 [20 Nov 24 03:47 UTC] No.42190661

>>42184362 (OP)

The website, examples, and usage look clean. Kudos! I have some questions about what's happening under the hood that weren't evident from an initial read of your website and GitHub.

1. Does subscribe listen for new changes on a transient server (just a queue), or on a more persistent store?

2. Where do the events persist? I didn't see a connector to Postgres. I did see one for S3.

3. What is the default persistence layer you are advocating?

4. Let's say you run three instances of the self-hosted server. A teacher connects to one of them at random, and two students get load-balanced to the other two servers. How does the teacher get all messages? What's the thread in a distributed setting?

5. How do you filter messages, e.g. only those since time T?

6. Pagination / limits to avoid any avalanche?

7. Auth? Custom auth/JWT?

8. REST API to produce?

9. Are consumers restricted to browsers? What about one in Node?

10. BONUS: Have you tested whether this works embedded in an iframe, or in a native/React Native mobile app?

jchanimal [20 Nov 24 16:58 UTC] No.42195875

>>42190661

These are awesome questions, I'll try to fold the answers into the docs also.

1. The embedded database subscribes to the remote sync endpoint when it is connected. This subscription might be polling, websocket, or anything else. The local embedded database will try to keep up with changes anyone pushes to the remote endpoint. This is backend machinery rather than an API you'll see.

Your code can subscribe to the local database; the callbacks run on the JavaScript event loop, and any update, local or remote, will cause your callback to run. The upshot is that all you have to do is connect your database to the sync endpoint and it will stay up to date, and you can connect your UI to the database via `db.subscribe()`.

2. Updates are written to local storage (IndexedDB or the filesystem) as encrypted blobs. These are then replicated to the cloud without being parsed by the cloud. We also have SQL connectors, but we haven't done the Postgres-specific work yet (we've just started designing it). That is the data side. There is also the clock register, which the client updates to point to the most recent blob. This register is multi-writer safe and can occasionally point to more than one "head" blob, in which case the client does a deterministic merge on read.

3. In my experience most people use the defaults, so we have Fireproof Cloud, which uses R2 and Durable Objects. We also have a SAM template for AWS and a connector for Netlify, in addition to lower-level parts for building your own backend (file and HTTP endpoints).

4. Each ledger replicates 100% when it syncs, so all hosts have the same data (there's no sharding within a ledger). Typically you have one centralized endpoint to sync via (P2P is possible, but I bet you'd end up contributing some plumbing to the project). So in this case the class would have a URL that is the sync point, and everyone would pull from it periodically or via streaming.

Merges are idempotent, deterministic, associative, and commutative, so it doesn't matter in what order the teacher and students apply updates to their local instances: once all updates are applied, they have the same state.

5. The e2e encryption means you'd have to give the keys to the server to allow it to create subsets for sync, so we haven't done that yet. Our next optimization is to sync the read-only current dataset first, then any extra data needed for writing, and only when necessary, the historical log. This still doesn't solve the subset sync issue, but it will benefit all use cases immediately.

There is some cool research we might use for subset sync: https://g-trees.github.io/g_trees/

But more practical is probably to finish the Postgres backend and then build subsetting at the global (multi-ledger) dataset level.
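Answer 1 describes two layers. A background replicator keeps the local database current with the remote endpoint, and a local `db.subscribe()` fires for any change, whether it was made locally or arrived from the remote. The sketch below models that shape with a hypothetical in-memory `LocalDb`. The class, its `put`, `applyRemote`, `get`, and `subscribe` methods, and the callback signature are illustrative assumptions, not Fireproof's API.

```typescript
type Doc = { _id: string; [field: string]: unknown };
type Listener = (changed: Doc[]) => void;

// Hypothetical local database: local writes and writes delivered by the sync
// replicator go through one code path, and that path notifies every subscriber.
class LocalDb {
  private docs = new Map<string, Doc>();
  private listeners = new Set<Listener>();

  // Register a callback and return a function that unsubscribes it.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // A local write, e.g. from a form submit.
  put(doc: Doc): void {
    this.apply([doc]);
  }

  // Called by a (hypothetical) replicator when polling or a websocket delivers changes.
  applyRemote(docs: Doc[]): void {
    this.apply(docs);
  }

  get(id: string): Doc | undefined {
    return this.docs.get(id);
  }

  private apply(docs: Doc[]): void {
    for (const doc of docs) this.docs.set(doc._id, doc);
    for (const listener of this.listeners) listener(docs);
  }
}

// Usage: the UI subscribes only to the local database and never talks to the network.
const db = new LocalDb();
const seen: string[] = [];
const unsubscribe = db.subscribe((changed) => {
  for (const doc of changed) seen.push(doc._id);
});
db.put({ _id: "note-1", text: "from this tab" });
db.applyRemote([{ _id: "note-2", text: "from another device" }]);
unsubscribe();
db.put({ _id: "note-3", text: "not observed" });
console.log(seen); // [ 'note-1', 'note-2' ]
```

For simplicity this sketch notifies synchronously. An implementation could instead batch notifications and deliver them later on the event loop.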
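Answer 2 describes a clock register that usually points at one head blob. After concurrent writes it can briefly point at several, and the client then merges them deterministically on read. The sketch below models that idea with hypothetical names: `ClockRegister`, `advance`, `mergeOnRead`, and the `Head` shape are illustrative, not Fireproof's API. Its merge rule is deliberately crude. It overlays head snapshots in sorted-ID order, so the same fields win on every client. A more careful merge would compare each head against the common ancestor instead of overwriting whole fields.

```typescript
type Head = { id: string; parents: string[]; doc: Record<string, string> };

// Hypothetical multi-writer register holding the IDs of the current head blobs.
class ClockRegister {
  private heads = new Set<string>();

  // A writer publishes a head that supersedes the heads it was built on.
  // Heads it did not know about are kept, so concurrent writes leave several heads.
  advance(head: Head): void {
    for (const parent of head.parents) this.heads.delete(parent);
    this.heads.add(head.id);
  }

  // Sorted so every reader sees the heads in the same order.
  current(): string[] {
    return [...this.heads].sort();
  }
}

// Deterministic merge on read: overlay head snapshots in sorted-ID order, so when
// two heads set the same field, the head with the larger ID wins on every client.
function mergeOnRead(register: ClockRegister, blocks: Map<string, Head>): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const id of register.current()) {
    const head = blocks.get(id);
    if (!head) throw new Error(`missing head block ${id}`);
    Object.assign(merged, head.doc);
  }
  return merged;
}

// Usage: a Map stands in for the blob store.
const blocks = new Map<string, Head>();
const register = new ClockRegister();
const publish = (head: Head): void => {
  blocks.set(head.id, head); // write the blob first...
  register.advance(head); // ...then move the register to point at it
};

publish({ id: "a1", parents: [], doc: { title: "Draft" } });
// Two writers both build on a1 and publish concurrently.
publish({ id: "c3", parents: ["a1"], doc: { title: "Title by Bob", tag: "urgent" } });
publish({ id: "b7", parents: ["a1"], doc: { title: "Title by Alice" } });
console.log(register.current()); // [ 'b7', 'c3' ]
console.log(mergeOnRead(register, blocks)); // { title: 'Title by Bob', tag: 'urgent' }

// Any writer can collapse the fork by publishing a merge head with both parents.
publish({ id: "d9", parents: register.current(), doc: mergeOnRead(register, blocks) });
console.log(register.current()); // [ 'd9' ]
```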

bosky101 [22 Nov 24 02:44 UTC] No.42210695

>>42195875

Re (3): being able to self-host is extremely important. I noticed the docs focus mostly on the Quickstart and client usage, but things like setting the default storage engine and the storage path via environment variables are very important.
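The environment-variable configuration requested above might look like the following minimal sketch. The variable names `FP_STORAGE_ENGINE` and `FP_STORAGE_PATH`, the engine list, the `./data` default, and the `loadStorageConfig` helper are hypothetical and are not documented Fireproof settings. The helper takes the environment as a plain object, so it can be tested without touching `process.env`.

```typescript
// Hypothetical engine names; not a list of real Fireproof backends.
const ENGINES = ["file", "s3", "memory"] as const;
type Engine = (typeof ENGINES)[number];

interface StorageConfig {
  engine: Engine;
  path: string;
}

// Read storage settings from an environment map, fall back to defaults, and fail
// fast on an unknown engine instead of silently using the wrong backend.
function loadStorageConfig(env: Record<string, string | undefined>): StorageConfig {
  const engine = env["FP_STORAGE_ENGINE"] ?? "file";
  if (!(ENGINES as readonly string[]).includes(engine)) {
    throw new Error(`FP_STORAGE_ENGINE must be one of ${ENGINES.join(", ")}, got "${engine}"`);
  }
  return { engine: engine as Engine, path: env["FP_STORAGE_PATH"] ?? "./data" };
}

console.log(loadStorageConfig({})); // { engine: 'file', path: './data' }
console.log(loadStorageConfig({ FP_STORAGE_ENGINE: "s3", FP_STORAGE_PATH: "my-bucket/ledgers" })); // { engine: 's3', path: 'my-bucket/ledgers' }
```

A server would call `loadStorageConfig(process.env)` at startup. Rejecting an unknown engine name up front avoids writing to a backend the operator did not intend.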

Hmm. Replicate state to all clients. OK.

Seems like an opinionated but well thought through project. Godspeed!

jchanimal [22 Nov 24 05:38 UTC] No.42211446

>>42210695 (TP)

Thanks, and thanks for the encouragement to fully document the gateway interface. It has been in flux lately, but as soon as it settles down we’ll do that.

The vision is many small ledgers, so full replication per ledger makes sense, but we have work to do on cross-ledger queries.
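This reply mentions a gateway interface that is still in flux, and answer 3 mentions file and HTTP endpoints as building blocks for a self-hosted backend. The sketch below is a purely hypothetical illustration: the `Gateway` interface, its `put`/`get` methods, `MemoryGateway`, and `FileGateway` are assumptions, not Fireproof's actual interface. The idea it shows is that a storage gateway in this design only needs to save and return opaque bytes by key, because encryption and merging happen on the client.

```typescript
import { mkdir, mkdtemp, readFile, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical gateway contract: opaque bytes in, opaque bytes out.
interface Gateway {
  put(key: string, bytes: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | undefined>;
}

// In-memory gateway, handy for tests.
class MemoryGateway implements Gateway {
  private store = new Map<string, Uint8Array>();
  async put(key: string, bytes: Uint8Array): Promise<void> {
    this.store.set(key, bytes);
  }
  async get(key: string): Promise<Uint8Array | undefined> {
    return this.store.get(key);
  }
}

// File-backed gateway: one file per key under a root directory.
class FileGateway implements Gateway {
  private readonly root: string;
  constructor(root: string) {
    this.root = root;
  }

  private pathFor(key: string): string {
    // Keys are assumed to be hex content hashes, which also keeps them safe as file names.
    if (!/^[0-9a-f]+$/.test(key)) throw new Error(`unsafe key ${key}`);
    return join(this.root, key);
  }

  async put(key: string, bytes: Uint8Array): Promise<void> {
    await mkdir(this.root, { recursive: true });
    await writeFile(this.pathFor(key), bytes);
  }

  async get(key: string): Promise<Uint8Array | undefined> {
    try {
      return await readFile(this.pathFor(key));
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code === "ENOENT") return undefined; // missing key
      throw err;
    }
  }
}

// Usage: the same calls work against either backend.
async function demo(): Promise<void> {
  const dir = await mkdtemp(join(tmpdir(), "gateway-demo-"));
  const gateways: Gateway[] = [new MemoryGateway(), new FileGateway(dir)];
  for (const gw of gateways) {
    await gw.put("ab12", new TextEncoder().encode("opaque ciphertext"));
    const back = await gw.get("ab12");
    console.log(new TextDecoder().decode(back), await gw.get("ffff")); // opaque ciphertext undefined
  }
}
demo().catch(console.error);
```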
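Answer 4 argues that because merges are idempotent, deterministic, associative, and commutative, the teacher and the students reach the same state regardless of the order in which they apply updates. This reply adds that each ledger replicates in full. The sketch below checks that property for a simple stand-in merge: a last-writer-wins map ordered by a hypothetical `(clock, writer)` pair. This is a textbook CRDT chosen for brevity, not Fireproof's Merkle-CRDT merge.

```typescript
type Entry = { value: string; clock: number; writer: string };
type Replica = Map<string, Entry>;

// True if a should beat b: the higher clock wins, and the writer ID breaks ties.
function wins(a: Entry, b: Entry): boolean {
  return a.clock !== b.clock ? a.clock > b.clock : a.writer > b.writer;
}

// Merge an update into a replica by keeping the winning entry per key. Taking a
// maximum under a fixed total order is commutative, associative, and idempotent.
function merge(replica: Replica, update: Replica): Replica {
  const out: Replica = new Map(replica);
  for (const [key, entry] of update) {
    const current = out.get(key);
    if (!current || wins(entry, current)) out.set(key, entry);
  }
  return out;
}

// Canonical string form of a replica, used to compare states.
function snapshot(replica: Replica): string {
  return JSON.stringify([...replica].sort(([a], [b]) => a.localeCompare(b)));
}

// Updates written by the teacher and two students, all relayed via one sync endpoint.
const updates: Replica[] = [
  new Map([["topic", { value: "Fractions", clock: 1, writer: "teacher" }]]),
  new Map([["answer:ana", { value: "1/2", clock: 1, writer: "ana" }]]),
  new Map([["answer:ben", { value: "0.5", clock: 1, writer: "ben" }]]),
  new Map([["topic", { value: "Decimals", clock: 2, writer: "teacher" }]]),
];

// Each participant applies the same updates in a different order; the third
// applies update 3 twice to exercise idempotence.
const orders = [[0, 1, 2, 3], [3, 2, 1, 0], [2, 0, 3, 1, 3]];
const states = orders.map((order) =>
  order.reduce((r, i) => merge(r, updates[i]), new Map<string, Entry>()),
);
console.log(new Set(states.map(snapshot)).size); // 1
```

Each replica receives the full set of updates, which matches the full per-ledger replication described above. After that, delivery order and duplicate deliveries cannot change the result.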