
172 points frectonz | 8 comments

pglite-fusion is a PostgreSQL extension that allows you to embed SQLite databases into your PostgreSQL tables by enabling the creation of columns with the `SQLITE` type. This means every row in the table can have an embedded SQLite database.

In addition to the PostgreSQL `SQLITE` type, pglite-fusion provides the `query_sqlite` function for querying SQLite databases and the `execute_sqlite` function for updating them. Additional functions are listed in the project’s README.
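A minimal usage sketch (the `empty_sqlite()` constructor is assumed here to be one of the README's additional helpers; exact signatures may differ):

```sql
-- Each row of this table carries its own embedded SQLite database.
-- empty_sqlite() is assumed as a constructor for a blank database.
CREATE TABLE people (
    name     TEXT NOT NULL,
    database SQLITE DEFAULT execute_sqlite(
        empty_sqlite(),
        'CREATE TABLE todos (task TEXT)'
    )
);

-- Query one row's embedded database with query_sqlite.
SELECT name, query_sqlite(database, 'SELECT task FROM todos')
FROM people
WHERE name = 'alice';
```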

The pglite-fusion extension is written in Rust using the pgrx framework [1].

----

Implementation Details

The PostgreSQL `SQLITE` type is stored as a CBOR-encoded `Vec<u8>`. When a query is made, this `Vec<u8>` is written to a random file in the `/tmp` directory. SQLite then loads the file, performs the query, and returns the result as a table containing a single row with an array of JSON-encoded values.

The `execute_sqlite` function follows a similar process. However, instead of returning query results, it returns the contents of the SQLite file (stored in `/tmp`) as a new `SQLITE` instance.
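At the SQL level, that round trip means a write to an embedded database is just a column reassignment. A sketch, reusing the hypothetical `people` table from above:

```sql
-- execute_sqlite returns the modified database as a new SQLITE value,
-- so mutating an embedded database is an ordinary UPDATE.
UPDATE people
SET database = execute_sqlite(
    database,
    'INSERT INTO todos VALUES (''buy groceries'')'
)
WHERE name = 'alice';
```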

[1] https://github.com/pgcentralfoundation/pgrx

robertclaus ◴[] No.42183246[source]
What are the use cases for this? I can't imagine designing a database schema to use this in a typical product. Is it intended for hybrid applications to back up local user data directly with their account info?
replies(4): >>42183347 #>>42183361 #>>42183493 #>>42185818 #
1. simonw ◴[] No.42183361[source]
I can think of plenty.

The most interesting one for me is if you're running a SaaS product like Notion where your users create custom applications that manage their own small schema-based data tables.

Letting users create full custom PostgreSQL tables can get complex - do you want to manage tens of thousands of weird custom tables in a PostgreSQL schema somewhere?

I'd much rather manage tens of thousands of rows in a table where one of the columns is a BLOB with a little SQLite database in it.
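For illustration, a sketch of that layout (all names hypothetical, and `empty_sqlite()` again assumed):

```sql
-- One row per user-created application; the SQLITE column holds that
-- application's entire custom schema and data.
CREATE TABLE user_apps (
    id       BIGSERIAL PRIMARY KEY,
    owner_id BIGINT NOT NULL,
    app_name TEXT   NOT NULL,
    data     SQLITE DEFAULT empty_sqlite()
);
```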

replies(2): >>42183779 #>>42186863 #
2. munk-a ◴[] No.42183779[source]
> Letting users create full custom PostgreSQL tables can get complex - do you want to manage tens of thousands of weird custom tables in a PostgreSQL schema somewhere?

Yea, I'd be fine with that - postgres has the concept of databases and schemas within those databases. If you really want to build a product like that I'd suggest starting with per-tenant schemas that leverage table inheritance as appropriate. The permissions would be pretty easy to manage.
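For illustration, a sketch of the per-tenant-schema approach in plain PostgreSQL (names hypothetical, and a shared parent table is assumed to exist):

```sql
-- One schema per tenant; shared structure lives in a common parent
-- table (public.records_base, assumed) that tenant tables extend
-- via inheritance.
CREATE SCHEMA tenant_acme;

CREATE TABLE tenant_acme.records (
    custom_field TEXT
) INHERITS (public.records_base);

-- Permissions scope cleanly to the tenant's schema.
GRANT USAGE ON SCHEMA tenant_acme TO acme_role;
GRANT SELECT, INSERT, UPDATE, DELETE
    ON ALL TABLES IN SCHEMA tenant_acme TO acme_role;
```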

Though, in a lot of the cases where I've actually seen this done, every client ends up with a dedicated server (or container - whatever tech you use to do it, something completely isolated from other instances) because user version management ends up being a huge issue. When you're building something that custom, it's highly likely that version migrations need to be done with client oversight to ensure everything actually works.

I have yet to find an actual real world case where the inner-platform effect is the right solution. Usually when tools like that are selected, the software ends up being so generic and flexible that it's useless. Custom application/BI environment development relies on really judiciously telling users they can't have most features - with the hard part being figuring out which features are necessary and which ones you can cut to reduce bloat.

replies(1): >>42184217 #
3. jitl ◴[] No.42184217[source]
Notion has 100 million users; managing schema-per-tenant at our scale sounds like a complexity nightmare. We have 480+ identical schemas across 100+ Postgres hosts, and that already takes a lot of brainpower & engineering time to manage T_T
replies(1): >>42185066 #
4. bob1029 ◴[] No.42185066{3}[source]
> managing schema-per-tenant at our scale sounds like a complexity nightmare.

The per-tenant schema could be the tenant's responsibility. Most non-technical users can handle the idea of tables & columns, assuming you leverage UI/UX patterns they are already familiar with.

replies(1): >>42185386 #
5. jitl ◴[] No.42185386{4}[source]
Our UI looks like a table: https://www.notion.so/help/intro-to-databases

As long as we never add new features, never need to change how we map UI <-> Postgres DDL, and our users never make any mistakes when they change their tables, it could work without being a complexity nightmare.

replies(1): >>42186593 #
6. mbesto ◴[] No.42186593{5}[source]
Curious - so how do you manage client-specific schemas then? Do you just have mappings in postgres (column1, column2, column3, etc.) or maybe store a client-specific schema in bson per client?
replies(1): >>42189493 #
7. mediaman ◴[] No.42186863[source]
Why not use jsonb for this kind of thing? Store the schema in one table, one per client (or perhaps one per table per client), and then store the data in another table, segregated by customer and table type, with row data kept in a JSONB field that follows that table's schema.

I normally don't like using JSONB when I could have a rigorous schema, but this sort of application seems reasonable.
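For illustration, a sketch of that two-table layout (names hypothetical):

```sql
-- Per-client schema definitions live in one table...
CREATE TABLE client_schemas (
    client_id  BIGINT NOT NULL,
    table_name TEXT   NOT NULL,
    schema_def JSONB  NOT NULL,  -- column names, types, constraints
    PRIMARY KEY (client_id, table_name)
);

-- ...and row data lives in another, segregated by customer and table
-- type, with the application validating row_data against schema_def.
CREATE TABLE client_rows (
    id         BIGSERIAL PRIMARY KEY,
    client_id  BIGINT NOT NULL,
    table_name TEXT   NOT NULL,
    row_data   JSONB  NOT NULL
);
```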

8. jitl ◴[] No.42189493{6}[source]
It's all JSON in two Postgres tables: `collection` which represents a Notion Database and has a `collection.schema` JSONB column, and `block` which has a `block.properties` JSONB column that stores the property values - the stuff in the Notion Database cells - for each row. We apply "schema on read" when querying or rendering a Notion Database, and we have a service on the backend that builds indexes/caches for hot collections on the fly. The service handles all the queries for collections larger than X rows. For smaller collections, we just give the client the whole thing modulo permissions and it does the query locally.