←back to thread

Show HN: Embed an SQLite database in your PostgreSQL table

(github.com)

180 points frectonz | 1 comments | 19 Nov 24 11:02 UTC | HN request time: 0.205s | source

pglite-fusion is a PostgreSQL extension that allows you to embed SQLite databases into your PostgreSQL tables by enabling the creation of columns with the `SQLITE` type. This means every row in the table can have an embedded SQLite database.

In addition to the PostgreSQL `SQLITE` type, pglite-fusion provides the `query_sqlite`` function for querying SQLite databases and the `execute_sqlite` function for updating them. Additional functions are listed in the project’s README.

The pglite-fusion extension is written in Rust using the pgrx framework [1].

----

Implementation Details

The PostgreSQL `SQLITE` type is stored as a CBOR-encoded `Vec<u8>`. When a query is made, this `Vec<u8>` is written to a random file in the `/tmp` directory. SQLite then loads the file, performs the query, and returns the result as a table containing a single row with an array of JSON-encoded values.

The `execute_sqlite` function follows a similar process. However, instead of returning query results, it returns the contents of the SQLite file (stored in `/tmp`) as a new `SQLITE` instance.

[1] https://github.com/pgcentralfoundation/pgrx

Show context

bitwize ◴[19 Nov 24 15:34 UTC] No.42184606[source]▶

>>42182146 (OP) #

Yo, dawg, I heard you like databases...

This is nuts. I can't think of a use for it, but I'm sure it's "a solution that will eventually find a problem".

replies(2): >>42186396 #>>42186603 #

1. unregistereddev ◴[19 Nov 24 18:25 UTC] No.42186603[source]▶

A different approach:

I had a project that stored a tremendous amount of spatial data. There were "sessions" of spatially-tagged time-series data that would be individually processed (think generating a map layer from time-series data). There were also reasons to perform higher level aggregations that did not dive into the time series data. The data density was high enough that it was impractical to build spatial indices over the entire dataset. Even using space-filling curves as multidimensional B-trees would require so many lookups that queries were impractically slow.

One POC I tried (and then rejected as an abomination) was to store each session's time-series data inside a SQLite database with SpatialLite extensions enabled. Then store each session's metadata, including spatial extent, in a Postgres database. The SQLite files were tossed in S3 and referenced from Postgres. I guess I could have inserted them directly to a BLOB column inside Postgres.