In addition to the PostgreSQL `SQLITE` type, pglite-fusion provides the `query_sqlite` function for querying SQLite databases and the `execute_sqlite` function for updating them. Additional functions are listed in the project’s README.
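A rough usage sketch, assuming `query_sqlite` takes the embedded database and a query string the same way `execute_sqlite` does (the `people` and `todos` names come from the example further down; the result shape is described under Implementation Details):

-- Read the todos stored in one row's embedded SQLite database;
-- results come back as JSON-encoded values.
SELECT query_sqlite(
    database,
    'SELECT * FROM todos'
)
FROM people
WHERE name = 'frectonz';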
The pglite-fusion extension is written in Rust using the pgrx framework [1].
----
Implementation Details
The PostgreSQL `SQLITE` type is stored as a CBOR-encoded `Vec<u8>`. When a query is made, this `Vec<u8>` is written to a random file in the `/tmp` directory. SQLite then loads the file, performs the query, and returns the result as a table containing a single row with an array of JSON-encoded values.
The `execute_sqlite` function follows a similar process. However, instead of returning query results, it returns the contents of the SQLite file (stored in `/tmp`) as a new `SQLITE` instance.
It may be possible to create a SQLite in-memory database instead and then load the binary blob data into it using the backup API or some kind of trick with VACUUM INTO.
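For what it's worth, the VACUUM INTO half of that idea is a single SQLite statement (the path here is purely illustrative); going the other way, loading the blob into an in-memory database, would still need the backup API or sqlite3_deserialize():

-- Write a consistent copy of the currently open (possibly in-memory)
-- SQLite database to a single file.
VACUUM INTO '/tmp/snapshot.sqlite';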
But I'm still having trouble grokking the intricacies of it. In a sense, I guess it gives you well-isolated individual SQLite DBs, and you'd have to go out of your way to join across them. With that said, does PostgreSQL manage and pool all the writes correctly, so you don't need to worry about SQLite concurrency issues?
The most interesting one for me is if you're running a SaaS product like Notion where your users create custom applications that manage their own small schema-based data tables.
Letting users create full custom PostgreSQL tables can get complex - do you want to manage tens of thousands of weird custom tables in a PostgreSQL schema somewhere?
I'd much rather manage tens of thousands of rows in a table where one of the columns is a BLOB with a little SQLite database in it.
So at the backend you have a postgres database that contains the device details etc as well as the operating parameters for that device.
You can update the operating parameters as part of a postgres transaction so either all the BLOBs are updated, or none.
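A minimal sketch of that, assuming a hypothetical devices table whose params column is a SQLITE value containing a settings table:

BEGIN;

-- Each UPDATE rewrites one device's embedded SQLite database; the surrounding
-- transaction makes the changes all-or-nothing across devices.
UPDATE devices
SET params = execute_sqlite(
    params,
    $sqlite$UPDATE settings SET value = 42 WHERE key = 'sample_rate'$sqlite$
)
WHERE device_id = 1;

UPDATE devices
SET params = execute_sqlite(
    params,
    $sqlite$UPDATE settings SET value = 48 WHERE key = 'sample_rate'$sqlite$
)
WHERE device_id = 2;

COMMIT;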
Using /tmp on the postgres cluster (server) is a bit of a hack; it would be nicer to have memory-based SQLite blobs.
In terms of security, you get Postgres row level security, so each SQLite value is protected in the same way as the rest of the row.
We could have SQLite within postgres within SQLite within postgres! Is it practical or even slightly useful? Of course not - but it's SQL databases all the way down. Not that this is a good thing in itself.
Each of the columns that are databases would be updated when the functions execute.
You could do weird crap like INSERT/DELETE as part of a postgres level SELECT.
https://www.postgresql.org/docs/17/xtypes.html#XTYPES-TOAST
https://github.com/postgres/postgres/blob/master/src/include...
CREATE TABLE tenants (
    id BIGINT NOT NULL,
    database SQLITE DEFAULT execute_sqlite(
        empty_sqlite(),
        'CREATE TABLE users (etc.)'
        -- and all the other tables for each tenant
    )
);
then they don't need to make joins between sqlite dbs. Your other concerns are very real. Those sqlite dbs could become very large. I prefer the use case depicted in another reply: preparing sqlite dbs before shipping them to their own devices. Or maybe receiving them and performing analysis, perhaps after importing them into overall psql tables. Or similar scenarios in which the whole db is read or written at once. Anyway, once we have a tool we start using it.
“Not with that attitude.”
– frectonz
I have bad news for you [0] about SQLite’s view on schema consistency.
Yea, I'd be fine with that - postgres has the concept of databases and schemas within those databases. If you really want to build a product like that I'd suggest starting with per-tenant schemas that leverage table inheritance as appropriate. The permissions would be pretty easy to manage.
Though in a lot of the cases where I've actually seen this done, every client ends up with a dedicated server (or container - whatever tech you use to do it, something completely isolated from other instances), because user version management ends up being a huge issue. When you're building something that custom, it's highly likely that version migrations need to be done with client oversight to ensure everything actually works.
I have yet to find an actual real-world case where the inner-platform effect is the right solution. Usually when tools like that are selected, the software ends up being so generic and flexible that it's useless. Custom application/BI environment development relies on really judiciously telling users they can't have most features - with the hard part being figuring out which features are necessary and which ones you can cut to reduce bloat.
I like the simplicity of SQLite's "a file is all you need" approach so much, that I started to converge all my projects to SQLite. So far, I have not come across any roadblocks.
Can anyone think of a use case where PostgreSQL is better suited than SQLite?
DuckDB is another option worth considering.
What is pglite-fusion?
pglite-fusion is a powerful PostgreSQL extension that bridges two popular database worlds by allowing you to embed SQLite databases directly into PostgreSQL tables. Each row can contain its own SQLite database using the new `SQLITE` column type - perfect for scenarios where you need highly localized or structured data at the row level.
How It Works
- SQLITE Type: Define columns in PostgreSQL that hold embedded SQLite databases, stored as CBOR-encoded `Vec<u8>` objects.
- Querying SQLite: Use the `query_sqlite` function to run SQL queries on the embedded SQLite databases. Results are returned as JSON-encoded arrays in a single row.
- Updating SQLite: Use `execute_sqlite` to modify the SQLite database and retrieve updated instances.
Why You'll Love It
- Seamlessly blend the simplicity of SQLite with the scalability of PostgreSQL.
- Unlock new use cases like row-specific data isolation, nested data structures, or lightweight experimentation with different SQLite schemas.
- Built on Rust using the pgrx framework for safety and performance.
Under the Hood
pglite-fusion temporarily stores SQLite databases as files in `/tmp` to perform operations. SQLite processes these files to execute queries or updates and passes the results back into PostgreSQL.
Who Is It For?
Developers and DBAs who want fine-grained control over data storage, hybrid database use cases, or just love the flexibility of combining PostgreSQL and SQLite in innovative ways.
SQLite on the server is a fantastic starter database. Dead simple to set up, highly performant, and scales way higher (vertically) than anyone gives it credit for.
But there certainly is a point where you'll have to scale out instead of up, and while there are some great solutions for that (rqlite, litefs, dqlite, marmot), it's not inherent to SQLite's design.
SQLite already allows multiple connections, so putting it on a server and adding a program that talks a network protocol and proxies the queries to the DB sounds more logical to me?
Replication means writing queries which alter the data to multiple machines, right?
Shouldn't that be done by software one level up? Something that takes in the queries via some network protocol and then sends them to all machines.
That would sound more logical to me.
It's fine to want to separate those out, but it's not easy to do so and there are reasons they've been coupled for decades.
Would be great combined with functions/triggers/views to mirror specific data/queries from Postgres as SQLite.
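A hedged sketch of one way that could look, reusing the tenants table from the earlier example and assuming a hypothetical orders table with (id, tenant_id, item) columns:

-- Hypothetical: every insert into a regular Postgres "orders" table is
-- mirrored into the matching tenant's embedded SQLite database.
CREATE OR REPLACE FUNCTION mirror_order_to_sqlite() RETURNS trigger AS $$
BEGIN
    UPDATE tenants
    SET database = execute_sqlite(
        database,
        format('INSERT INTO orders VALUES (%L, %L)', NEW.id, NEW.item)
    )
    WHERE id = NEW.tenant_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_mirror
AFTER INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION mirror_order_to_sqlite();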
So on the bright side, updating 1k rows takes the same amount of time as updating one row. On the other hand, every write is a full table write (actually two).
I don't think there is a way to do this efficiently with the current API, as PostgreSQL is MVCC, so it needs to write out each version separately (unless it has some sort of support for partial string sharing, which I don't think it does). Maybe a better version of this would write every page of the SQLite DB as a separate row, so that you only need to update the changed pages.
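A sketch of that page-per-row idea (not something pglite-fusion does today; the table and column names are made up):

-- Store each SQLite page as its own row so an update only rewrites the
-- pages that actually changed, instead of the whole database blob.
CREATE TABLE sqlite_pages (
    db_id   BIGINT  NOT NULL,
    page_no INTEGER NOT NULL,
    page    BYTEA   NOT NULL,  -- one fixed-size SQLite page, e.g. 4096 bytes
    PRIMARY KEY (db_id, page_no)
);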
Sending an UPDATE/INSERT/DELETE statement to SQLite is not blocking? I would think it is, because in my code I can read the number of affected rows right after sending the query.
> What do you do about out-of-sync read replicas?
Delete them and replace them by uploading a checkpoint and replaying a log of the queries since then.
The per-tenant schema could be the tenant's responsibility. Most non-technical users can handle the idea of tables & columns, assuming you leverage UI/UX patterns they are already familiar with.
The extension could also provide custom index access methods (considering that SQLite only has a handful of column types in the first place). That would allow you to incorporate the keys in the index heaps, as opposed to the table heaps - boom, you get bitmap index scans for joins, i.e. GIN but with a bit more redundancy.
-- Create a todo for "frectonz"
UPDATE people
SET database = execute_sqlite(
    database,
    $sqlite$INSERT INTO todos VALUES ('solve multitenancy')$sqlite$
)
WHERE name = 'frectonz';
Writing a networked application that uses SQLite as a database is perfectly reasonable. You're just making the decision to lift the layer of abstraction that is concerned with machines from the DB to your application, which may or may not be a reasonable thing to do.
As long as we never add new features, never need to change how we map UI <-> Postgres DDL, and our users never make any mistakes when they change their tables, it could work without being a complexity nightmare.
But after spending some time with a mixed-schema table at even modest scale, I’m wondering how often a better design could have cut the whole problem off.
So, instead of saving the org's client sqlite db to cloud storage, you save it to the centralized db column. Litefs probably doesn’t support it yet, but it wouldn’t be too hard to add.
It's easy enough to replicate those constraints to the client if you want the client to do ahead of time validation, but your source of truth lives in the database...
I wouldn't survive with SQLite.
I had a project that stored a tremendous amount of spatial data. There were "sessions" of spatially-tagged time-series data that would be individually processed (think generating a map layer from time-series data). There were also reasons to perform higher level aggregations that did not dive into the time series data. The data density was high enough that it was impractical to build spatial indices over the entire dataset. Even using space-filling curves as multidimensional B-trees would require so many lookups that queries were impractically slow.
One POC I tried (and then rejected as an abomination) was to store each session's time-series data inside a SQLite database with SpatiaLite extensions enabled. Then store each session's metadata, including spatial extent, in a Postgres database. The SQLite files were tossed in S3 and referenced from Postgres. I guess I could have inserted them directly into a BLOB column inside Postgres.
The common path of comparing some constant like the role name to some column in the table is fine, and it's fast enough since the policy checker already has the row in hand when it does the check. But the natural tendency for people to want to abstract their policies into a function like has_permission() will blow up fast.
The best approach I've seen is from pyramation's launchql [1], which precomputes policies into a bitstring and then masks it against a query-constant bitstring of required permissions. Flexible policy definitions compiled into the row as bits, so the check is as fast as possible.
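A hedged sketch of that shape (the table, column, and bit assignments here are made up; launchql's actual machinery is more involved):

-- Each row carries a precomputed bitstring of the permissions it requires;
-- the application bakes the caller's permissions into the query as a constant,
-- so the per-row check is a single bitwise AND.
CREATE TABLE documents (
    id       BIGINT PRIMARY KEY,
    req_bits BIT(8) NOT NULL,  -- compiled permission mask for this row
    body     TEXT
);

-- A row is visible when the caller holds every bit the row requires.
SELECT id
FROM documents
WHERE (req_bits & B'00000111') = req_bits;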
I normally don't like using JSONB when I could have a rigorous schema, but this sort of application seems reasonable.
Without that you will have drift from your master database.
With that, you have a whole new host of synchronization issues you need to deal with.
Completely agree that the DB should be the arbiter of validity. Constraints are a good thing.
Atomicity of values has been debated for a long time. I’ve come around to the idea that flat arrays can be included in a 1NF table, because they don’t imply any additional structure to the schema. The problem with JSON is that it supports arbitrary K:V pairs as well as nesting, and so can introduce a schema within a schema, which is prone to referential integrity violations (not to mention generally poor performance in RDBMS).
Embedding an entire DB is of course beyond the pale, and my comment was an attempt at wit.