←back to thread

793 points rexpository | 1 comments | | HN request time: 0.193s | source
Show context
sgarland ◴[] No.44505539[source]
I’m more upset at how people are so fucking dense about normalization, honestly. If you use LLMs to build your app, you get what you deserve. But to proudly display your ignorance on the beating heart of every app?

You have a CHECK constraint on support_messages.sender_role (let’s not get into how table names should be singular because every row is a set) - why not just make it an ENUM, or a lookup table? Either way, you’re saving space, and negating the need for that constraint.

Or the rampant use of UUIDs for no good reason – pray tell, why does integration_tokens need a UUID PK? For that matter, why isn’t the provider column a lookup table reference?

There is an incredible amount of compute waste in the world from RDBMS misuse, and it’s horrifying.

replies(2): >>44505745 #>>44508384 #
apt-apt-apt-apt ◴[] No.44505745[source]
When deciding recently whether to use CHECK ('a', 'b', 'c') vs ENUM, I believe a search/LLM-query stated that it was easier to change a CHECK's values later and not easy for ENUM, so that's what I went with.

As for a lookup table, truly curious, is it worth the complexity of the foreign reference and join?

replies(2): >>44507568 #>>44509724 #
1. sgarland ◴[] No.44509724[source]
Please read source docs instead of relying on LLMs, especially for RDBMS. I’ve found they quite often get something subtly wrong; for example, recommending that the PK be added to a secondary composite index in MySQL - this is entirely unnecessary, because all secondary indices in MySQL implicitly include the PK.

> lookup table worth it

Is not doing it worth the risk of referential integrity violations? How important is your data to you? You can say, “oh, the app will handle that” all you want, but humans are not perfect, but RDBMS is as close as you’re ever going to come to it. I have seen orphaned rows and referential violations at every company I’ve been at that didn’t enforce foreign key constraints.

There is a performance hit at scale to not doing it, also: imagine you have a status column with some ENUM-esque values, like CANCELED, APPROVED, etc. If stored as TEXT or VARCHAR, that’s N+(1-2 bytes) per string. At the hundreds of millions or billions of rows scale, this adds up. Storage is cheap, but memory isn’t, and if you’re wasting it on repeated text strings, that’s a lot fewer rows per page you can fit, and so more disk access is required. JSON objects are the same, since both MySQL and Postgres only shift large blob-type objects off-page after a certain threshold.