
780 points by rexpository | 6 comments
1. sgarland No.44505539
I’m more upset at how people are so fucking dense about normalization, honestly. If you use LLMs to build your app, you get what you deserve. But to proudly display your ignorance on the beating heart of every app?

You have a CHECK constraint on support_messages.sender_role (let’s not get into how table names should be singular because every row is a set) - why not just make it an ENUM, or a lookup table? Either way, you’re saving space, and negating the need for that constraint.
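
For illustration, a rough sketch of the ENUM option in Postgres, with hypothetical names and values (the actual schema may differ):

    CREATE TYPE sender_role AS ENUM ('customer', 'agent', 'system');

    CREATE TABLE support_message (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        sender_role sender_role NOT NULL,   -- 4 bytes, no CHECK constraint needed
        body        text NOT NULL
    );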

Or the rampant use of UUIDs for no good reason – pray tell, why does integration_tokens need a UUID PK? For that matter, why isn’t the provider column a lookup table reference?
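
And a sketch of the lookup-table version for provider, again with made-up names rather than the project's actual schema:

    -- Providers are a small, slowly-changing set: perfect for a lookup table
    CREATE TABLE provider (
        id   smallint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        name text NOT NULL UNIQUE
    );

    CREATE TABLE integration_token (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- plain integer PK instead of a UUID
        provider_id smallint NOT NULL REFERENCES provider (id),
        token       text NOT NULL
    );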

There is an incredible amount of compute waste in the world from RDBMS misuse, and it’s horrifying.

replies(2): >>44505745, >>44508384
2. apt-apt-apt-apt No.44505745
When deciding recently whether to use CHECK ('a', 'b', 'c') vs ENUM, I believe a search/LLM query said it was easier to change a CHECK's allowed values later than an ENUM's, so that's what I went with.

As for a lookup table, truly curious, is it worth the complexity of the foreign reference and join?

replies(2): >>44507568, >>44509724
3. benmmurphy No.44507568
For Postgres, ENUM should be just as easy to change as CHECK:

* For adding, there is no problem: you can just add new entries to an existing ENUM.

* For removing, there is a problem: you can't easily remove entries from an ENUM. It's only possible to create a new enum type and then change the column's type, which is going to cause problems on big tables. In practice your ENUM solution decays to an ENUM+CHECK solution (keep the type, add a CHECK that forbids the retired value), so it's not really any worse than a CHECK solution.

Also, it is possible to add a new CHECK constraint to a big table by marking the constraint as NOT VALID: existing rows will not be checked, and only newly inserted or updated rows will be.
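
A minimal Postgres sketch of both points, reusing the hypothetical sender_role enum and support_message table sketched above:

    -- Adding a value to an existing enum is cheap (no table rewrite):
    ALTER TYPE sender_role ADD VALUE 'bot';

    -- "Removing" a value: keep the enum, add a CHECK that forbids the
    -- retired label. NOT VALID skips existing rows; only new/updated
    -- rows are checked until the constraint is validated.
    ALTER TABLE support_message
        ADD CONSTRAINT sender_role_not_retired
        CHECK (sender_role <> 'system') NOT VALID;

    -- Later, once old rows are cleaned up, validate separately:
    ALTER TABLE support_message VALIDATE CONSTRAINT sender_role_not_retired;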

4. nijave No.44508384
UUIDs are nice in cell architectures, where you have multiple identical deployments of the app with different partitions of the data. They prevent ID conflicts across cells/tenants/instances if you need to move data around.

Remapping primary keys for hundreds of relations because you want to move a customer from region A DB to region B DB is an absolute nightmare
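
A sketch of what that looks like in Postgres 13+ (gen_random_uuid() is built in there; older versions need the pgcrypto extension); the table and column names are made up:

    CREATE TABLE customer (
        id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        cell text NOT NULL,   -- which deployment/cell owns this row
        name text NOT NULL
    );

    -- Rows (and everything referencing customer.id) can be copied verbatim
    -- from cell A's database into cell B's: the ids never collide, so no
    -- primary/foreign key remapping is needed.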

replies(1): >>44509584
5. sgarland No.44509584
Sure, or you make a natural key. Depending on your RDBMS (MySQL and SQL Server cluster rows around the PK) and query patterns, this may be quite a bit faster, to boot. For example, if you have a table with customer orders, you could have a PK like (user_id, order_id), where both are INTEGER (this assumes that you have a centralized service assigning user ids, which isn’t that big of an ask IMO). Even if you used BIGINT for both - which is almost certainly not required for 99% of businesses for this example - that’s still 16 bytes, or the same as a binary-encoded UUID. Since most queries for this kind of thing will involve a user id, with a clustering index like MySQL / InnoDB, all of the user’s records will be physically co-located. Even for Postgres, which stores tuples in a heap, you’re not going to take a performance hit, and it’s a lot easier to read two ints than two UUIDs.
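
A sketch of that layout in MySQL/InnoDB, with hypothetical names (InnoDB clusters rows around the PK, so a user's orders end up physically adjacent):

    CREATE TABLE customer_order (
        user_id   INT NOT NULL,       -- assigned by a centralized user service
        order_id  INT NOT NULL,       -- unique within a user
        placed_at DATETIME NOT NULL,
        PRIMARY KEY (user_id, order_id)
    ) ENGINE=InnoDB;

    -- Typical access pattern becomes a cheap clustered range scan:
    -- SELECT * FROM customer_order WHERE user_id = 42;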

The problem is these performance boosts / hits don’t make themselves painfully obvious until you’re at the hundreds of millions of rows scale, at which point if you didn’t do it properly, fixing it is much more difficult.

6. sgarland No.44509724
Please read source docs instead of relying on LLMs, especially for RDBMS questions. I’ve found they quite often get something subtly wrong; for example, recommending that the PK be added to a secondary composite index in MySQL - this is entirely unnecessary, because all secondary indexes in InnoDB (MySQL’s default engine) implicitly include the PK columns.
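
For example, in MySQL/InnoDB (hypothetical table):

    CREATE TABLE ticket_message (
        id         BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        ticket_id  BIGINT NOT NULL,
        created_at DATETIME NOT NULL,
        -- InnoDB stores the PK in every secondary index leaf, so this index
        -- already "includes" id; (ticket_id, created_at, id) would be redundant.
        KEY idx_ticket_created (ticket_id, created_at)
    ) ENGINE=InnoDB;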

> lookup table worth it

Is not doing it worth the risk of referential integrity violations? How important is your data to you? You can say, “oh, the app will handle that” all you want, but humans are not perfect; an RDBMS is as close to perfect as you’re ever going to come. I have seen orphaned rows and referential violations at every company I’ve been at that didn’t enforce foreign key constraints.

There is also a performance hit at scale to not doing it: imagine you have a status column with some ENUM-esque values, like CANCELED, APPROVED, etc. If stored as TEXT or VARCHAR, that’s N+(1-2 bytes) per string. At the hundreds of millions or billions of rows scale, this adds up. Storage is cheap, but memory isn’t, and if you’re wasting it on repeated text strings, you fit far fewer rows per page, so more disk access is required. JSON objects are the same, since both MySQL and Postgres only shift large blob-type values off-page past a certain size threshold.
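
A quick way to see the per-value difference in Postgres (the enum type and labels here are made up):

    CREATE TYPE order_status AS ENUM ('APPROVED', 'CANCELED', 'REFUNDED');

    SELECT pg_column_size('CANCELED'::text)         AS text_bytes,  -- 9: 8 chars + 1-byte header
           pg_column_size('CANCELED'::order_status) AS enum_bytes,  -- 4: enums are stored as OIDs
           pg_column_size(2::smallint)              AS fk_bytes;    -- 2: a lookup-table reference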