←back to thread

779 points rexpository | 2 comments | | HN request time: 0.441s | source
Show context
sgarland ◴[] No.44505539[source]
I’m more upset at how people are so fucking dense about normalization, honestly. If you use LLMs to build your app, you get what you deserve. But to proudly display your ignorance on the beating heart of every app?

You have a CHECK constraint on support_messages.sender_role (let’s not get into how table names should be singular because every row is a set) - why not just make it an ENUM, or a lookup table? Either way, you’re saving space, and negating the need for that constraint.

Or the rampant use of UUIDs for no good reason – pray tell, why does integration_tokens need a UUID PK? For that matter, why isn’t the provider column a lookup table reference?

There is an incredible amount of compute waste in the world from RDBMS misuse, and it’s horrifying.

replies(2): >>44505745 #>>44508384 #
1. nijave ◴[] No.44508384[source]
UUIDs are nice in cell architectures when you have multiple identical deployment of the app with different partitions of data. They prevent ID conflicts across cells/tenants/instances if you need to move data around.

Remapping primary keys for hundreds of relations because you want to move a customer from region A DB to region B DB is an absolute nightmare

replies(1): >>44509584 #
2. sgarland ◴[] No.44509584[source]
Sure, or you make a natural key. Depending on your RDBMS (MySQL and SQL Server cluster rows around the PK) and query patterns, this may be quite a bit faster, to boot. For example, if you have a table with customer orders, you could have a PK like (user_id, order_id), where both are INTEGER (this assumes that you have a centralized service assigning user ids, which isn’t that big of an ask IMO). Even if you used BIGINT for both - which is almost certainly not required for 99% of businesses for this example - that’s still 16 bytes, or the same as a binary-encoded UUID. Since most queries for this kind of thing will involve a user id, with a clustering index like MySQL / InnoDB, all of the user’s records will be physically co-located. Even for Postgres, which stores tuples in a heap, you’re not going to take a performance hit, and it’s a lot easier to read two ints than two UUIDs.

The problem is these performance boosts / hits don’t make themselves painfully obvious until you’re at the hundreds of millions of rows scale, at which point if you didn’t do it properly, fixing it is much more difficult.