←back to thread

121 points b-man | 1 comments | | HN request time: 0.221s | source
Show context
adamcharnock ◴[] No.44026525[source]
> A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

I fairly strongly disagree with this. Database identifiers have to serve a lot of purposes, and natural key almost certainly isn’t ideal. Off the top my head, IDs can be used for:

- Joins, lookups, indexes. Here data type can matter regarding performance and resource use.

- Idempotency. Allowing a client to generate IDs can be a big help here (ie UUIDs)

- Sharing. You may want to share a URL to something that requires the key, but not expose domain data (a URL to a user’s profile image shouldn’t expose their national ID).

There is not one solution that handles all of these well. But using natural keys is one of the least good options.

Also, we all know that stakeholders will absolutely swear that there will never be two people with the same national ID. Oh, except unless someone died, then we may reuse their ID. Oh, and sometimes this remote territory has duplicate IDs with the mainland. Oh, and for people born during that revolution 50 years ago, we just kinda had to make stuff up for them.

So ideally I’d put a unique index on the national ID column. But realistically, it would be no unique constraint and instead form validation + a warning on anytime someone opened a screen for a user with a non-unique ID.

Then maybe a BIGINT for database ID, and a UUID4/7 for exposing to the world.

EDIT: Actually, the article is proposing a new principle. And so perhaps this could indeed be a viable one. And my comment above would describe situations where it is valid to break the principle. But I also suspect that this is so rarely a good idea that it shouldn’t be the default choice.

replies(12): >>44026711 #>>44026778 #>>44026822 #>>44026881 #>>44027090 #>>44027969 #>>44028025 #>>44029492 #>>44030454 #>>44030564 #>>44030578 #>>44031152 #
sitharus ◴[] No.44027090[source]
This isn’t a new principle, it was part of database design courses in the early 2000s at least. However from a couple of decades of bitter experience I say external keys should never be your primary keys. They’ll always change for some reason.

Yes you can create your tables with ON UPDATE CASCADE foreign keys, but are those really the only places the ID is used?

Sometimes your own data does present a natural key though, so if it’s fully within your control then it’s a decent idea to use.

replies(4): >>44027710 #>>44030424 #>>44030597 #>>44036404 #
1. bunderbunder ◴[] No.44030597[source]
Even internal keys.

For example, suppose you have an information management system where the user can define their own logical fields or attributes. Naturally those names should uniquely identify a field. That makes them an easy candidate for a natural key. But I would still use a surrogate key.

I've worked in two systems that had this feature. In one the field name was a primary key, and in the other they used a surrogate key and a separate uniqueness constraint. In both a requirement to let users rename fields was added later. In the one that used a surrogate key, this was an easy and straightforward change.

In the one that used field name as a natural key, the team decided that implementing the feature wasn't feasible. The name change operation, with its cascade of values needing to be updated for all the foreign key relationships, would have been way too big a transaction to be doing on the fly, in a live database, during high usage times of day. So instead they added a separate "display name" field and updated all the queries to coalesce that with the "official" name. Which is just ugly, and also horribly confusing. Especially in oddball cases where users did something like swapping the names of two fields.