The principles of database design, or, the Truth is out there

> A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

I fairly strongly disagree with this. Database identifiers have to serve a lot of purposes, and natural key almost certainly isn’t ideal. Off the top my head, IDs can be used for:

- Joins, lookups, indexes. Here data type can matter regarding performance and resource use.

- Idempotency. Allowing a client to generate IDs can be a big help here (ie UUIDs)

- Sharing. You may want to share a URL to something that requires the key, but not expose domain data (a URL to a user’s profile image shouldn’t expose their national ID).

There is not one solution that handles all of these well. But using natural keys is one of the least good options.

Also, we all know that stakeholders will absolutely swear that there will never be two people with the same national ID. Oh, except unless someone died, then we may reuse their ID. Oh, and sometimes this remote territory has duplicate IDs with the mainland. Oh, and for people born during that revolution 50 years ago, we just kinda had to make stuff up for them.

So ideally I’d put a unique index on the national ID column. But realistically, it would be no unique constraint and instead form validation + a warning on anytime someone opened a screen for a user with a non-unique ID.

Then maybe a BIGINT for database ID, and a UUID4/7 for exposing to the world.

EDIT: Actually, the article is proposing a new principle. And so perhaps this could indeed be a viable one. And my comment above would describe situations where it is valid to break the principle. But I also suspect that this is so rarely a good idea that it shouldn’t be the default choice.

> Joins, lookups, indexes.

You both want to control these values within your database engine, at least so that they are actually unique within the domain, and there is no real reason for it to be user-controlled anyway, as they are used referentially.

> Idempotency.

User supplied tokens for idempotency are mostly useful within the broader context of application sitting on top of database, otherwise they become subject to the same requirements internally generated ones are, without control, which is a recipe for disaster.

> Sharing

Those are the same idempotency tokens from previous points with you as the supplier. In some cases you want to share them across prod/stage/dev environments, in some cases you may want to explicitly avoid duplicates, in some cases you don't care.

All these use cases are solved with mapping tables / classifiers.

Example: in an asset management system you need to expose and identifier of a user-registered computer, that is built using components procured from different suppliers, with their own identification schemas, e.g. exposing keyboard/mouse combo as one component with two subcomponents or two (possibly linked) components.

This requires you to use all those listed identifier types in different parts of the system. You can bake those in database/schema design, or employ some normalization and use "native" identifier and mapping tables.