←back to thread

121 points b-man | 2 comments | | HN request time: 0s | source
Show context
adamcharnock ◴[] No.44026525[source]
> A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

I fairly strongly disagree with this. Database identifiers have to serve a lot of purposes, and natural key almost certainly isn’t ideal. Off the top my head, IDs can be used for:

- Joins, lookups, indexes. Here data type can matter regarding performance and resource use.

- Idempotency. Allowing a client to generate IDs can be a big help here (ie UUIDs)

- Sharing. You may want to share a URL to something that requires the key, but not expose domain data (a URL to a user’s profile image shouldn’t expose their national ID).

There is not one solution that handles all of these well. But using natural keys is one of the least good options.

Also, we all know that stakeholders will absolutely swear that there will never be two people with the same national ID. Oh, except unless someone died, then we may reuse their ID. Oh, and sometimes this remote territory has duplicate IDs with the mainland. Oh, and for people born during that revolution 50 years ago, we just kinda had to make stuff up for them.

So ideally I’d put a unique index on the national ID column. But realistically, it would be no unique constraint and instead form validation + a warning on anytime someone opened a screen for a user with a non-unique ID.

Then maybe a BIGINT for database ID, and a UUID4/7 for exposing to the world.

EDIT: Actually, the article is proposing a new principle. And so perhaps this could indeed be a viable one. And my comment above would describe situations where it is valid to break the principle. But I also suspect that this is so rarely a good idea that it shouldn’t be the default choice.

replies(12): >>44026711 #>>44026778 #>>44026822 #>>44026881 #>>44027090 #>>44027969 #>>44028025 #>>44029492 #>>44030454 #>>44030564 #>>44030578 #>>44031152 #
Jarwain ◴[] No.44026711[source]
Why have both a database ID and UUIDv7, versus just a UUIDv7?
replies(2): >>44026865 #>>44026928 #
jandrewrogers ◴[] No.44026928[source]
Actually, it should be a database ID and an encrypted database ID, which doesn’t require storing a second ID. Even better, you can make that key unique per session so that users can’t share keys. For security reasons, it is a bad idea to leak private state, which UUIDv7 does.

A single AES encryption block is the same size as a UUID and cheap to compute.

replies(4): >>44027083 #>>44027244 #>>44027288 #>>44064636 #
adamcharnock ◴[] No.44027083[source]
> A single AES encryption block is the same size as a UUID and cheap to compute.

I didn’t realise this! The UUID spec mandates some values for specific digits, so I assume this would not be strictly valid UUIDs?

replies(2): >>44027207 #>>44065237 #
1. jandrewrogers ◴[] No.44027207[source]
They would not be valid UUIDs, it is an opaque 128-bit value.

To be honest many companies are not using strictly standard conforming UUIDs everywhere, and UUID has become a synonym for opaque 128-bit identifier. Consumers of the value just need them to be unique. All standard UUID versions are sometimes prohibited in environments, except maybe UUIDv8, so this semantic ambiguity is helpful.

Technically, you could make it present as a standard UUIDv4 or UUIDv8 by setting a few bits, as long as you remember to add them back if you ever need to decrypt it. The entropy might be a bit off for a UUIDv4 if someone actually checks but you can guarantee the uniqueness.

Using AES to produce a UUID-like key is an old but useful trick. Both ARM and x86 do that encryption in hardware — it is cheaper to generate than the standardized UUID versions in most cases.

replies(1): >>44065257 #
2. sysysy ◴[] No.44065257[source]
Which mode of AES do you mean? AES ECB seems to fit your example, but wouldn't it still expose the incremental nature UUID v7 with enough samples (see https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation, the Linux Tux image example)? Other AES modes produce way longer results, even when the ID is encoded with Base58 or the like.