The principles of database design, or, the Truth is out there

(ebellani.github.io)

121 points b-man | 2 comments | 19 May 25 02:58 UTC | HN request time: 0s | source

Show context

adamcharnock ◴[19 May 25 04:34 UTC] No.44026525[source]▶

> A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

I fairly strongly disagree with this. Database identifiers have to serve a lot of purposes, and natural key almost certainly isn’t ideal. Off the top my head, IDs can be used for:

- Joins, lookups, indexes. Here data type can matter regarding performance and resource use.

- Idempotency. Allowing a client to generate IDs can be a big help here (ie UUIDs)

- Sharing. You may want to share a URL to something that requires the key, but not expose domain data (a URL to a user’s profile image shouldn’t expose their national ID).

There is not one solution that handles all of these well. But using natural keys is one of the least good options.

Also, we all know that stakeholders will absolutely swear that there will never be two people with the same national ID. Oh, except unless someone died, then we may reuse their ID. Oh, and sometimes this remote territory has duplicate IDs with the mainland. Oh, and for people born during that revolution 50 years ago, we just kinda had to make stuff up for them.

So ideally I’d put a unique index on the national ID column. But realistically, it would be no unique constraint and instead form validation + a warning on anytime someone opened a screen for a user with a non-unique ID.

Then maybe a BIGINT for database ID, and a UUID4/7 for exposing to the world.

EDIT: Actually, the article is proposing a new principle. And so perhaps this could indeed be a viable one. And my comment above would describe situations where it is valid to break the principle. But I also suspect that this is so rarely a good idea that it shouldn’t be the default choice.

replies(12): >>44026711 #>>44026778 #>>44026822 #>>44026881 #>>44027090 #>>44027969 #>>44028025 #>>44029492 #>>44030454 #>>44030564 #>>44030578 #>>44031152 #

Jarwain ◴[19 May 25 05:28 UTC] No.44026711[source]▶

>>44026525 #

Why have both a database ID and UUIDv7, versus just a UUIDv7?

replies(2): >>44026865 #>>44026928 #

jandrewrogers ◴[19 May 25 06:16 UTC] No.44026928[source]▶

>>44026711 #

Actually, it should be a database ID and an encrypted database ID, which doesn’t require storing a second ID. Even better, you can make that key unique per session so that users can’t share keys. For security reasons, it is a bad idea to leak private state, which UUIDv7 does.

A single AES encryption block is the same size as a UUID and cheap to compute.

replies(4): >>44027083 #>>44027244 #>>44027288 #>>44064636 #

1. lukevp ◴[19 May 25 07:27 UTC] No.44027288[source]▶

>>44026928 #

Can you explain this a bit more or link to something? I don’t really understand. What’s encrypted? A guid? A monotonic integer ID? Is the encrypted ID only used for user facing stuff? How is it decrypted? What do you gain by this?

replies(1): >>44031313 #

2. jandrewrogers ◴[19 May 25 16:06 UTC] No.44031313[source]▶

>>44027288 (TP) #

I don’t have a link. I’ve never seen a good writeup but the practice is really old.

It is literally encrypting whatever type you are using as a handle for records that you send the user, typically an integer, a UUID, or some slightly more complex composite key. The only restriction is that the key should not be longer than 128-bits so that you can encrypt it to a UUID-like value; in practice you can almost always guarantee this [0]. The encryption only happens at the trust boundary i.e. when sending it to a random user. That encrypted key is not stored or used at all internally, it just changes the value of the key the user sees.

Most obvious ways of generating handles in a database leak private state. This is routinely exploited almost to the point of being a meme. Encrypting the handles you show the user prevents that.

An advantage of this is that if you are systematically encrypting exported keys then you can attach sensitive metadata to those keys at runtime if you wish, which can be very convenient. You have 128 bits to work with and a unique serial only needs 64 at most. If you are paranoid, there are schemes that enable some degree of key authentication. And while well beyond the scope here, similar constructions work nicely for compact keys in federate data models.

At scale, the cost (storage, compute, etc) of all of this matters. Encryption of keys, if done intelligently, is atypically efficient on all accounts while providing explicit inspectable security guarantees.

[0] There are reasons you might want to expand the exported key to a 256-bit value, particularly in high-assurance type environments, but the advantage of 128-bits is that it is literally drop-in compatible with UUIDs almost everywhere.

↑