121 points b-man | 3 comments
mrkeen ◴[] No.44026549[source]
> Principle of Essential Denotation (PED): A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

  create table citizen (
    national_id national_id primary key,
    full_name text);
Is national_id really a natural key, or is it someone else's synthetic key? If so, should the owner of that database have opted for a natural key rather than a synthetic key?

More arguments for synthetic over natural keys: https://blog.ploeh.dk/2024/06/03/youll-regret-using-natural-...

replies(3): >>44026597 #>>44026611 #>>44027242 #
rawgabbit ◴[] No.44026611[source]
I was going to comment on this. Natural keys sound like a good idea, and they should be enforced, perhaps with a unique constraint.

Natural keys are important. But the real world and the databases that represent it are messy. People’s identities get stolen. Data entry mistakes happen, and integrations between systems fail and leave the data in an inconsistent state.

In my experience I find arguments about natural keys unproductive. I usually try to steer the conversation to the scenarios I mentioned above. Those who listen to me will have a combination of synthetic and natural keys. The first is used to represent system state. The second is used to represent business processes.
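A minimal sketch of that combination, using Python's built-in sqlite3 module. The `citizen_id` column, the `add_citizen` helper, and the sample values are hypothetical names for illustration, not something from the article. The surrogate key carries system identity, while the natural key is kept honest with a unique constraint and can still be corrected after a data entry mistake.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table citizen (
        citizen_id  text primary key,      -- surrogate key: system state
        national_id text not null unique,  -- natural key: business identity
        full_name   text not null
    )
""")

def add_citizen(national_id, full_name):
    """Insert a citizen and return the generated surrogate key."""
    citizen_id = str(uuid.uuid4())
    conn.execute(
        "insert into citizen values (?, ?, ?)",
        (citizen_id, national_id, full_name),
    )
    return citizen_id

alice = add_citizen("AB123", "Alice Example")

# The unique constraint still rejects a second row with the same natural key.
try:
    add_citizen("AB123", "Someone Else")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A typo in the natural key can be corrected without touching the
# surrogate key that other tables would reference.
conn.execute(
    "update citizen set national_id = ? where citizen_id = ?",
    ("AB124", alice),
)
```

If `national_id` were the primary key instead, that last update would have to cascade to every referencing table.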

replies(2): >>44026657 #>>44030408 #
atomicnumber3 ◴[] No.44026657[source]
Natural keys are also all too often PII. A surrogate key that's just pure entropy is much safer to blast all over the place in logs and error messages and so on.
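A rough sketch of what that looks like in practice, assuming a hypothetical `Customer` record and `failure_message` helper: the random `customer_id` is the only identifier that ever reaches a log line or error message.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class Customer:
    national_id: str  # natural key, and PII
    full_name: str    # PII
    # Surrogate key: 16 random bytes, hex-encoded. It says nothing about the person.
    customer_id: str = field(default_factory=lambda: secrets.token_hex(16))

def failure_message(customer, reason):
    """Build a log/error message that identifies the row but not the person."""
    return f"payment failed for customer_id={customer.customer_id}: {reason}"

customer = Customer(national_id="AB123456", full_name="Alice Example")
message = failure_message(customer, "card declined")
```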
replies(1): >>44026695 #
1. rawgabbit ◴[] No.44026695{3}[source]
I usually encourage people to place all PII in a separate table. Only those who engage with customers, e.g. verifying customers' identities, should have access. Furthermore, images of customer identity cards are strictly forbidden. You can enter their passport number, name, address, birthdate, etc., but copies of identity documents will make you a target of hackers and angry customers. The rep can ask the customer to show the document or, in the worst case, present a copy, but the copy should be deleted immediately.
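As a sketch of the table split only (table and column names are hypothetical, and SQLite has no per-table grants, so access control would have to come from the real database or the application layer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")
conn.executescript("""
    create table customer (
        customer_id integer primary key,  -- surrogate key, safe to share
        status      text not null
    );
    -- Everything that identifies a person lives here and nowhere else.
    create table customer_pii (
        customer_id     integer primary key references customer(customer_id),
        passport_number text not null,
        full_name       text not null,
        birthdate       text not null
    );
""")
conn.execute("insert into customer values (1, 'active')")
conn.execute(
    "insert into customer_pii values (1, 'X1234567', 'Alice Example', '1990-01-01')"
)

# Code that only needs system state never has to touch the PII table.
customer_columns = [row[1] for row in conn.execute("pragma table_info(customer)")]

# Dropping the PII row leaves the system record intact.
conn.execute("delete from customer_pii where customer_id = 1")
remaining = conn.execute("select status from customer where customer_id = 1").fetchone()
```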
replies(2): >>44026911 #>>44030475 #
2. sroussey ◴[] No.44026911[source]
PII in a separate db. Encrypted like you would a credit card number.

BTW: email+password should be separated too. An early draft of GDPR specifically mentioned that, though the final version got less into the weeds.

I’m sure if you vibe code any of this, it will all be plaintext, lol.

3. atomicnumber3 ◴[] No.44030475[source]
"I usually encourage people to place all PII in a separate table. Only those who engage with customers e.g., verifying customers identities should have access"

This sounds nice but usually falls apart fast. A "separate table" is neat, but user-level access is generally not implemented at the DB layer, so which table the data lives in is irrelevant. Also, IME, data access is usually "everyone up to the role that actually 'needs' it gets it". So if, e.g., customer support has access to something, generally so does every single engineering team in the middle. That is generally a shitton more people than whoever designed the access control mechanisms probably imagined when they bothered adding all this granularity.

Realistically, I think the threat model needs to be looked at from the other side: who's most likely to accidentally leak the data? Is it a support person having their laptop stolen? An engineer getting phished? An engineer accidentally sending PII to Splunk in logs? How you address the actual threats your data faces often looks almost completely unrelated to what you'd build if you sat down and said "ok big boss says we have to secure the data. what did he mean by this."
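For the logs case specifically, one cheap mitigation is a filter that scrubs records before any handler ships them off. This is a sketch using Python's standard logging module. The `RedactPII` class and the email regex are my own illustration, and a regex will only catch PII that has a recognizable shape.

```python
import logging
import re

# Deliberately simple pattern; real PII detection needs more than one regex.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

class RedactPII(logging.Filter):
    """Replace email addresses in a log record before handlers see it."""

    def filter(self, record):
        # Render the final message first so values passed as args are covered too.
        record.msg = EMAIL.sub("[redacted-email]", record.getMessage())
        record.args = None
        return True  # keep the record, just with the message scrubbed

logger = logging.getLogger("app")
logger.addFilter(RedactPII())
logger.warning("login failed for %s", "alice@example.com")
```

Attaching the filter to the logger covers calls made on that logger. To cover child loggers as well, it would need to go on the handler instead.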

I do agree about not holding on to data you don't actually need tho.