Most active commenters

    ←back to thread

    121 points b-man | 17 comments | | HN request time: 0.897s | source | bottom
    1. mrkeen ◴[] No.44026549[source]
    > Principle of Essential Denotation (PED): A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

      create table citizen (
        national_id national_id primary key,
        full_name text);
    
    Is national_id really a natural key, or is it someone else's synthetic key? If so, should the owner of that database have opted for a natural key rather than a synthetic key?

    More arguments for synthetic over natural keys: https://blog.ploeh.dk/2024/06/03/youll-regret-using-natural-...

    replies(3): >>44026597 #>>44026611 #>>44027242 #
    2. tuatoru ◴[] No.44026597[source]
    The "natural key" for a (natural) person is compound: full name and mother's full name, plus date, time and place of birth. Your birth certificate is your primary identification document.

    However that still runs into problems of nondurability of the key in cultures that delay naming for a few years. To name one problem.

    So yeah, use a big enough integer as your key, and have appropriate checks on groups of attributes like this.

    However, if you are only interested in citizens, then a "natural" key is the citizen id issued by the government department responsible for doing that. (Citizen is a role played by a natural person so probably doesn't have too many attributes.) I still wouldn't use that as a primary key on the citizen table, though.

    replies(6): >>44026790 #>>44027013 #>>44027124 #>>44027728 #>>44036602 #>>44052536 #
    3. rawgabbit ◴[] No.44026611[source]
    I was going to comment on this. Natural keys sound like a good idea and they should enforced maybe by using a unique constraint.

    Natural keys are important. But the real world and the databases that represent them are messy. People’s identities get stolen. Data entry mistakes and integration between systems fail and leave the data in a schizophrenic state.

    In my experience I find arguments about natural keys unproductive. I usually try to steer the conversation to the scenarios I mentioned above. Those who listen to me will have a combination of synthetic and natural keys. The first is used to represent system state. The second is used to represent business processes.

    replies(2): >>44026657 #>>44030408 #
    4. atomicnumber3 ◴[] No.44026657[source]
    Natural keys are also all too often PII. A surrogate key that's just pure entropy is much safer to blast all over the place in logs and error messages and so on.
    replies(1): >>44026695 #
    5. rawgabbit ◴[] No.44026695{3}[source]
    I usually encourage people to place all PII in a separate table. Only those who engage with customers e.g., verifying customers identities should have access. Furthermore images of customer identity cards are strictly forbidden. You can enter their passport number, name, address, birthdate etc. but copies of identity documents will make you a target of hackers and angry customers. The rep can ask the customer to show the document or in the worst case present a copy but the copy should immediately be deleted.
    replies(2): >>44026911 #>>44030475 #
    6. bouke ◴[] No.44026790[source]
    That natural key isn’t guaranteed to be unique.
    replies(1): >>44036605 #
    7. sroussey ◴[] No.44026911{4}[source]
    PII in a separate db. Encrypted like you would a credit card card number.

    BTW: email+password should be separated too. An early draft of GDPR specifically mentioned that, though the final version got less into the weeds.

    I’m sure if you vibe code any of this, it will all be plaintext, lol.

    8. stevoski ◴[] No.44027013[source]
    I know someone who doesn’t know when she was born, nor who her mother is.

    She doesn’t have a birth certificate.

    She was born in a country that was enduring several years of brutal war.

    I know another person whose national ID was changed. Systems that use national ID as primary key failed to accept this change.

    9. anon7000 ◴[] No.44027124[source]
    https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...
    10. izacus ◴[] No.44027242[source]
    Programming software that accepts a national ID in our woods is usually the trial by fire for junior developers when they find out that "unique national ID" isn't actually unique for many historic reasons :)
    11. globular-toast ◴[] No.44027728[source]
    > The "natural key" for a (natural) person is compound: full name and mother's full name, plus date, time and place of birth.

    All it would take for a Bobby Tables[0] moment is a mother to have twins delivered by caesarean and given the same name... The same name thing has already been done[1].

    [0] https://xkcd.com/327/

    [1] https://www.walesonline.co.uk/news/real-life/mum-gives-twin-...

    12. arkh ◴[] No.44030408[source]
    Natural key for people: let's use your DNA map.
    replies(1): >>44087669 #
    13. atomicnumber3 ◴[] No.44030475{4}[source]
    "I usually encourage people to place all PII in a separate table. Only those who engage with customers e.g., verifying customers identities should have access"

    This sounds nice but usually falls apart fast. "separate table" is neat but access at the user-level is generally not implemented at the DB layer, so which table it is in is unrelated. Also IME data access is usually "everyone up to the role that actually 'needs' it gets it". So e.g. if customer support has access to something, generally so does every single engineering team in the middle. Which is generally a shitton more people than the people who designed the access control mechanisms probably imagined as they bothered adding all this granularity.

    Realistically, I think the threat model needs to be looked at from the other side: who's most likely to accidentally leak the data? Is it a support person having their laptop stolen? An engineer getting phished? An engineer accidentally sending Splunk PII in logs? How you address the actual threats your data faces often look almost completely unrelated to what you'd build if you sat down and said "ok big boss says we have to secure the data. what did he mean by this."

    I do agree about not holding on to data you don't actually need tho.

    14. b-man ◴[] No.44036602[source]
    > I still wouldn't use that as a primary key on the citizen table, though.

    Why not?

    15. b-man ◴[] No.44036605{3}[source]
    Depends on the universe of discourse adopted.
    16. HelloNurse ◴[] No.44052536[source]
    We need a new term of art: "naive key".
    17. mrankin ◴[] No.44087669{3}[source]
    It wasn’t me. I wasn’t there. It must have been my evil twin brother. — Thomas Dolby