Most active commenters

Jarwain(7)
jandrewrogers(4)

Popular/hot comments

>>44026928 #
>>44027253 #

←back to thread

The principles of database design, or, the Truth is out there

(ebellani.github.io)

Show context

adamcharnock ◴[19 May 25 04:34 UTC] No.44026525[source]▶

>>44026201 (OP) #

> A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity — not by arbitrary or surrogate values.

I fairly strongly disagree with this. Database identifiers have to serve a lot of purposes, and natural key almost certainly isn’t ideal. Off the top my head, IDs can be used for:

- Joins, lookups, indexes. Here data type can matter regarding performance and resource use.

- Idempotency. Allowing a client to generate IDs can be a big help here (ie UUIDs)

- Sharing. You may want to share a URL to something that requires the key, but not expose domain data (a URL to a user’s profile image shouldn’t expose their national ID).

There is not one solution that handles all of these well. But using natural keys is one of the least good options.

Also, we all know that stakeholders will absolutely swear that there will never be two people with the same national ID. Oh, except unless someone died, then we may reuse their ID. Oh, and sometimes this remote territory has duplicate IDs with the mainland. Oh, and for people born during that revolution 50 years ago, we just kinda had to make stuff up for them.

So ideally I’d put a unique index on the national ID column. But realistically, it would be no unique constraint and instead form validation + a warning on anytime someone opened a screen for a user with a non-unique ID.

Then maybe a BIGINT for database ID, and a UUID4/7 for exposing to the world.

EDIT: Actually, the article is proposing a new principle. And so perhaps this could indeed be a viable one. And my comment above would describe situations where it is valid to break the principle. But I also suspect that this is so rarely a good idea that it shouldn’t be the default choice.

replies(12): >>44026711 #>>44026778 #>>44026822 #>>44026881 #>>44027090 #>>44027969 #>>44028025 #>>44029492 #>>44030454 #>>44030564 #>>44030578 #>>44031152 #

1. Jarwain ◴[19 May 25 05:28 UTC] No.44026711[source]▶

>>44026525 #

Why have both a database ID and UUIDv7, versus just a UUIDv7?

replies(2): >>44026865 #>>44026928 #

2. sroussey ◴[19 May 25 06:04 UTC] No.44026865[source]▶

>>44026711 (TP) #

There is a security principle to not expose real identifiers to the outside world. It makes a crack in your system easier to open.

replies(1): >>44027174 #

3. jandrewrogers ◴[19 May 25 06:16 UTC] No.44026928[source]▶

>>44026711 (TP) #

Actually, it should be a database ID and an encrypted database ID, which doesn’t require storing a second ID. Even better, you can make that key unique per session so that users can’t share keys. For security reasons, it is a bad idea to leak private state, which UUIDv7 does.

A single AES encryption block is the same size as a UUID and cheap to compute.

replies(4): >>44027083 #>>44027244 #>>44027288 #>>44064636 #

4. adamcharnock ◴[19 May 25 06:50 UTC] No.44027083[source]▶

>>44026928 #

> A single AES encryption block is the same size as a UUID and cheap to compute.

I didn’t realise this! The UUID spec mandates some values for specific digits, so I assume this would not be strictly valid UUIDs?

replies(2): >>44027207 #>>44065237 #

5. Jarwain ◴[19 May 25 07:05 UTC] No.44027174[source]▶

>>44026865 #

Idk that reeks of security through obscurity to me. Your authorization/permission scheme has got to be fubar'd if you're relying on obscuring IDs to prevent someone from accessing a resource they shouldn't.

I'm sure I'm missing something obvious, but I'm not sure what other threat vectors there are, assuming it's in conjunction with other layers of security like a solid authorization/access control scheme.

I guess I'm not the biggest fan for a few reasons. I'd rather try and design a system such that it's secure even in the case of database leak/compromise or some other form of high transparency. I don't want to encourage a culture of getting used to obscurity and potentially depending on it instead of making sure that transparency can't be abused.

Also, it just feels wasteful. If you have two distinct IDs for a given resource, what are you building your foreign keys against? If you build it against the hidden one, and want to query the foreign table based on user input, you've gotta either do a join or do a query to Get the hidden key just to do another query. It just feels wasteful.

EDIT: apparently RFC 4122 even says not to assume UUIDs are hard to guess and shouldn't be used for security capabilities. So if it shouldn't be depended on for security, why add all this complexity to keep it secure?

replies(1): >>44027253 #

6. jandrewrogers ◴[19 May 25 07:12 UTC] No.44027207{3}[source]▶

>>44027083 #

They would not be valid UUIDs, it is an opaque 128-bit value.

To be honest many companies are not using strictly standard conforming UUIDs everywhere, and UUID has become a synonym for opaque 128-bit identifier. Consumers of the value just need them to be unique. All standard UUID versions are sometimes prohibited in environments, except maybe UUIDv8, so this semantic ambiguity is helpful.

Technically, you could make it present as a standard UUIDv4 or UUIDv8 by setting a few bits, as long as you remember to add them back if you ever need to decrypt it. The entropy might be a bit off for a UUIDv4 if someone actually checks but you can guarantee the uniqueness.

Using AES to produce a UUID-like key is an old but useful trick. Both ARM and x86 do that encryption in hardware — it is cheaper to generate than the standardized UUID versions in most cases.

replies(1): >>44065257 #

7. Jarwain ◴[19 May 25 07:20 UTC] No.44027244[source]▶

>>44026928 #

So you've got a database ID, either serial or uuid? And you encrypt it when you send it to the user, maybe encrypted against their JWT or something to maintain stateless sessions?

And I guess if the user references specific resources by ID you'd have to translate back? Assuming the session has been maintained correctly,which I guess is a plus for stateful sessions. And it doesn't really matter if you get a collision on the encrypted output.

I spent enough time in the nibling comment talking about my doubts about that advice not to publicly share the identifying key. But I'll add one more point; it feels like a bunch of added complexity for marginal benefit.

8. jandrewrogers ◴[19 May 25 07:22 UTC] No.44027253{3}[source]▶

>>44027174 #

The point you may be missing is that the key itself contains information about records in the database that you don’t have access to. There are many famous examples in literature (e.g. the German Tank Problem [0]) of useful attacks on known unique serials to infer hidden information without access. In a database context, the keys for the records you are authorized to see can tell you much about the parts of the database to which you have no access.

Strong randomization of record serials mitigates these attacks.

[0] https://en.wikipedia.org/wiki/German_tank_problem

replies(3): >>44027310 #>>44030215 #>>44051982 #

9. lukevp ◴[19 May 25 07:27 UTC] No.44027288[source]▶

>>44026928 #

Can you explain this a bit more or link to something? I don’t really understand. What’s encrypted? A guid? A monotonic integer ID? Is the encrypted ID only used for user facing stuff? How is it decrypted? What do you gain by this?

replies(1): >>44031313 #

10. Jarwain ◴[19 May 25 07:30 UTC] No.44027310{4}[source]▶

>>44027253 #

I thought we were talking about UUIDv7, which is random enough to make this not a problem right?

replies(1): >>44032311 #

11. sgarland ◴[19 May 25 14:21 UTC] No.44030215{4}[source]▶

>>44027253 #

The only thing UUIDv7 exposes is its creation time, which isn't tremendously useful or secret information.

12. jandrewrogers ◴[19 May 25 16:06 UTC] No.44031313{3}[source]▶

>>44027288 #

I don’t have a link. I’ve never seen a good writeup but the practice is really old.

It is literally encrypting whatever type you are using as a handle for records that you send the user, typically an integer, a UUID, or some slightly more complex composite key. The only restriction is that the key should not be longer than 128-bits so that you can encrypt it to a UUID-like value; in practice you can almost always guarantee this [0]. The encryption only happens at the trust boundary i.e. when sending it to a random user. That encrypted key is not stored or used at all internally, it just changes the value of the key the user sees.

Most obvious ways of generating handles in a database leak private state. This is routinely exploited almost to the point of being a meme. Encrypting the handles you show the user prevents that.

An advantage of this is that if you are systematically encrypting exported keys then you can attach sensitive metadata to those keys at runtime if you wish, which can be very convenient. You have 128 bits to work with and a unique serial only needs 64 at most. If you are paranoid, there are schemes that enable some degree of key authentication. And while well beyond the scope here, similar constructions work nicely for compact keys in federate data models.

At scale, the cost (storage, compute, etc) of all of this matters. Encryption of keys, if done intelligently, is atypically efficient on all accounts while providing explicit inspectable security guarantees.

[0] There are reasons you might want to expand the exported key to a 256-bit value, particularly in high-assurance type environments, but the advantage of 128-bits is that it is literally drop-in compatible with UUIDs almost everywhere.

13. sroussey ◴[19 May 25 17:29 UTC] No.44032311{5}[source]▶

>>44027310 #

The idea being to expose uuid instead of the natural index.

It’s been downgraded as people use uuids more.

That said, security through obscurity is an effective layer, particularly for slowing an attack.

Slowing lateral movement is valuable.

replies(1): >>44032786 #

14. Jarwain ◴[19 May 25 18:04 UTC] No.44032786{6}[source]▶

>>44032311 #

Sorry I'm a bit confused.

I'm in agreement that a natural key shouldn't be used as the primary key for a record.

I was responding to a comment about having a hidden "database ID" (which I interpreted as being a serial key?) and a public "Uuid", and questioning the utility of that hidden database ID versus having a public UUIDv7 as the sole primary key, followed by questioning whether the utility of obscuring that primary UUIDv7 is worth the complexity of having to manage multiple artificial keys.

I agree that security through obscurity is a valuable layer in a multi-layered security position.

I guess I just don't think obscuring a Uuid primary key is worth the added complexity in most systems.

I see it like adding a second front door to your house with a separate set of keys. Sure it'd be more secure, but it's an added pain and doesn't help if you don't have a sturdy doorframe, or smash-resistant windows.

replies(1): >>44040377 #

15. ringeryless ◴[20 May 25 11:38 UTC] No.44040377{7}[source]▶

>>44032786 #

UUID is PUBLIC id, use it to look up the bigint numerical id, when necessary.

one should not divulge scale, placement in numerical sequence, etc wtr to integer id, hence ouvlic UUID, which is basically unguessable token

replies(1): >>44043117 #

16. Jarwain ◴[20 May 25 16:02 UTC] No.44043117{8}[source]▶

>>44040377 #

Why have a bigint numerical ID at all?

replies(1): >>44166886 #

17. HelloNurse ◴[21 May 25 14:41 UTC] No.44051982{4}[source]▶

>>44027253 #

In this case obscurity is security: leaked information from the database is directly useful and a secret in itself, not only a stepping stone that might or might not facilitate further exploitation.

18. sysysy ◴[22 May 25 17:48 UTC] No.44064636[source]▶

>>44026928 #

Which mode of AES do you mean? AES ECB seems to fit your example, but wouldn't it still expose the incremental nature UUID v7 with enough samples (see https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation, the Linux Tux image example)? Other AES modes produce way longer results, even when the ID is encoded with Base58 or the like.

19. ◴[22 May 25 18:38 UTC] No.44065237{3}[source]▶

>>44027083 #

20. sysysy ◴[22 May 25 18:39 UTC] No.44065257{4}[source]▶

>>44027207 #

21. ringeryless ◴[03 Jun 25 06:10 UTC] No.44166886{9}[source]▶

>>44043117 #

fast lookup for internal usage. it's significantly faster than a UUID on every DBMS i have ever seen

replies(1): >>44186281 #

22. Jarwain ◴[04 Jun 25 22:35 UTC] No.44186281{10}[source]▶

>>44166886 #

That's a good reason! I think with UUIDv7, because it's sequential it's indexes are faster than UUIDv4. Still larger than bigints though. I'd like to do a benchmark at some point

↑