Postgres UUIDv7 and per-back end monotonicity

(brandur.org)

230 points craigkerstiens | 3 comments | 02 Jan 25 16:32 UTC | HN request time: 0.646s | source

Show context

urronglol ◴[02 Jan 25 16:49 UTC] No.42576069[source]▶

What is a v7 UUID. Why do we need more than 1. uuid from a random seed and 2. one derived from that and a timestamp (orderable)

replies(6): >>42576084 #>>42576085 #>>42576155 #>>42576193 #>>42576274 #>>42576349 #

n2d4 ◴[02 Jan 25 16:52 UTC] No.42576084[source]▶

>>42576069 #

UUID v7 is the latter, whereas v4 is the former.

All the other versions are somewhat legacy, and you shouldn't use them in new systems (besides v8, which is "custom format UUID", if you need that.)

replies(1): >>42576174 #

elehack ◴[02 Jan 25 17:01 UTC] No.42576174[source]▶

>>42576084 #

UUID v5 is quite useful if you want to deterministically convert external identifiers into UUIDS — define a namespace UUID for each potential identifier source (to keep them separate), then use that to derive a V5 UUID from the external identifier. It's very useful for idempotent data imports.

replies(1): >>42577132 #

jandrewrogers ◴[02 Jan 25 18:24 UTC] No.42577132[source]▶

>>42576174 #

Both UUIDv3 and UUIDv5 are prohibited for some use cases in some countries (including the US), which is something to be aware of. Unfortunately, no one has created an updated standard UUID that uses a hash function that is not broken. While useful it is not always an option.

replies(1): >>42585785 #

1. kbolino ◴[03 Jan 25 14:11 UTC] No.42585785[source]▶

>>42577132 #

Could you provide an example of such a prohibition? I've never heard of that before.

I doubt that the quality of the hash function is the real issue. The problem with MD5 and SHA1 is that it's easy (for MD5) and technically possible (for SHA1) to generate collisions. That makes them broken for enforcing message integrity. But a UUID is not an integrity check. Both MD5 and SHA1 are still very good as non-cryptographic hash functions. While a hash-based UUID provides obfuscation, it isn't really a security mechanism.

Even the existence of UUIDv5 feels like a knee-jerk reaction from when MD5 was "bad" but SHA1 was still "good". No hash function will protect you against de-obfuscation of low-entropy inputs. I can feed your social security number through SHA3-512 but it's not going to make it any less guessable than if I fed it through MD5.

Moreover, a UUID only has 122 bits of usable space. Even if we defined a new SHA2- or SHA3-based UUID version, it's still going to have to truncate the hash output to less than half of its full size. This significantly alters the security properties of the hash function, though I'm not sure if much cryptanalysis has been done on the shorter forms to see if they're more practically breakable yet.

There is one area where the collision resistance of the hash function could be a concern, though. If all of the inputs to the hash are under the control of a potential attacker, then maliciously constructed data could produce the same UUID. I still wouldn't think this would be a major issue, since most databases will fail to insert a duplicate key, but it might allow for various denial of service attacks. This still feels like quite a niche risk, though, and very circumstance-dependent.

replies(1): >>42593244 #

2. jandrewrogers ◴[04 Jan 25 07:51 UTC] No.42593244[source]▶

>>42585785 (TP) #

Systems where a sophisticated attacker may engineer collisions are precisely why UUIDv3/5 are prohibited. SHA1 is deemed broken by some government authorities and not to be used in any critical systems, including as UUID (this is where I’ve seen it expressly prohibited). The entire point of UUIDs in many systems is that collisions should be impossible, system integrity is predicated on it. Many systems exist in a presumptively adversarial environment.

Similarly, UUIDv4 is also prohibited in many contexts because people using weak entropy sources has been a recurring problem in real systems. It isn’t a theoretical issue, it has actually happened repeatedly. Decentralized generation of UUIDv4 is not trusted because humans struggle to implement it correctly, causing collisions where none are expected.

There are also contexts where probabilistic collision resistance is disallowed because collision probabilities, while low, are high enough to be theoretically plausible. Most people aren’t working on systems this large yet.

Ironically, there are many reasonable ways to construct reasonable and secure 128-bit identity values but the standards don’t define one. Some flavor of deterministic generation + encryption are not uncommon but they are also non-standard.

That said, many companies unavoidably have a mix of standard and non-standard UUIDs internally. To mitigate collisions, they have to transform those UUIDs into something else UUID-like, at which point it is pretty much guaranteed to be non-standard. Not ideal but that is the world we live in.

replies(1): >>42602563 #

3. kbolino ◴[05 Jan 25 15:52 UTC] No.42602563[source]▶

>>42593244 #

Ok, that makes sense. As far as I can tell, even truncated to "just" 122 bits, there's still no known way to generate a SHA-256 collision, so the MD5/SHA1 versions are comparatively vulnerable vs an hypothetical SHA256 UUID version. However, it's starting to feel like UUIDs may not be long enough in general to meet the need for secure, distributed ID generation.

↑