Postgres UUIDv7 and per-back end monotonicity

(brandur.org)

Show context

kingkilr ◴[02 Jan 25 17:05 UTC] No.42576212[source]▶

I would strongly implore people not to follow the example this post suggests, and write code that relies on this monotonicity.

The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property. If you decide to depend on it, you're setting yourself up for a bad time when PostgreSQL decides to change their implementation strategy, or you move to a different database.

Further, the stated motivations for this, to slightly simplify testing code, are massively under-motivating. Saving a single line of code can hardly be said to be worth it, but even if it were, this is a problem far better solved by simply writing a function that will both generate the objects and sort them.

As a profession, I strongly feel we need to do a better job orienting ourselves to the reality that our code has a tendency to live for a long time, and we need to optimize not for "how quickly can I type it", but "what will this code cost over its lifetime".

replies(9): >>42576251 #>>42576272 #>>42576300 #>>42576495 #>>42576752 #>>42576906 #>>42576998 #>>42586804 #>>42589145 #

throw0101c ◴[02 Jan 25 17:32 UTC] No.42576495[source]▶

>>42576212 #

> […] code that relies on this monotonicity. The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property.

The "RFC for UUIDv7", RFC 9562, explicitly mentions monotonicity in §6.2 ("Monotonicity and Counters"):

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp; 
    however, implementations can guarantee additional monotonicity via 
    the concepts covered in this section.

* https://datatracker.ietf.org/doc/html/rfc9562#name-monotonic...

In the UUIDv7 definition (§5.7) it explicitly mentions the technique that Postgres employs for rand_a:

    rand_a:
        12 bits of pseudorandom data to provide uniqueness as per
        Section 6.9 and/or optional constructs to guarantee additional 
        monotonicity as per Section 6.2. Occupies bits 52 through 63 
        (octets 6-7).

* https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-vers...

Note: "optional constructs to guarantee additional monotonicity". Pg makes use of that option.

replies(1): >>42576599 #

stonemetal12 ◴[02 Jan 25 17:42 UTC] No.42576599[source]▶

>>42576495 #

>explicitly mentions monotonicity

>optional constructs

So it is explicitly mentioned in the RFC as optional, and Pg doesn't state that they guaranty that option. The point still stands, depending on optional behavior is a recipe for failure when the option is no longer taken.

replies(6): >>42576845 #>>42576990 #>>42577105 #>>42577134 #>>42577202 #>>42579068 #

mlyle ◴[02 Jan 25 18:22 UTC] No.42577105[source]▶

>>42576599 #

It's mentioned in the RFC as being explicitly monotonic based the time-based design.

Implementations that need monotonicity beyond the resolution of a timestamp-- like when you allocate 30 UUIDs at one instant in a batch-- can optionally use those additional bits for that purpose.

> Implementations SHOULD employ the following methods for single-node UUID implementations that require batch UUID creation or are otherwise concerned about monotonicity with high-frequency UUID generation.

(And it goes on to recommend the obvious things you'd do: use a counter in those bits when assigning a batch; use more bits of time precision; etc.)

The comment in PostgreSQL before the implementation makes it clear that they chose the third option for this in the RFC:

     * variant bits. To ensure monotonicity in scenarios of high-
     * frequency UUID generation, we employ the method "Replace
     * LeftmostRandom Bits with Increased Clock Precision (Method 3)",
     * described in the RFC. ...

replies(1): >>42579731 #

1. Dylan16807 ◴[02 Jan 25 22:36 UTC] No.42579731[source]▶

>>42577105 #

> It's mentioned in the RFC as being explicitly monotonic based the time-based design.

It's explicitly partially monotonic.

Or as other people would call it, "not monotonic".

People are talking past each other based on their use of the word "monotonic".

replies(2): >>42581561 #>>42585566 #

2. mlyle ◴[03 Jan 25 02:23 UTC] No.42581561[source]▶

>>42579731 (TP) #

It's explicitly monotonic, except for apps that have a very fast ID rate, in which case there are recommended approaches (the word "SHOULD" is used in an RFC) to make it work. And PostgreSQL used one of these recommended approaches and documented it.

replies(1): >>42582207 #

3. Dylan16807 ◴[03 Jan 25 03:53 UTC] No.42582207[source]▶

>>42581561 #

> It's explicitly monotonic, except for apps that have a very fast ID rate

"might generate two IDs in the same millisecond" is not a very exotic occurrence. It makes a big difference whether the rest is guaranteed or not.

> And PostgreSQL used one of these recommended approaches and documented it.

Well that's the center of the issue, right? OP's interpretation was that PostgreSQL did not document such, and so it shouldn't be relied upon. If it is a documented promise, then things are fine.

But is it actually in the documentation? A source code comment saying it uses a certain method isn't a promise that it will always use that method.

replies(2): >>42582895 #>>42582914 #

4. dragonwriter ◴[03 Jan 25 05:47 UTC] No.42582895{3}[source]▶

>>42582207 #

> Well that’s the center of the issue, right? OP’s interpretation was that PostgreSQL did not document such, and so it shouldn’t be relied upon.

And the correct answer is…we don’t know. We have a commit that landed and the explanation of the commit; we don’t have the documentation for the corresponding release of Postgres, because…it doesn’t exist yet. Because monoticity is an important feature for UUIDv7, it would be very odd if Postgres used an implementation that took the extra effort to use a nanosecond-level time value as the high-order portion of the variant part of the UUID instead of the minimum millisecond-level time, but not document that, but any assumption about what will be the documented, reliable-going-forward advertised feature is speculative until the documentation exists and is finalized.

OTOH, its perfectly fine to talk about what the implementation allows now, because that kind of thing is important to the decision about what should be documented and committed to going forward.

5. pests ◴[03 Jan 25 05:50 UTC] No.42582914{3}[source]▶

>>42582207 #

The point of the extra bits is to allow the application developer to keep monotonicity in the "not very exotic occurrence" scenario. The purpose is to be monotonic. I feel like you are missing the core concept.

replies(1): >>42583278 #

6. Dylan16807 ◴[03 Jan 25 07:03 UTC] No.42583278{4}[source]▶

>>42582914 #

I'm not missing anything, the problem is a lot of people using sloppy wording or mixing up the two modes.

This comment thread is about the guaranteed level of monotonicity. Yes, those bits exist. But you can't depend on them from something that only promises "UUIDv7". You need an additional promise that it's configured that way and actually using those bits to maintain monotonicity.

7. throw0101c ◴[03 Jan 25 13:42 UTC] No.42585566[source]▶

>>42579731 (TP) #

> It's explicitly partially monotonic.

From the RFC:

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp

"time-based UUIDs from this document will be monotonic". "will be monotonic".

I'm not sure how much more explicit this can be made.

The intent of UUIDv7 is monotonicity. If an implementation is not monotonic then it's a bug in the implementation.

↑