Postgres UUIDv7 and per-back end monotonicity

(brandur.org)

1. kingkilr ◴[02 Jan 25 17:05 UTC] No.42576212[source]▶

I would strongly implore people not to follow the example this post suggests, and write code that relies on this monotonicity.

The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property. If you decide to depend on it, you're setting yourself up for a bad time when PostgreSQL decides to change their implementation strategy, or you move to a different database.

Further, the stated motivations for this, to slightly simplify testing code, are massively under-motivating. Saving a single line of code can hardly be said to be worth it, but even if it were, this is a problem far better solved by simply writing a function that will both generate the objects and sort them.

As a profession, I strongly feel we need to do a better job orienting ourselves to the reality that our code has a tendency to live for a long time, and we need to optimize not for "how quickly can I type it", but "what will this code cost over its lifetime".

replies(9): >>42576251 #>>42576272 #>>42576300 #>>42576495 #>>42576752 #>>42576906 #>>42576998 #>>42586804 #>>42589145 #

2. 3eb7988a1663 ◴[02 Jan 25 17:09 UTC] No.42576251[source]▶

>>42576212 (TP) #

I too am missing the win on this. It is breaking the spec, and does not seem like it offers a significant advantage. In the eventual event where you have a collection of UUID7 you are only ever going to be able to rely on the millisecond precision anyway.

replies(2): >>42576316 #>>42576582 #

3. peterldowns ◴[02 Jan 25 17:11 UTC] No.42576272[source]▶

>>42576212 (TP) #

The test should do a set comparison, not an ordered list comparison, if it wants to check that the same 5 accounts were returned by the API. I think it's as simple as that.

The blogpost is interesting and I appreciated learning the details of how the UUIDv7 implementation works.

replies(1): >>42576341 #

4. paulddraper ◴[02 Jan 25 17:14 UTC] No.42576300[source]▶

>>42576212 (TP) #

> Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property.

Huh?

If the docs were to guarantee it, they guarantee it. Why are you looking for everything to be part of RFC UUIDv7?

Failure of logic.

replies(1): >>42577060 #

5. sbuttgereit ◴[02 Jan 25 17:16 UTC] No.42576316[source]▶

>>42576251 #

You say it's breaking the spec, but is it?

From https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-versio...:

"UUIDv7 values are created by allocating a Unix timestamp in milliseconds in the most significant 48 bits and filling the remaining 74 bits, excluding the required version and variant bits, with random bits for each new UUIDv7 generated to provide uniqueness as per Section 6.9. Alternatively, implementations MAY fill the 74 bits, jointly, with a combination of the following subfields, in this order from the most significant bits to the least, to guarantee additional monotonicity within a millisecond:

   1.  An OPTIONAL sub-millisecond timestamp fraction (12 bits at
       maximum) as per Section 6.2 (Method 3).

   2.  An OPTIONAL carefully seeded counter as per Section 6.2 (Method 1
       or 2).

   3.  Random data for each new UUIDv7 generated for any remaining
       space."

Which the referenced "method 3" is:

"Replace Leftmost Random Bits with Increased Clock Precision (Method 3):

For UUIDv7, which has millisecond timestamp precision, it is possible to use additional clock precision available on the system to substitute for up to 12 random bits immediately following the timestamp. This can provide values that are time ordered with sub-millisecond precision, using however many bits are appropriate in the implementation environment. With this method, the additional time precision bits MUST follow the timestamp as the next available bit in the rand_a field for UUIDv7."

6. vips7L ◴[02 Jan 25 17:18 UTC] No.42576341[source]▶

>>42576272 #

Don’t you think that depends on what you’re guaranteeing in your api? If you’re guaranteeing that your api returns the accounts ordered you need to test for that. But I do agree in general that using a set is the correct move.

replies(1): >>42576711 #

7. throw0101c ◴[02 Jan 25 17:32 UTC] No.42576495[source]▶

>>42576212 (TP) #

> […] code that relies on this monotonicity. The reason for this is simple: the documentation doesn't promise this property. Moreover, even if it did, the RFC for UUIDv7 doesn't promise this property.

The "RFC for UUIDv7", RFC 9562, explicitly mentions monotonicity in §6.2 ("Monotonicity and Counters"):

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp; 
    however, implementations can guarantee additional monotonicity via 
    the concepts covered in this section.

* https://datatracker.ietf.org/doc/html/rfc9562#name-monotonic...

In the UUIDv7 definition (§5.7) it explicitly mentions the technique that Postgres employs for rand_a:

    rand_a:
        12 bits of pseudorandom data to provide uniqueness as per
        Section 6.9 and/or optional constructs to guarantee additional 
        monotonicity as per Section 6.2. Occupies bits 52 through 63 
        (octets 6-7).

* https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-vers...

Note: "optional constructs to guarantee additional monotonicity". Pg makes use of that option.

replies(1): >>42576599 #

8. throw0101c ◴[02 Jan 25 17:40 UTC] No.42576582[source]▶

>>42576251 #

> It is breaking the spec […]

As per a sibling comment, it is not breaking the spec. The comment in the Pg code even cites the spec that says what to do (and is quoted in the post):

     * Generate UUID version 7 per RFC 9562, with the given timestamp.
     *
     * UUID version 7 consists of a Unix timestamp in milliseconds (48
     * bits) and 74 random bits, excluding the required version and
     * variant bits. To ensure monotonicity in scenarios of high-
     * frequency UUID generation, we employ the method "Replace
     * LeftmostRandom Bits with Increased Clock Precision (Method 3)",
     * described in the RFC. […]

9. stonemetal12 ◴[02 Jan 25 17:42 UTC] No.42576599[source]▶

>>42576495 #

>explicitly mentions monotonicity

>optional constructs

So it is explicitly mentioned in the RFC as optional, and Pg doesn't state that they guaranty that option. The point still stands, depending on optional behavior is a recipe for failure when the option is no longer taken.

replies(6): >>42576845 #>>42576990 #>>42577105 #>>42577134 #>>42577202 #>>42579068 #

10. Too ◴[02 Jan 25 17:50 UTC] No.42576711{3}[source]▶

>>42576341 #

The test is a very strange example indeed. Is it testing the backend, the database or both? If the api was guaranteeing ordered values, pre-uuid7 the backend must have sorted them by other means before returning, making the test identical. If the backend is not guaranteeing order, that shouldn't be tested either.

11. braiamp ◴[02 Jan 25 17:54 UTC] No.42576752[source]▶

>>42576212 (TP) #

I don't think most people will heed this warning. I warned people in a programming forum that Python ordering of objects by insertion time was a implementation detail, because it's not guaranteed by any PEP [0]. I could literally write a PEP compliant Python interpreter and could blow up in someone's code because they rely on the CPython interpreter behavior.

[0]: https://mail.python.org/pipermail/python-dev/2017-December/1...

replies(2): >>42576959 #>>42576961 #

12. idconvict ◴[02 Jan 25 18:02 UTC] No.42576845{3}[source]▶

>>42576599 #

The "optional" portion is this part of the spec, not the time part.

> implementations can guarantee additional monotonicity via the concepts covered in this section

replies(1): >>42582910 #

13. sedatk ◴[02 Jan 25 18:06 UTC] No.42576906[source]▶

>>42576212 (TP) #

As a counter-argument, it will inevitably turn into a spec if it becomes widely-used enough.

What was that saying, like: “every behavior of software eventually becomes API”

replies(2): >>42576997 #>>42577344 #

14. kstrauser ◴[02 Jan 25 18:10 UTC] No.42576959[source]▶

>>42576752 #

That definitely was true, and I use to jitter my code a little to deliberately find and break tests that depended on any particular ordering.

It's now explicitly documented to be true, and you can officially rely on it. From https://docs.python.org/3/library/stdtypes.html#dict:

> Changed in version 3.7: Dictionary order is guaranteed to be insertion order.

That link documents the Python language's semantics, not the behavior of any particular interpreter.

15. dragonwriter ◴[02 Jan 25 18:10 UTC] No.42576961[source]▶

>>42576752 #

> I warned people in a programming forum that Python ordering of objects by insertion time was a implementation detail, because it's not guaranteed by any PEP

PEPs do not provide a spec for Python, they neither cover the initial base language before the PEP process started, nor were all subsequent language changes made through PEPs. The closest thing Python has to a cross-implementation standard is the Python Language Reference for a particular version, treating as excluded anything explicitly noted as a CPython implementation detail. Dictionaries being insertion-ordered went from a CPython implementation detail in 3.6 to guaranteed language feature in 3.7+.

16. arghwhat ◴[02 Jan 25 18:12 UTC] No.42576990{3}[source]▶

>>42576599 #

Relying on an explicitly documented implementation behavior that the specification explicitly describes as an option is not an issue. Especially if the behavior is only relied on in a test, where the worst outcome is a failed testcase that is easily fixed.

Even if the behavior went away, UUIDs unlike serials can always be safely generated directly by the application just as well as they can be generated by the database.

Going straight for that would arguably be the "better" path, and allows mocking PRNG to get sequential IDs.

17. tomstuart ◴[02 Jan 25 18:13 UTC] No.42576997[source]▶

>>42576906 #

https://www.hyrumslaw.com/

replies(1): >>42578430 #

18. deadbabe ◴[02 Jan 25 18:13 UTC] No.42576998[source]▶

>>42576212 (TP) #

Most code does not live for a long time. Similar to how consumer products are built for planned obsolescence, code is also built with a specific lifespan in mind.

If you spend time making code bulletproof so it can run for like 100 years, you will have wasted a lot of effort for nothing when someone comes along and wipes it clean and replaces it with new code in 2 years. Requirements change, code changes, it’s the nature of business.

Remember any fool can build a bridge that stands, it takes an engineer to make a bridge that barely stands.

replies(3): >>42577137 #>>42577156 #>>42581141 #

19. fwip ◴[02 Jan 25 18:18 UTC] No.42577060[source]▶

>>42576300 #

Their next sentence explains. Other databases might not make that guarantee, including future versions of Postgres.

20. mlyle ◴[02 Jan 25 18:22 UTC] No.42577105{3}[source]▶

>>42576599 #

It's mentioned in the RFC as being explicitly monotonic based the time-based design.

Implementations that need monotonicity beyond the resolution of a timestamp-- like when you allocate 30 UUIDs at one instant in a batch-- can optionally use those additional bits for that purpose.

> Implementations SHOULD employ the following methods for single-node UUID implementations that require batch UUID creation or are otherwise concerned about monotonicity with high-frequency UUID generation.

(And it goes on to recommend the obvious things you'd do: use a counter in those bits when assigning a batch; use more bits of time precision; etc.)

The comment in PostgreSQL before the implementation makes it clear that they chose the third option for this in the RFC:

     * variant bits. To ensure monotonicity in scenarios of high-
     * frequency UUID generation, we employ the method "Replace
     * LeftmostRandom Bits with Increased Clock Precision (Method 3)",
     * described in the RFC. ...

replies(1): >>42579731 #

21. sbuttgereit ◴[02 Jan 25 18:24 UTC] No.42577134{3}[source]▶

>>42576599 #

Software is arbitrary. Any so-called "guarantee" is only as good as the developers and organizations maintaining a piece of software want to make it regardless of prior statements. At some point, the practical likelihood of a documented, but not guaranteed, process being violated vs. the willful abandonment of a guarantee start to look very similar.... at which point nothing saves you.

Sometimes the best you can do is recognize who you're working with today, know how they work, and be prepared for those people to be different in the future (or of a different mind) and for things to change regardless to expressed guarantees.

....unless we're talking about the laws of physics... ...that's different...

22. Pxtl ◴[02 Jan 25 18:24 UTC] No.42577137[source]▶

>>42576998 #

Uh, more people work on 20-year-old codebases than you'd think.

replies(1): >>42578908 #

23. agilob ◴[02 Jan 25 18:26 UTC] No.42577156[source]▶

>>42576998 #

>Most code does not live for a long time.

Sure, and here I am in a third company doing cloud migration and changing our default DB from MySQL to SQL server. The pain is real, 2 year long roadmap is now 5 years longer roadmap. All because some dude negotiated a discount on cloud services. And we still develop integrations that talk to systems written for DOS.

24. throw0101c ◴[02 Jan 25 18:30 UTC] No.42577202{3}[source]▶

>>42576599 #

> So it is explicitly mentioned in the RFC as optional […]

The use of rand_a for extra monotonicity is optional. The monotonicity itself is not optional.

§5.7 states:

    Alternatively, implementations MAY fill the 74 bits, 
    jointly, with a combination of the following subfields, 
    in this order from the most significant bits to the least, 
    to guarantee additional monotonicity within a millisecond:

Guaranteeing additional monotonicity means that there is already a 'base' level of monotonicity, and there are provisions for even more ("additional") levels of it. This 'base level' is why §6.2 states:

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp; 
    however, implementations can guarantee additional monotonicity via 
    the concepts covered in this section.

"Backbone of time-based sortable UUIDs"; "additional monotonicity". Additional: adding to what's already there.

* https://datatracker.ietf.org/doc/html/rfc9562

replies(3): >>42579633 #>>42579694 #>>42582047 #

25. the8472 ◴[02 Jan 25 18:43 UTC] No.42577344[source]▶

>>42576906 #

Consider the incentives you're setting up there. An API contract goes both ways, the vendor promises some things and not others to preserve flexibility, and the user has to abide by it to not get broken in the future. If you unilaterally ignore the contract, even plan to do so in advance, then eventually kindness and capacity to accommodate such abuse will run might run out and they may switch to an adversarial stance. See QUIC for example which is a big middle finger to middle boxes.

replies(1): >>42578450 #

26. sedatk ◴[02 Jan 25 20:24 UTC] No.42578430{3}[source]▶

>>42576997 #

Yes, that one! Thanks :)

27. sedatk ◴[02 Jan 25 20:25 UTC] No.42578450{3}[source]▶

>>42577344 #

Sure, there is a risk. But, it all depends on how great and desirable the benefits are.

28. 9dev ◴[02 Jan 25 21:06 UTC] No.42578908{3}[source]▶

>>42577137 #

And yet these people are dwarved by the number of developers crunching out generic line of business CRUD apps every day.

29. btown ◴[02 Jan 25 21:22 UTC] No.42579068{3}[source]▶

>>42576599 #

I was recently bit doing a Postgres upgrade by the Postgres team considering statements like `select 1 group by true` fine to silently break in Postgres 15. See https://postgrespro.com/list/thread-id/2661353 - and this behavior remains undocumented in https://www.postgresql.org/docs/release/ . It's an absolutely incredible project, and I don't disagree with the decision to classify it as wontfix - but it's an anecdote to not rely on undefined behavior!

30. reshlo ◴[02 Jan 25 22:26 UTC] No.42579633{4}[source]▶

>>42577202 #

> Normally, time-based UUIDs from this document will be monotonic due to an embedded timestamp; however, implementations can guarantee additional monotonicity via the concepts covered in this section.

“Normally, I am at home because I do not have a reason to go out; however, sometimes I am at home because I am sleeping.”

Notice how this statement does not actually mean that I am always at home.

31. Dylan16807 ◴[02 Jan 25 22:32 UTC] No.42579694{4}[source]▶

>>42577202 #

"this monotonicity" that OP suggests people not use is specifically the additional monotonicity.

Or to put it another way: OP is suggesting you don't depend on it being properly monotonic, because the default is that it is only partially monotonic.

32. Dylan16807 ◴[02 Jan 25 22:36 UTC] No.42579731{4}[source]▶

>>42577105 #

> It's mentioned in the RFC as being explicitly monotonic based the time-based design.

It's explicitly partially monotonic.

Or as other people would call it, "not monotonic".

People are talking past each other based on their use of the word "monotonic".

replies(2): >>42581561 #>>42585566 #

33. mardifoufs ◴[03 Jan 25 01:25 UTC] No.42581141[source]▶

>>42576998 #

What? Okay, so assume that most code doesn't last. It doesn't mean that you should purposefully make it brittle for basically no additional benefit? If as you say, it's about making the most with as little as possible (which is what the bridge analogy usually refers to), then surely adding a single function (to actually enforce the ordering you want) to make your code more robust is one of the best examples of that?

34. mlyle ◴[03 Jan 25 02:23 UTC] No.42581561{5}[source]▶

>>42579731 #

It's explicitly monotonic, except for apps that have a very fast ID rate, in which case there are recommended approaches (the word "SHOULD" is used in an RFC) to make it work. And PostgreSQL used one of these recommended approaches and documented it.

replies(1): >>42582207 #

35. Dylan16807 ◴[03 Jan 25 03:53 UTC] No.42582207{6}[source]▶

>>42581561 #

> It's explicitly monotonic, except for apps that have a very fast ID rate

"might generate two IDs in the same millisecond" is not a very exotic occurrence. It makes a big difference whether the rest is guaranteed or not.

> And PostgreSQL used one of these recommended approaches and documented it.

Well that's the center of the issue, right? OP's interpretation was that PostgreSQL did not document such, and so it shouldn't be relied upon. If it is a documented promise, then things are fine.

But is it actually in the documentation? A source code comment saying it uses a certain method isn't a promise that it will always use that method.

replies(2): >>42582895 #>>42582914 #

36. dragonwriter ◴[03 Jan 25 05:47 UTC] No.42582895{7}[source]▶

>>42582207 #

> Well that’s the center of the issue, right? OP’s interpretation was that PostgreSQL did not document such, and so it shouldn’t be relied upon.

And the correct answer is…we don’t know. We have a commit that landed and the explanation of the commit; we don’t have the documentation for the corresponding release of Postgres, because…it doesn’t exist yet. Because monoticity is an important feature for UUIDv7, it would be very odd if Postgres used an implementation that took the extra effort to use a nanosecond-level time value as the high-order portion of the variant part of the UUID instead of the minimum millisecond-level time, but not document that, but any assumption about what will be the documented, reliable-going-forward advertised feature is speculative until the documentation exists and is finalized.

OTOH, its perfectly fine to talk about what the implementation allows now, because that kind of thing is important to the decision about what should be documented and committed to going forward.

37. dragonwriter ◴[03 Jan 25 05:50 UTC] No.42582910{4}[source]▶

>>42576845 #

The “time part” is actually two different parts: the required millisecond-level ordering and the optional use of part of rand_a (which postgres does) to provide higher-resolution (nanosecond, in the postgres case) time ordering when combined with the required portion.

So, no, the “time part” of the postgres implementation is, in part, one of the options discussed in the spec, not merely the “time part” required in the spec.

38. pests ◴[03 Jan 25 05:50 UTC] No.42582914{7}[source]▶

>>42582207 #

The point of the extra bits is to allow the application developer to keep monotonicity in the "not very exotic occurrence" scenario. The purpose is to be monotonic. I feel like you are missing the core concept.

replies(1): >>42583278 #

39. Dylan16807 ◴[03 Jan 25 07:03 UTC] No.42583278{8}[source]▶

>>42582914 #

I'm not missing anything, the problem is a lot of people using sloppy wording or mixing up the two modes.

This comment thread is about the guaranteed level of monotonicity. Yes, those bits exist. But you can't depend on them from something that only promises "UUIDv7". You need an additional promise that it's configured that way and actually using those bits to maintain monotonicity.

40. throw0101c ◴[03 Jan 25 13:42 UTC] No.42585566{5}[source]▶

>>42579731 #

> It's explicitly partially monotonic.

From the RFC:

    Monotonicity (each subsequent value being greater than the last) is 
    the backbone of time-based sortable UUIDs. Normally, time-based UUIDs 
    from this document will be monotonic due to an embedded timestamp

"time-based UUIDs from this document will be monotonic". "will be monotonic".

I'm not sure how much more explicit this can be made.

The intent of UUIDv7 is monotonicity. If an implementation is not monotonic then it's a bug in the implementation.

41. drbojingle ◴[03 Jan 25 16:08 UTC] No.42586804[source]▶

>>42576212 (TP) #

In enterprise land. In proof of concept land that's not quite true (but does become true if the concept works)

42. StackTopherFlow ◴[03 Jan 25 20:25 UTC] No.42589145[source]▶

>>42576212 (TP) #

I agree, optimizing for readability and maintainability is almost always the right choice.

↑