230 points craigkerstiens | 3 comments
fngjdflmdflg
>The Postgres patch solves the problem by repurposing 12 bits of the UUID’s random component to increase the precision of the timestamp down to nanosecond granularity [...]

>It makes a repeated UUID between processes more likely, but there’s still 62 bits of randomness left to make use of, so collisions remain vastly unlikely.

Does it? The number of random bits has decreased, but the time window in which a duplicate can occur has also shrunk, to a single nanosecond.
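
To make the layout concrete, here is a minimal Python sketch (my own illustration, not the actual Postgres C code) of a UUIDv7 generator that spends the 12-bit rand_a field on sub-millisecond timestamp precision, as the quoted patch description says, keeping 62 random bits in rand_b:

    import os
    import time
    import uuid

    def uuidv7_submillisecond() -> uuid.UUID:
        ns = time.time_ns()
        unix_ts_ms = ns // 1_000_000            # standard 48-bit millisecond timestamp
        frac_ns = ns % 1_000_000                # leftover nanoseconds within that ms
        rand_a = (frac_ns * 4096) // 1_000_000  # sub-ms fraction in 12 bits (~244 ns buckets)
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits

        value = (unix_ts_ms & ((1 << 48) - 1)) << 80
        value |= 0x7 << 76                      # version 7
        value |= rand_a << 64                   # timestamp fraction where rand_a lives
        value |= 0b10 << 62                     # RFC 4122/9562 variant
        value |= rand_b
        return uuid.UUID(int=value)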

londons_explore
I could imagine that certain nanoseconds might be vastly more likely than other nanoseconds.

For example, imagine you have a router that sends network packets out at the start of each microsecond, synced to wall time.

Or the OS scheduler always wakes processes up on a millisecond timer tick or some polling loop.

Now, when those packets are received by a Postgres server and processed, the time to do that is probably fairly consistent, meaning that most records get created at roughly X nanoseconds past the microsecond boundary.
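
A toy simulation of that effect (illustrative only; the 50 µs processing time and the jitter are invented numbers): if work starts on a fixed tick and processing time is roughly constant, creation times cluster on a small set of sub-millisecond offsets rather than spreading uniformly:

    import random
    from collections import Counter

    def creation_offsets(n: int) -> Counter:
        offsets = Counter()
        for _ in range(n):
            # every request starts on a timer tick; processing takes
            # ~50 µs with a little jitter (both numbers made up)
            processing_ns = 50_000 + int(random.gauss(0, 200))
            offsets[processing_ns % 1_000_000] += 1
        return offsets

    hist = creation_offsets(10_000)
    print(f"{len(hist)} distinct nanosecond offsets hit out of 1,000,000 possible")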

UltraSane
But shift only one nanosecond earlier or later and you get another set of 4.611 billion billion (2^62) possible random IDs. I think random variations in buffer depths and CPU speeds will easily introduce hundreds of nanoseconds of timing variation, and syncing any two things to less than 1 nanosecond is incredibly hard and doesn't happen by accident.
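
A quick birthday-bound estimate (my own back-of-the-envelope, not from the thread) of what 62 random bits buy among IDs that do land on the same timestamp tick:

    import math

    def collision_probability(n: int, bits: int = 62) -> float:
        # P(any collision among n draws from 2**bits values), using the
        # usual approximation 1 - exp(-n*(n-1) / (2*space)) for n << space
        space = 2.0 ** bits
        return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

    for n in (10**3, 10**6, 10**9):
        print(f"{n:>13,} IDs in one tick: p = {collision_probability(n):.3e}")

Even a billion IDs sharing a single tick only reach roughly a 10% chance of any collision, and in practice nowhere near that many will share one.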

zamadatix
The important part is that event timings aren't going to be as random as the actual random source. The chances of an actual collision remain low, but the distribution of events over time is a weaker source of random bits, in relative terms, than a proper "random" source, which won't have any obvious bias at all.

UltraSane
I am sure there is bias, but 1 nanosecond is an incredibly narrow window. It really would be an interesting experiment to evaluate the optimal balance of bits between the timestamp and the random value. What about a hostname or even a process ID? Snowflake IDs are 63 bits long, with 41 bits as a millisecond timestamp, 10 bits as a machine ID, and 12 bits as a sequential counter.
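
For reference, a sketch of that Snowflake layout in Python (the epoch constant is Twitter's published custom epoch; real generators also handle clock rollback and per-millisecond sequence resets, which are omitted here):

    import time

    TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04, Twitter's custom epoch

    def snowflake_id(machine_id: int, sequence: int) -> int:
        assert 0 <= machine_id < (1 << 10)  # 10-bit machine ID
        assert 0 <= sequence < (1 << 12)    # 12-bit per-millisecond counter
        ms = int(time.time() * 1000) - TWITTER_EPOCH_MS  # 41-bit ms timestamp
        return (ms << 22) | (machine_id << 12) | sequence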

WorldMaker
Similarly, for direct comparison: ULID has a 48-bit timestamp, also at millisecond precision, and 80 random bits.

Also worth comparing: the ULID spec's technique for monotonicity is to take a single random value and then increment its lowest bits, one ID after another, trading random entropy for direct "nearness". The rand_a approach, by contrast, effectively uses the most significant bits for ordering while keeping more random entropy.
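
A rough sketch of the ULID monotonicity rule being described, as a minimal Python generator (overflow of the 80-bit field, which the spec treats as a generation failure, is left unhandled):

    import os
    import time

    _last_ms = -1
    _last_rand = 0

    def monotonic_ulid_int() -> int:
        """128-bit ULID value: 48-bit ms timestamp + 80-bit payload."""
        global _last_ms, _last_rand
        ms = time.time_ns() // 1_000_000
        if ms == _last_ms:
            _last_rand += 1  # same millisecond: trade entropy for ordering
        else:
            _last_ms = ms
            _last_rand = int.from_bytes(os.urandom(10), "big")  # fresh 80 bits
        return (ms << 80) | _last_rand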

zamadatix
I suppose that would depend entirely on how you measure what's optimal. Optimal randomness is 128 bits from the best random source and 0 bits from anything else, like time. Optimal in the sense of "just random enough for my use case, but no more, so I can fit other information in the value" depends entirely on the requirements of your use case (more specifically, not just "for databases" but "for my database to... on the hardware... in which the access is... on the presumed growth..." and so on). For picking a generic "good enough" value, 12 bits is probably as reasonable a choice as any.