Postgres UUIDv7 and per-back end monotonicity

1. dotdi ◴[02 Jan 25 17:13 UTC] No.42576293[source]▶

My org has been using ULID[0] extensively for a few years, and generally we've been quite happy with it. After initially dabbing with a few implementations, I reimplemented the spec in Kotlin, and this has been working out quite well for us. We will open-source our implementation in the following weeks.

ULID does specifically require generated IDs to be monotonically increasing as opposed to what the RFC for UUIDv7 states, which is a big deal IMHO.

[0]: https://github.com/ulid/spec

replies(2): >>42577689 #>>42578965 #

2. willvarfar ◴[02 Jan 25 19:15 UTC] No.42577689[source]▶

>>42576293 (TP) #

Having used a lot of the ULID variants that the UUIDv7 spec cites as prior art, including the ULID spec you link to, I've gotta say that UUIDv7 has some real advantages.

The biggest advantage is that it is hex. Haven't yet met a database system that doesn't have functions for substr and from_hex etc, meaning you can extract the time part using vanilla sql.

ULID and others that use custom variants of base32 or base62 or whatever are just about impossible to wrangle with normal tooling.

Your future selfs will thank you for being able to manipulate it in whatever database you use in the future to analyse old logs or import whatever data you generate today.

replies(2): >>42577865 #>>42579005 #

3. mixmastamyk ◴[02 Jan 25 19:33 UTC] No.42577865[source]▶

>>42577689 #

Aren't they stored as 16 bytes in binary? How to format it later as text is then your choice.

replies(1): >>42578939 #

4. WorldMaker ◴[02 Jan 25 21:09 UTC] No.42578939{3}[source]▶

>>42577865 #

It's that eternal push/pull "war" between "we need a sproc that can report this directly from the SQL server" and "please don't do things directly on the SQL server because you'll route around important application code" and "it's a feature not a bug that you can't just look things up by ID in the DB without a little bit of extra work".

I did work on a project using ULIDs in SQL Server. They were stored in uniqueidentifier fields with a complex byte swap from ULID to fake-UUID to get better storage/indexing out of SQL Server [1]. There was an attempt to use SQL Functions to display/search the ULID form directly in the database, but it was never as bug free as the C# byte order code and so it was definitely not recommended doing it directly in the DB and that if a "report" was missing it should be a part of the application (which was already almost nothing but a bloated "reporting" tool) or in a related "configuration" application. It did feel more like a feature than a bug because it did keep some meddling and drama out of the DB. I also see the arguments for why in some different types of applications it makes debugging a lot harder and those arguments make sense and it is definitely a trade-off to consider.

[1] The rabbit hole into SQL Server's ancient weird UUID sort order: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=10...

replies(1): >>42588706 #

5. sedatk ◴[02 Jan 25 21:11 UTC] No.42578965[source]▶

>>42576293 (TP) #

ULID guarantees monotonicity only per process, and it requires ID generation to be serialized. I find the promise quite misleading because of that. You might as well use a wide-enough integer with the current timestamp + random as baseline for the same purpose, but I wouldn't recommend that either.

6. sedatk ◴[02 Jan 25 21:15 UTC] No.42579005[source]▶

>>42577689 #

Additionally, v7 UUIDs can be generated simultaneously on the client-side by multiple threads without waiting for an oracle to release the next available ID. That's quite important for parallel processing. Otherwise, you might as well use an autoincrement BIGINT.

7. WorldMaker ◴[03 Jan 25 19:30 UTC] No.42588706{4}[source]▶

>>42578939 #

Also, depending on your DB engine and DB design and storage needs it might be just fine to store ULID as `char(26)` instead of `uniqueidentifier`. It's a lot more space in bytes, but bytes can be pretty affordable and then the ULIDs are never not in their canonical Base-32 form.

I also worked on applications that used ULIDs in string form only in NoSQL documents, string-based cache keys, and string indexes just fine. I didn't try `char(26)` columns in a DB and seeing how well, for instance, SQL Server clustered and indexed them, but I've SQL Server do just fine with someone's wild idea of clustering a `varhar(MAX)` field and I'm sure SQL Server can probably do it just fine on the technical side.

It's nice that you can easily convert a ULID to a 128-bit key, but you certainly don't have to. (Also, people really do like the ugly dashed hex form of UUIDs sometimes and I've seen people store those directly as strings in places you'd expect they should just store the 128-bit value, it goes both ways here, I suppose.)