←back to thread

68 points der_gopher | 1 comments | | HN request time: 0s | source
Show context
sedatk ◴[] No.46211578[source]
Whenever ULID comes up, I need to remind that it has a sequential ID generation mode in its spec which is prone to conflicts on multi-threads, processes or hosts which kills the purpose of a "universal" identifier. If you need a sequential ID, just use an integer, preferably one that's autoincremented by the database.

It's best to stick to UUIDv7 because of such quirks of ULID.

replies(6): >>46212131 #>>46212144 #>>46212658 #>>46212724 #>>46213359 #>>46213609 #
listenallyall ◴[] No.46212144[source]
Under what circumstances is it prone to conflicts? On separate threads/hosts/processes, id's created within the same millisecond would be differentiated by the 80 bits of randomness (more than UUID v7).
replies(2): >>46212224 #>>46212294 #
jasonwatkinspdx ◴[] No.46212294[source]
No, ULID has a "monotonic" feature, where if it detects the same millisecond timestamp in back to back calls, it just increments the 80 bit "random" portion. This means it has convoying behavior. If two machines are generating ids independently and happen to choose initial random positions near each other, the probability of collision is much higher than the basic birthday bound.

I think this "sort of monotonic but not really" is the worst of both to be honest. It tempts you to treat it like an invariant when it isn't.

If you want monotonicity with independent generation, just use a composite key that's a lamport clock and a random nonce. Or if you want to be even more snazzy use Hybrid Logical Clocks or similar.

replies(2): >>46212754 #>>46212879 #
Dylan16807 ◴[] No.46212754[source]
> If two machines are generating ids independently and happen to choose initial random positions near each other, the probability of collision is much higher than the basic birthday bound.

But the chance of the initial random positions being near each other is very very low.

If you pick a billion random numbers in an 80 bit space, the chance you have a collision is one in a million. (2^80 / (2^30)^2)

If you pick a thousand random starting points and generate a million sequential numbers each, the chance your starting points are sufficiently close to each other to cause an overlap is one in a trillion. ((2^80 / 2^20) / (2^10)^2)

In that one in a trillion case, you'll likely end up with half a million collisions, which might matter to you. But if you care about 0 collisions versus 1+ collisions, pick the monotonic version.

replies(1): >>46212866 #
jasonwatkinspdx ◴[] No.46212866[source]
Right, but the point is there's no reason to accept this limitation. Likewise why hardcode millisecond scale timestamps in a world where billions of inserts per second are practical on a single server?

Or if what you want is monotonic distributed timestamps, again, HLC is how you do that properly.

So you're embracing this weird limitations for no real benefit.

And as you can see in the rest of this comment thread, a lot of people simply do not even know this behavior and are assuming the 80 bit portion is always random. Which is my whole point about having a not really an invariant invariant just being a bad way to go fundamentally.

Edit: just to reply to the below since I can't do so directly, I understand the arithmetic here. What I'm saying is there's zero reason to choose this weird scheme vs something that's just the full birthday bound and you never think about it again.

As another comment points out: just consider neighbor guessing attacks. This 80 bit random but not random field is a foot gun to anyone that just assumes they can treat it as truly random.

replies(2): >>46212922 #>>46213284 #
1. listenallyall ◴[] No.46212922{5}[source]
It's not a "limitation", he's saying there is much, much, much less chance of having any collisions with ULIDs - one in a million vs one in a trillion