ChibiHash: Small, Fast 64 bit hash function

(nrk.neocities.org)

104 points thunderbong | 5 comments | 18 Nov 24 00:23 UTC | HN request time: 0.918s | source

Show context

aappleby ◴[18 Nov 24 04:58 UTC] No.42169830[source]▶

SMHasher/Murmurhash author here - I don't see anything fundamentally wrong with this hash function, it uses basically the same operations as the Murmur family (and a lot of other hashes at this point).

The handling of the "tail" of the key (the last < 32 bytes) is slightly odd (the "if (l & 1) { mix 1 byte of key }" happens before 8-byte chunks and 2-byte chunks), but if it passes SMHasher it should be fine for general use.

replies(1): >>42171063 #

HexDecOctBin ◴[18 Nov 24 09:49 UTC] No.42171063[source]▶

>>42169830 #

> if it passes SMHasher it should be fine for general use.

Author of the post writes:

> I kept making changes until the tests passed.

If SMHasher has become a target, shouldn't it be considered a bad measure now? [1]

[1]: https://en.wikipedia.org/wiki/Goodhart%27s_law

replies(2): >>42171485 #>>42186477 #

1. gliptic ◴[18 Nov 24 11:27 UTC] No.42171485[source]▶

>>42171063 #

Hash tests have always been targets. What else are you supposed to do for non-cryptographic hashes?

replies(1): >>42171644 #

2. vanderZwan ◴[18 Nov 24 11:57 UTC] No.42171644[source]▶

>>42171485 (TP) #

Not OP, but I suppose there might be a difference between blindly trying to pass a test by tweaking numbers, or a more principled approach to figuring out how to introduce better randomness.

OTOH, I would expect that tests for hashing and random number generation in particular would be naturally resistant to overfitting, due to the nature of the problem. But I'm not an expert at this topic, so I'd love to hear someone else's take on that

replies(1): >>42171816 #

3. gliptic ◴[18 Nov 24 12:32 UTC] No.42171816[source]▶

>>42171644 #

Maybe, but from what I've seen you run out of what theory can say about your non-crypto hash function pretty quickly.

replies(2): >>42172043 #>>42181643 #

4. HelloNurse ◴[18 Nov 24 13:16 UTC] No.42172043{3}[source]▶

>>42171816 #

If this is the case, improvements come from tuning constants and calculation and tests are more valuable.

5. vanderZwan ◴[19 Nov 24 09:46 UTC] No.42181643{3}[source]▶

>>42171816 #

I suppose that makes sense: the more you can say about a source of randomness, the less random it probably is, which makes the problem almost self-defeating

↑