←back to thread

104 points thunderbong | 7 comments | | HN request time: 0s | source | bottom
Show context
aappleby ◴[] No.42169830[source]
SMHasher/Murmurhash author here - I don't see anything fundamentally wrong with this hash function, it uses basically the same operations as the Murmur family (and a lot of other hashes at this point).

The handling of the "tail" of the key (the last < 32 bytes) is slightly odd (the "if (l & 1) { mix 1 byte of key }" happens before 8-byte chunks and 2-byte chunks), but if it passes SMHasher it should be fine for general use.

replies(1): >>42171063 #
1. HexDecOctBin ◴[] No.42171063[source]
> if it passes SMHasher it should be fine for general use.

Author of the post writes:

> I kept making changes until the tests passed.

If SMHasher has become a target, shouldn't it be considered a bad measure now? [1]

[1]: https://en.wikipedia.org/wiki/Goodhart%27s_law

replies(2): >>42171485 #>>42186477 #
2. gliptic ◴[] No.42171485[source]
Hash tests have always been targets. What else are you supposed to do for non-cryptographic hashes?
replies(1): >>42171644 #
3. vanderZwan ◴[] No.42171644[source]
Not OP, but I suppose there might be a difference between blindly trying to pass a test by tweaking numbers, or a more principled approach to figuring out how to introduce better randomness.

OTOH, I would expect that tests for hashing and random number generation in particular would be naturally resistant to overfitting, due to the nature of the problem. But I'm not an expert at this topic, so I'd love to hear someone else's take on that

replies(1): >>42171816 #
4. gliptic ◴[] No.42171816{3}[source]
Maybe, but from what I've seen you run out of what theory can say about your non-crypto hash function pretty quickly.
replies(2): >>42172043 #>>42181643 #
5. HelloNurse ◴[] No.42172043{4}[source]
If this is the case, improvements come from tuning constants and calculation and tests are more valuable.
6. vanderZwan ◴[] No.42181643{4}[source]
I suppose that makes sense: the more you can say about a source of randomness, the less random it probably is, which makes the problem almost self-defeating
7. aappleby ◴[] No.42186477[source]
Hi, don't know if you'll see this but the tests in SMHasher are designed to break hash functions by generating low-entropy key sets that try to trigger weak bit-mixing paths in the hash function.

When a hash function has weak mixing they tend to fail catastrophically on one of those key sets, so you can think of SMHasher as more like "attempts to test the hash function to the point of failure" instead of a "target" as described in Goodhart's Law.