Recruiter: "I was looking for you to say it's a fingerprint"
So I guess I was wrong, because despite explaining them in decent detail, I didn't use the one keyword.
A non-fingerprinting hash function - well, one example of one- is something that for similar inputs, produces the same output (similarity being defined by some distance metric). See, for example: https://en.wikipedia.org/wiki/Locality-sensitive_hashing and https://en.wikipedia.org/wiki/MinHash
Confusingingly, many functions that are used for similarity detection are called Fingerprints, (https://en.wikipedia.org/wiki/Acoustic_fingerprint), but I consider that a distinct use of the term.
But this is also the goal of non-crypto hash function like those used in a data structure no? Basically mapping a large space of inputs to a smaller space of outputs.
I would say the cryptographic hash is one the that has "desirable" security properties, things like not being able to recover the original input message using the hash, a tiny change in the input causes a substantial change in the output, or that its very unlikely that two inputs will produce a collision.
What matters is whether similar inputs get mapped to the same output. For the case where you want to minimize the probability that two inputs which are highly similar land in the same bucket, you want a crypto hash, although those are expensive so you want a cheaper approximation, which is exactly what fingerprint hashes do. The problem is that as the input values counts approach sqrt(output value size), you're going to start getting collisions, and ideally, you want those collisions to be evenly spaced.
In the case of a similarity hash function, you want the opposite, the closer things are in some metric space, the more likely they end in up in the same bucket.
That's what I was trying to say:
"a tiny change in the input causes a substantial change in the output."
But I probably didn't articulate that very well. I think we are saying the same thing.
You mentioned:
"The problem is that as the input values counts approach sqrt(output value size), you're going to start getting collisions"
Can you elaborate on the significant a square root here? Does this relate to the load factor?