Recruiter: "I was looking for you to say it's a fingerprint"
So I guess I was wrong, because despite explaining them in decent detail, I didn't use the one keyword.
level 1: foo, bar, baz
level 2: frobnitz, barfoo
level 3: 42, etc.
If someone is using words from level 2 that work together in the ways laid out, they're probably beyond level 1 and wouldn't use the word 'fingerprint' (in this case) - they're giving more detail (and probably better detail) than what was being listened for.
first thing I do is try to get them to ask it a different way. Then I spew out every related buzzword possible, as if it's a question: "Are you talking about ...?", "oh, you mean ...?". I can usually get it in a few tries.
I had the same a couple of years ago, at the time I was a Rails developer at a small startup. The first screening they asked me some basic technical questions (I guess it was a recruiter), then the second screening they asked me to walk through some data structures and algorithms off the top of my head.
I knew they would do this, and since I hadn't looked at any CS algorithms since university, I got a copy of Programming Pearls and studied it every evening for the week before. I picked a few topics I hoped would come up, and it turned out that's what they asked for: I think first a linked list (maybe doubly linked?) and then a tree sort.
Surprisingly, I passed, but the attitude of the interviewer really put me off. He said he knew Python, Java and Go - none of which I had used - and wasn't too happy when I said I wanted to use Ruby (which the first guy I spoke to said was fine). Then throughout the interview it seemed very much like he was fighting against me and trying to prove me wrong.
After that I couldn't be bothered any more, I didn't really want a job at Google, it's just a recruiter contacted me and I decided to try it out. I guess this style of interviewing must work for Google, but it's just not the way I like a company to introduce itself to me. It just seems like they are approaching it with so much ego, as if I would be privileged to work there, but to me a job should be mutually beneficial.
A hashing function is actually a sorting function. It's supposed to take an input space and sort it in an unpredictable and evenly distributed way across the output space. What's more, neighboring points in the input space, no matter the metric used to determine proximity, should not result in neighboring points in the output space.
Fingerprinting is just an emergent value that comes from choosing fixed length hashes, and the fact that the mapping from input to output is stable.
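A quick way to see the "neighboring inputs shouldn't be neighboring outputs" property is to hash two inputs that differ by a single character and count how many output bits differ; with something like SHA-256, roughly half the bits flip. A minimal sketch (the input strings are arbitrary):

```python
import hashlib

def bit_diff_fraction(a: bytes, b: bytes) -> float:
    """Fraction of bits that differ between two equal-length digests."""
    diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return diff / (len(a) * 8)

# Two neighboring inputs, differing by one character...
d1 = hashlib.sha256(b"input-1").digest()
d2 = hashlib.sha256(b"input-2").digest()

# ...land far apart in the output space: around half the bits flip.
print(f"{bit_diff_fraction(d1, d2):.2f}")
```

The mapping is also stable - hashing the same input always gives the same digest - which is exactly the property the fingerprint use case relies on.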
A non-fingerprinting hash function - well, one example of one - is something that, for similar inputs, produces the same output (similarity being defined by some distance metric). See, for example: https://en.wikipedia.org/wiki/Locality-sensitive_hashing and https://en.wikipedia.org/wiki/MinHash
Confusingly, many functions that are used for similarity detection are called fingerprints (https://en.wikipedia.org/wiki/Acoustic_fingerprint), but I consider that a distinct use of the term.
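To make the MinHash idea concrete, here's a minimal sketch (assuming salted SHA-1 as a stand-in for a family of hash functions; the names are illustrative). Each signature keeps, per hash function, the minimum hash over the set; the fraction of matching signature slots estimates the Jaccard similarity, so similar sets get similar outputs:

```python
import hashlib

def minhash_signature(items: set[str], num_hashes: int = 64) -> list[int]:
    """For each salted hash function, keep the minimum hash over the set."""
    sig = []
    for salt in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.sha1(f"{salt}:{x}".encode()).digest()[:8], "big")
            for x in items
        ))
    return sig

def estimated_jaccard(a: set[str], b: set[str]) -> float:
    sa, sb = minhash_signature(a), minhash_signature(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

# Overlapping sets (true Jaccard 0.6) vs. disjoint sets (true Jaccard 0.0).
similar = estimated_jaccard({"a", "b", "c", "d"}, {"a", "b", "c", "e"})
disjoint = estimated_jaccard({"a", "b", "c", "d"}, {"w", "x", "y", "z"})
```

This is the opposite of a fingerprint hash: nearby inputs are deliberately mapped to nearby (often identical) outputs.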
So he asked "how did you figure this out so fast?". I told him I didn't, I just remember all the "important" powers of 2. He said "well... that's not what I was looking for, I wanted you to calculate it, but... I guess a candidate who memorizes powers of 2 is a positive sign?". I passed.
Oh well.
Could something like Hamming distance be called a hash function too? It's not mentioned on its Wikipedia page.
But this is also the goal of non-crypto hash function like those used in a data structure no? Basically mapping a large space of inputs to a smaller space of outputs.
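Right - the bucket use case only needs a cheap, evenly-spread mapping. As one concrete example, FNV-1a (a common non-cryptographic hash) reduced modulo a table size; the table size here is arbitrary:

```python
# FNV-1a: a typical cheap, non-cryptographic hash used for hash-table bucketing.
def fnv1a_32(data: bytes) -> int:
    h = 2166136261                               # 32-bit FNV offset basis
    for byte in data:
        h = ((h ^ byte) * 16777619) % (1 << 32)  # xor, then multiply by the FNV prime
    return h

def bucket(key: str, table_size: int = 64) -> int:
    # Map the large space of strings into a small space of bucket indices.
    return fnv1a_32(key.encode()) % table_size

print(bucket("hello"), bucket("world"))
```

It spreads inputs evenly and is stable, but makes no attempt at the security properties below - recovering plausible preimages or engineering collisions is easy.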
I would say a cryptographic hash is one that has "desirable" security properties: things like not being able to recover the original input message from the hash, a tiny change in the input causing a substantial change in the output, or it being very unlikely that two inputs will produce a collision.
What matters is whether similar inputs get mapped to the same output. For the case where you want to minimize the probability that two highly similar inputs land in the same bucket, you want a crypto hash, although those are expensive, so you want a cheaper approximation, which is exactly what fingerprint hashes do. The problem is that as the number of inputs approaches sqrt(output space size), you're going to start getting collisions (the birthday bound), and ideally you want those collisions to be evenly spaced.
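The sqrt comes from the birthday bound: with a 16-bit output space (65536 values), collisions become likely once you've hashed on the order of sqrt(65536) = 256 inputs. A quick simulation, truncating SHA-256 to 16 bits as a stand-in for a small-output hash:

```python
import hashlib

def h16(x: str) -> int:
    # Truncate SHA-256 to 16 bits: an output space of 65536 values.
    return int.from_bytes(hashlib.sha256(x.encode()).digest()[:2], "big")

def count_collisions(n: int) -> int:
    """How many of n distinct inputs collide in the 16-bit output space."""
    hashes = [h16(f"item-{i}") for i in range(n)]
    return n - len(set(hashes))

# Well past sqrt(65536) = 256 inputs, collisions are essentially guaranteed.
print(count_collisions(2000))
```

This is why fingerprint hashes use large fixed outputs (128+ bits): it pushes the birthday bound far beyond any realistic number of inputs.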
In the case of a similarity hash function, you want the opposite: the closer things are in some metric space, the more likely they end up in the same bucket.
That's what I was trying to say:
"a tiny change in the input causes a substantial change in the output."
But I probably didn't articulate that very well. I think we are saying the same thing.
You mentioned:
"The problem is that as the input values counts approach sqrt(output value size), you're going to start getting collisions"
Can you elaborate on the significance of the square root here? Does this relate to the load factor?
I cannot obviously tell you the exact specifics because of the NDA.
Maybe I dodged a bullet there and Amazon would have been a bad place to end up anyway.