Most active commenters
  • bogomipz(6)
  • dekhn(5)
  • lucb1e(3)
  • chrisper(3)
  • rockdoe(3)

←back to thread

1764 points fatihky | 29 comments | | HN request time: 1.757s | source | bottom
Show context
buttershakes ◴[] No.12701447[source]
I had almost exactly this same engineering test when google interviewed me in 2006. It was terrible, and left a bad taste in my mouth. Given the complexity of the work I was doing at the time, the entire thing seemed ridiculous.
replies(2): >>12701579 #>>12701983 #
1. TheCapn ◴[] No.12701579[source]
The inode question gave me flashbacks to my interview with Amazon. They wanted me to explain what a hash function is. I kept giving answers for about 3 minutes explaining hashing, common algorithms, reasons to use it and places it applies.

Recruiter: "I was looking for you to say it's a fingerprint"

So I guess I was wrong, because despite explaining them in decent detail, I didn't use the one keyword.

replies(6): >>12701662 #>>12701711 #>>12701878 #>>12701993 #>>12702618 #>>12702696 #
2. dekhn ◴[] No.12701662[source]
only some hash functions are fingerprints. Other hash functions are the exact opposite of a fingerprint.
replies(1): >>12702099 #
3. mgkimsal ◴[] No.12701711[source]
recruiters need to have a few levels of keywords they scan for.

level 1: foo, bar, baz level 2: frobnitz, barfoo level 3: 42, etc

In someone is using words from level 2 that work together in the ways laid out, they're probably beyond level 1, and wouldn't use the word 'fingerprint' (in this case) - they're giving more detail (and probably better) than what was being listened for.

replies(1): >>12701924 #
4. sharemywin ◴[] No.12701878[source]
I've found the best way to answer those is "buzzword diarrhea"

first thing I do is try to get them to ask it a different way. then I play spew out every related buzzword possible as if its a question. Are you talking about .... oh, you mean .... I can usually get it in a few tries.

5. xigency ◴[] No.12701924[source]
This idea sounds like a slightly more brilliant test than this candidate was faced with.
6. dmayle ◴[] No.12701993[source]
I'd have failed too, because I use a very particular way to describe a hashing function.

A hashing function is actually a sorting function. It's supposed to take an input space and sort it in an unpredictable and evenly distributed way across the output space. What's more, neighboring points in the input space, no matter the sort used to determine proximity should not result in neighboring points in the output space.

Fingerprinting is just an emergent value that comes from choosing fixed length hashes, and the fact that the mapping from input to output is stable.

replies(1): >>12703288 #
7. lucb1e ◴[] No.12702099[source]
I know some hash functions are not meant to create unique fingerprints (they're to pick a bucket to put/look in), but what do you mean by the opposite of a fingerprint?
replies(1): >>12702209 #
8. dekhn ◴[] No.12702209{3}[source]
The goal of a fingerprint hash is to convert an input space of "large" values to an output space where the output space values are much shorter- typically fixed size and two similar input values have effectively random outputs (without spending the CPU cycles to implement a cryptographic hash). This permits a wide range of optimizations (a document can be fingerprinted, and looked up by its fingerprint, to see if it's a cache, has associated data, etc).

A non-fingerprinting hash function - well, one example of one- is something that for similar inputs, produces the same output (similarity being defined by some distance metric). See, for example: https://en.wikipedia.org/wiki/Locality-sensitive_hashing and https://en.wikipedia.org/wiki/MinHash

Confusingingly, many functions that are used for similarity detection are called Fingerprints, (https://en.wikipedia.org/wiki/Acoustic_fingerprint), but I consider that a distinct use of the term.

replies(2): >>12703128 #>>12703627 #
9. dvirsky ◴[] No.12702618[source]
On my phone screen a Google recruiter asked me "how much is 2^24", and I knew the answer by heart and answered immediately.

So he asked "how did you figure this out so fast?". I told him I didn't, I just remember all the "important" powers of 2. He said "well... that's not what I was looking for, I wanted you to calculate it, but... I guess a candidate who memorizes powers of 2 is a positive sign?". I passed.

replies(2): >>12703273 #>>12712707 #
10. chrisper ◴[] No.12702696[source]
I got an interview with Amazon, but failed the first round, because I did not do well on the reasoning part of the online test. You couldn't skip questions, so I spent too much time on some of them and had to rush the ones at the end. They failed me despite getting the coding part 100% right.

Oh well.

replies(1): >>12703784 #
11. lucb1e ◴[] No.12703128{4}[source]
Thanks for the elaborate answer! I knew about things like acoustic fingerprints but not that people use the term "hash function" to describe something that indicates similarity.

Could something like Hamming distance be called a hash function too? It's not mentioned on its Wikipedia page.

replies(2): >>12703301 #>>12703504 #
12. rockdoe ◴[] No.12703273[source]
"16 million colors" if you get what I mean.
13. rockdoe ◴[] No.12703288[source]
Why do you use the word "sort" rather than "map" here though.
replies(1): >>12706758 #
14. rockdoe ◴[] No.12703301{5}[source]
No because that works between pairs. It's a comparison method, not a mapping.
replies(1): >>12705123 #
15. dekhn ◴[] No.12703504{5}[source]
it's worth reading and understanding the ideas behind LSH and hamming distance. there are some... fundamental mathematical relationshps there that are still being ... hashed out.
16. bogomipz ◴[] No.12703627{4}[source]
"The goal of a fingerprint hash is to convert an input space of "large" values to an output space where the output space values are much shorter- typically fixed size and two similar input values have effectively random outputs"

But this is also the goal of non-crypto hash function like those used in a data structure no? Basically mapping a large space of inputs to a smaller space of outputs.

I would say the cryptographic hash is one the that has "desirable" security properties, things like not being able to recover the original input message using the hash, a tiny change in the input causes a substantial change in the output, or that its very unlikely that two inputs will produce a collision.

replies(1): >>12703902 #
17. bogomipz ◴[] No.12703784[source]
Can you elaborate on what types of things you were asked to reason about in this test?
replies(1): >>12705882 #
18. dekhn ◴[] No.12703902{5}[source]
both types of hash convert an input space to an output space where (typically) the input space is much larger.

What matters is whether similar inputs get mapped to the same output. For the case where you want to minimize the probability that two inputs which are highly similar land in the same bucket, you want a crypto hash, although those are expensive so you want a cheaper approximation, which is exactly what fingerprint hashes do. The problem is that as the input values counts approach sqrt(output value size), you're going to start getting collisions, and ideally, you want those collisions to be evenly spaced.

In the case of a similarity hash function, you want the opposite, the closer things are in some metric space, the more likely they end in up in the same bucket.

replies(1): >>12704979 #
19. bogomipz ◴[] No.12704979{6}[source]
"For the case where you want to minimize the probability that two inputs which are highly similar land in the same bucket, you want a crypto hash"

That's what I was trying to say:

"a tiny change in the input causes a substantial change in the output."

But I probably didn't articulate that very well. I think we are saying the same thing.

You mentioned:

"The problem is that as the input values counts approach sqrt(output value size), you're going to start getting collisions"

Can you elaborate on the significant a square root here? Does this relate to the load factor?

replies(1): >>12705069 #
20. dekhn ◴[] No.12705069{7}[source]
https://en.wikipedia.org/wiki/Birthday_problem Basically, you get 50% chance of collision around sqrt(size of table).
replies(1): >>12710335 #
21. lucb1e ◴[] No.12705123{6}[source]
Ah I see. Thanks!
22. chrisper ◴[] No.12705882{3}[source]
Sorry, I only saw your comment now. They were asking stuff like finding patterns (like in IQ tests). Then they also had ones where you had to read text and then say what you would do with it based on certain requirements. You couldn't skip questions so I spent way too much time on the number ones.

I cannot obviously tell you the exact specifics because of the NDA.

replies(1): >>12710323 #
23. rocqua ◴[] No.12706758{3}[source]
I'd guess because sort implies an ordering, which, when total, implies a distance.
24. bogomipz ◴[] No.12710323{4}[source]
I am curious what your opinion is of these? I personally find this practice loathsome. Was this something that happened after speaking to a human or was this the initial screening?
replies(1): >>12712512 #
25. bogomipz ◴[] No.12710335{8}[source]
Ah right, a probability theory classic. Thanks, cheers.
26. chrisper ◴[] No.12712512{5}[source]
It was the initial screening and I was very upset that they rejected me since I did (very) well on the coding part but poorly on that reasoning test. If I have a good GPA at a Uni with a degree in Computer Science, then they can certainly assume that I can reason and there is no need to put me through a very strictly timed "IQ test."

Maybe I dodged a bullet there and Amazon would have been a bad place to end up anyway.

replies(1): >>12712705 #
27. bogomipz ◴[] No.12712705{6}[source]
Yeah that would be my thought as well - 'do I really want to work for a company that makes me take a timed IQ test. Probably just as well as you said.
28. oldcreek12 ◴[] No.12712707[source]
haha, I got that question too, in stead I worked in MPLS for over 5 years, I know by heart that MPLS label has 20bits which translates to 1 million labels, times 2^4 that is 16 million. The recruiter asked me how I figured it out so fast, I explained to him, that recruiter had no idea of what I was talking about. I did not pass.
replies(1): >>12713687 #
29. dvirsky ◴[] No.12713687{3}[source]
Well, I answered the rest of the questions perfectly and exactly like he expected, so it might have helped.