The man who killed Google Search?

(www.wheresyoured.at)

1884 points elorant | 1 comments | 23 Apr 24 16:43 UTC


gregw134 ◴[23 Apr 24 20:15 UTC] No.40136741[source]▶

Ex-Google search engineer here (2019-2023). I know a lot of the veteran engineers were upset when Ben Gomes got shunted off. Probably the bigger change, from what I've heard, was losing Amit Singhal, who led Search until 2016. Amit fought against creeping complexity. There is a semi-famous internal document he wrote where he argued against the other search leads that Google should use less machine-learning, or at least contain it as much as possible, so that ranking stays debuggable and understandable by human search engineers. My impression is that since he left, complexity has exploded, with every team launching as many deep learning projects as they can (just like every other large tech company has).

The problem, though, is that the older systems had obvious problems, while the newer systems have hidden bugs and conceptual issues which often don't show up in the metrics, and which compound over time as more complexity is layered on. For example: I found an off-by-one error deep in a formula from an old launch that has been reordering top results for 15% of queries since 2015. I handed it off when I left but have no idea whether anyone actually fixed it.

I wrote up all of the search bugs I was aware of in an internal document called "second page navboost", so if anyone working on search at Google reads this and needs a launch, go check it out.

replies(11): >>40136833 #>>40136879 #>>40137570 #>>40137898 #>>40137957 #>>40138051 #>>40140388 #>>40140614 #>>40141596 #>>40146159 #>>40166064 #

mrkeen ◴[24 Apr 24 07:20 UTC] No.40141596[source]▶

>>40136741 #

> There is a semi-famous internal document he wrote where he argued against the other search leads that Google should use less machine-learning, or at least contain it as much as possible, so that ranking stays debuggable and understandable by human search engineers.

There's a lot of ML hate here, and I simply don't see the alternative.

To rank documents, you need to score them. Google uses hundreds of scoring factors (I've seen the number 200 thrown about, but it doesn't really matter if it's 5 or 1000). The point is you need to sum these weighted factors up into a single number to decide whether a result should rank above or below another result.

So, if:

  - document A is 2 KB long, has 14 misspellings, matches 2 of your keywords exactly, matches a synonym of another of your keywords, and was published 18 months ago, and

  - document B is 3 KB long, has 7 misspellings, matches 1 of your keywords exactly, matches two more keywords by synonym, and was published 5 months ago

Are there any humans out there who want to write a traditional forward-algorithm to tell me which result is better?
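
To make the question concrete, here is a minimal sketch of what a hand-written scorer for those two documents might look like. The feature names and every weight below are made up for illustration (none of this comes from Google); the point is only that a human has to pick and defend each number.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    length_kb: float
    misspellings: int
    exact_matches: int
    synonym_matches: int
    age_months: int


# Hypothetical hand-picked weights: one number per scoring factor.
WEIGHTS = {
    "length_kb": 0.1,
    "misspellings": -0.2,
    "exact_matches": 3.0,
    "synonym_matches": 1.5,
    "age_months": -0.05,
}


def score(doc, weights=WEIGHTS):
    """Collapse all factors into a single number via a weighted sum."""
    return sum(w * getattr(doc, feature) for feature, w in weights.items())


def rank(docs, weights=WEIGHTS):
    """Return docs ordered from highest score to lowest."""
    return sorted(docs, key=lambda d: score(d, weights), reverse=True)


# The two documents from the example above.
doc_a = Doc("A", length_kb=2, misspellings=14, exact_matches=2,
            synonym_matches=1, age_months=18)
doc_b = Doc("B", length_kb=3, misspellings=7, exact_matches=1,
            synonym_matches=2, age_months=5)

# Same documents, but exact keyword matches are valued more highly.
ALT_WEIGHTS = {**WEIGHTS, "exact_matches": 5.0}

print([d.name for d in rank([doc_a, doc_b])])
print([d.name for d in rank([doc_a, doc_b], ALT_WEIGHTS)])
```

With the first set of weights B ranks ahead of A; raising the exact-match weight from 3.0 to 5.0 flips the order. Nothing in the code says which set of weights is right.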

replies(4): >>40141644 #>>40141688 #>>40144593 #>>40165827 #

datadeft ◴[24 Apr 24 07:34 UTC] No.40141688[source]▶

>>40141596 #

You do not need to. Counting how many links point to each document is sufficient if you know how long each link has existed (spammers' link creation time distribution is widely different to natural link creation times, and there are many other details that you can use to filter out spammers).
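
A rough sketch of that idea, assuming all we have per document is the creation date of each inbound link: count only links that have survived a minimum age, and ignore documents whose links mostly arrived in one burst. The thresholds and function names here are assumptions for illustration, not anything a real search engine is known to use.

```python
from datetime import date, timedelta

# Hypothetical thresholds; real values would need tuning against data.
MIN_AGE_DAYS = 180            # a link must have existed this long to count
BURST_WINDOW_DAYS = 7         # window used to detect a burst of new links
MAX_BURST_FRACTION = 0.5      # above this share in one window, treat as spam
MIN_LINKS_FOR_BURST_CHECK = 5 # too few links to judge a distribution


def burst_fraction(created, window_days=BURST_WINDOW_DAYS):
    """Largest share of links created inside any single window."""
    if not created:
        return 0.0
    days = sorted(created)
    best = 0
    start = 0
    for end in range(len(days)):
        # Shrink the window until it spans fewer than window_days.
        while (days[end] - days[start]).days >= window_days:
            start += 1
        best = max(best, end - start + 1)
    return best / len(days)


def link_score(created, today):
    """Count long-lived inbound links; return 0 if arrivals look bursty."""
    if (len(created) >= MIN_LINKS_FOR_BURST_CHECK
            and burst_fraction(created) > MAX_BURST_FRACTION):
        return 0
    return sum(1 for d in created if (today - d).days >= MIN_AGE_DAYS)


today = date(2024, 4, 24)

# Links that trickled in over several years.
natural = [date(2021, 1, 10), date(2021, 9, 3), date(2022, 5, 20),
           date(2023, 2, 14), date(2024, 3, 1)]

# Forty links created within the same three days.
spam = [date(2023, 6, 1) + timedelta(days=i % 3) for i in range(40)]

print(link_score(natural, today), link_score(spam, today))
```

Here the naturally linked document scores 4 (the newest link is too young to count) and the bursty one scores 0, even though it has eight times as many links.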

replies(2): >>40141733 #>>40142033 #

raincole ◴[24 Apr 24 07:41 UTC] No.40141733[source]▶

>>40141688 #

> spammers' link creation time distribution is widely different to natural link creation times

Yes, this is a statistical method. Guess what machine learning is, and what it actually excels at?
