
The man who killed Google Search?

(www.wheresyoured.at)
1884 points | elorant
gregw134 ◴[] No.40136741[source]
Ex-Google search engineer here (2019-2023). I know a lot of the veteran engineers were upset when Ben Gomes got shunted off. Probably the bigger change, from what I've heard, was losing Amit Singhal, who led Search until 2016. Amit fought against creeping complexity. There is a semi-famous internal document he wrote where he argued against the other search leads that Google should use less machine learning, or at least contain it as much as possible, so that ranking stays debuggable and understandable by human search engineers. My impression is that since he left, complexity exploded, with every team launching as many deep learning projects as they can (just like every other large tech company has).

The problem, though, is that the older systems had obvious problems, while the newer systems have hidden bugs and conceptual issues which often don't show up in the metrics, and which compound over time as more complexity is layered on. For example: I found an off-by-one error deep in a formula from an old launch that has been reordering top results for 15% of queries since 2015. I handed it off when I left but have no idea whether anyone actually fixed it.
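
To make that failure mode concrete, here's a minimal, purely hypothetical sketch (the function, scores, and boost table are all invented; this is not Google's actual formula) of how a 1-based/0-based indexing mismatch in a score-blending step quietly reorders results that still look plausible:

```python
# Purely hypothetical sketch of an off-by-one in a score-blending step.
def rerank(docs, base_scores, rank_boost):
    """Blend base ranking scores with a boost table keyed by 1-based rank."""
    blended = []
    for i, doc in enumerate(docs):
        # BUG: rank_boost is keyed by 1-based rank, so this applies the
        # boost intended for the document one position higher.
        # Fix: rank_boost[i + 1]
        blended.append((base_scores[i] + rank_boost[i], doc))
    blended.sort(reverse=True)
    return [doc for _, doc in blended]

docs = ["a", "b", "c"]
base_scores = [3.0, 2.9, 2.8]
rank_boost = {0: 0.0, 1: 0.3, 2: 0.2, 3: 0.1}  # 1-based rank -> boost

print(rerank(docs, base_scores, rank_boost))  # buggy order: ['b', 'c', 'a']
# With rank_boost[i + 1] the order stays ['a', 'b', 'c'].
```

Both orderings look reasonable in isolation, which is exactly why a bug like this can sit underneath the metrics unnoticed for years.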

I wrote up all of the search bugs I was aware of in an internal document called "second page navboost", so if anyone working on search at Google reads this and needs a launch, go check it out.

replies(11): >>40136833 #>>40136879 #>>40137570 #>>40137898 #>>40137957 #>>40138051 #>>40140388 #>>40140614 #>>40141596 #>>40146159 #>>40166064 #
JohnFen ◴[] No.40136833[source]
> where he argued against the other search leads that Google should use less machine learning

This echoes my personal experience with the decline of Google search better than TFA does: the decline seems tied to the increasing use of ML, in that the more of it Google put in, the worse my results got.

replies(3): >>40137620 #>>40137737 #>>40137885 #
potatolicious ◴[] No.40137620[source]
It's also a good lesson for the new AI cycle we're in now. Often inserting ML subsystems into your broader system just makes it go from "deterministically but fixably bad" to "mysteriously and unfixably bad".
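
A toy contrast, with invented rules and training data (the scikit-learn pipeline is just a stand-in for any learned scorer), of what "fixably bad" versus "unfixably bad" means in practice:

```python
# Hypothetical: a deterministic rule fails traceably; a learned scorer
# fails opaquely. Rules, data, and labels are all invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def rule_score(query, doc):
    # Every contribution is inspectable: a wrong ranking traces to a branch.
    score = 0.0
    if query.lower() in doc.lower():
        score += 1.0
    score += 0.1 * doc.lower().count(query.lower())
    return score

# Two toy training examples stand in for real relevance data.
train_docs = ["cheap flights to paris", "quantum error correction survey"]
labels = [1, 0]  # 1 = relevant, 0 = not
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, labels)

def ml_score(doc):
    # One opaque number; when it's wrong there's no branch to point at,
    # only training data, features, and distribution shift to guess about.
    return model.predict_proba([doc])[0][1]

print(rule_score("paris", "cheap flights to paris"))  # 1.1, fully explainable
print(ml_score("discount paris airfare"))             # why this value? unclear
```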
replies(5): >>40137968 #>>40138119 #>>40138995 #>>40139020 #>>40147693 #
munk-a ◴[] No.40138119[source]
I think - I hope, rather - that technically minded people who are advocating for the use of ML understand the shortcomings and hallucinations... but we need to be frank about the fact that the business layer above us (with a few rare exceptions) absolutely does not understand the limitations of AI and views it as a magic box where they type in "Write me a story about a bunny" and get twelve paragraphs of text out. As someone working in a healthcare-adjacent field, I've seen the glint in executives' eyes when talking about AI, and it can provide real benefits in data summarization and annotation assistance... but there are limits to what you should trust it with, and if it's something capital-I Important then you'll always want to have a human vetting step.
replies(4): >>40138577 #>>40138723 #>>40138897 #>>40139084 #
godelski ◴[] No.40139084[source]
> but we need to be frank about the fact that the business layer above us (with a few rare exceptions) absolutely does not understand the limitations of AI and views it as a magic box where they type in

I think we also need to be aware that this business layer above us often sees __computers__ in general as a magic box they type things into. There's definitely a large spectrum of how magical this seems to that layer, but the issue remains that there are subtleties that are often important but difficult to explain without detailed technical knowledge. I think there's a lot of good ML can do (being an ML researcher myself), but I often find it ham-fisted into projects simply so the project can say it has ML. I think the clearest flag to any engineer that the layer above them has limited domain knowledge is how much importance they place on KPIs/metrics. Are they targets or are they guides? Because I can assure you, all metrics are flawed -- but some metrics are less flawed than others (and benchmark hacking is unfortunately the norm in ML research[0]).
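
As a toy illustration of targets-versus-guides (all numbers invented): optimize a proxy metric like click-through and the proxy improves while the thing you actually care about doesn't.

```python
# Hypothetical Goodhart's-law toy: picking results by a clickbait-inflated
# proxy metric "wins" on the metric while relevance quietly suffers.
import random
random.seed(0)

def true_quality(result):
    return result["relevance"]

def proxy_metric(result):
    # Clickbait attracts clicks regardless of relevance.
    return result["relevance"] + 2.0 * result["clickbait"]

candidates = [{"relevance": random.random(), "clickbait": random.random()}
              for _ in range(1000)]

by_proxy = max(candidates, key=proxy_metric)
by_quality = max(candidates, key=true_quality)

print("relevance of proxy winner:", round(by_proxy["relevance"], 2))
print("best available relevance: ", round(by_quality["relevance"], 2))
# The dashboard metric went up; what users actually wanted didn't.
```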

[0] There's just too much happening too fast, and too many papers to reasonably review in a timely manner. It's a competitive environment, where the gatekeepers are competitors and everyone is absolutely crunched for time and pressured to feel like they need to move even faster. You bet reviews get lazy. The problem isn't "posting preprints on twitter" or "LLMs giving summaries"; it's that the traditional peer review system (especially in conference settings) scales poorly and is heavily affected by hype. Unfortunately I think this ends up railroading research directions and makes it significantly harder for graduate students to publish without being connected to big labs (aka, having big compute); tuning is another common way to escape compute constraints, but that falls under "railroading". There are still some pretty big and fundamental questions that need to be chipped away at, but they're difficult to publish on in this environment. /rant