←back to thread

283 points rrampage | 2 comments | | HN request time: 0.519s | source
Show context
RA_Fisher ◴[] No.42192651[source]
BM25 is an ancient algo developed in the 1970s. It’s basically a crappy statistical model and statisticians can do far better today. Search is strictly dominated by learning (that yes, can use search as an input). Not many folks realize that yet, and / or are incentivized to keep the old tech going as long as possible, but market pressures will change that.
replies(4): >>42192735 #>>42192805 #>>42192828 #>>42194229 #
simplecto ◴[] No.42192805[source]
Those are some really spicy opinions. It would seem that many search experts might not agree.

David Tippet (formerly opensearch and now at Github)

A great podcast with David Tippet and Nicolay Gerold entitled:

"BM25 is the workhorse of search; vectors are its visionary cousin"

https://www.youtube.com/watch?v=ENFW1uHsrLM

replies(2): >>42192855 #>>42193450 #
RA_Fisher ◴[] No.42193450[source]
I’m sure Search experts would disagree, because it’d be their technology they’d be admitting is inferior to another. BM25 is the workhorse, no doubt— but it’s also not the best anymore. Vectors are a step toward learning models, but only a small mid-range step vs. an explicit model.

Search is a useful approach for computing learning models, but there’s a difference between the computational means and the model. For example, MIPS is a very useful search algo for computing learning models (but first the learning model has to be formulated).

replies(3): >>42193880 #>>42194290 #>>42197352 #
softwaredoug ◴[] No.42194290[source]
I don't know a lot of search practitioners who don't want to use the "new sexy" thing. Most of us do a fair amount of "resume driven development" so can claim to be "AI Engineers" :)
replies(1): >>42195479 #
RA_Fisher ◴[] No.42195479[source]
I don’t think it’s realistic to think that software engineers can pick up advanced statistical modeling on the job, unless they’re pairing with statisticians. There’s just too much background involved.
replies(2): >>42196352 #>>42197148 #
1. softwaredoug ◴[] No.42197148[source]
The "search practitioners" I'm referring to are pretty uniformly ML Engineers . They also work on feeds, recommendations, and adjacent Information Retrieval spaces. Both to generate L0 retrieval candidates and to do higher layers of reranking with learning to rank and other systems to whatever the system's goal is...

You can decide if you agree that most people are sufficiently statistically literate in that group of people. But some humility around statistics is probably far up there in what I personally interview for.

replies(1): >>42197737 #
2. RA_Fisher ◴[] No.42197737[source]
For sure. There are ML folks with statistical learning backgrounds, but it tends to be relatively rare. Physics and CS are more common. They tend to view things like you mention, more procedural eg- learning to rank, minimizing distances, less statistical modeling. Humility around statistics is good, but statistical knowledge is still what's required to really level up these systems (I've built them as well).