283 points rrampage | 1 comments
RA_Fisher ◴[] No.42192651[source]
BM25 is an ancient algo developed in the 1970s. It’s basically a crappy statistical model and statisticians can do far better today. Search is strictly dominated by learning (that yes, can use search as an input). Not many folks realize that yet, and/or they are incentivized to keep the old tech going as long as possible, but market pressures will change that.
replies(4): >>42192735 #>>42192805 #>>42192828 #>>42194229 #
simplecto ◴[] No.42192805[source]
Those are some really spicy opinions. It would seem that many search experts might not agree.

David Tippet (formerly at OpenSearch and now at GitHub)

A great podcast with David Tippet and Nicolay Gerold entitled:

"BM25 is the workhorse of search; vectors are its visionary cousin"

https://www.youtube.com/watch?v=ENFW1uHsrLM

replies(2): >>42192855 #>>42193450 #
dumb1224 ◴[] No.42192855[source]
Agreed. In the 2000s it was all about BM25 in the NLP community. In my experience, hardly any paper failed to mention it.
replies(2): >>42193496 #>>42193948 #
1. authorfly ◴[] No.42193948[source]
And dependency chaining. But yes, lots of BM25.

The 2000s and even the 2010s were a wonderful and fairly theoretical time for linguistics and NLP. A time when NLP seemed to harbor real, anonymized, general information to make the right decisions with, without impinging on privacy.

Oh to go back.