←back to thread

283 points rrampage | 4 comments | | HN request time: 0.001s | source
Show context
RA_Fisher ◴[] No.42192651[source]
BM25 is an ancient algo developed in the 1970s. It’s basically a crappy statistical model and statisticians can do far better today. Search is strictly dominated by learning (that yes, can use search as an input). Not many folks realize that yet, and / or are incentivized to keep the old tech going as long as possible, but market pressures will change that.
replies(4): >>42192735 #>>42192805 #>>42192828 #>>42194229 #
simplecto ◴[] No.42192805[source]
Those are some really spicy opinions. It would seem that many search experts might not agree.

David Tippet (formerly opensearch and now at Github)

A great podcast with David Tippet and Nicolay Gerold entitled:

"BM25 is the workhorse of search; vectors are its visionary cousin"

https://www.youtube.com/watch?v=ENFW1uHsrLM

replies(2): >>42192855 #>>42193450 #
RA_Fisher ◴[] No.42193450[source]
I’m sure Search experts would disagree, because it’d be their technology they’d be admitting is inferior to another. BM25 is the workhorse, no doubt— but it’s also not the best anymore. Vectors are a step toward learning models, but only a small mid-range step vs. an explicit model.

Search is a useful approach for computing learning models, but there’s a difference between the computational means and the model. For example, MIPS is a very useful search algo for computing learning models (but first the learning model has to be formulated).

replies(3): >>42193880 #>>42194290 #>>42197352 #
1. simplecto ◴[] No.42193880[source]
It seems that the current mode (eg fashion) is a hybrid approach, with vector results on one side, BM25 on the other, and then a re-reank algo to smooth things out.

I'm out of my depth here but genuinely interested and curious to see over the horizon.

replies(2): >>42193942 #>>42196684 #
2. authorfly ◴[] No.42193942[source]
Out of interest how come you use the word "mode" here?
replies(1): >>42194037 #
3. simplecto ◴[] No.42194037[source]
because the space moves fast, and from my learning this is the current thing. Like fashion -- it changes from season to season
4. RA_Fisher ◴[] No.42196684[source]
Best is to use one statistical model and encode the underlying aspects of the context that relate to goal outcomes.