283 points rrampage | 2 comments
hubraumhugo ◴[] No.42193073[source]
Given the recent advances in vector-based semantic search, what's the SOTA search stack that people are using for hybrid keyword + semantic search these days?
emschwartz ◴[] No.42193208[source]
Most of the commercial and open source offerings for hybrid search seem to be using BM25 + vector similarity search based on embeddings. The results are combined using Reciprocal Rank Fusion (RRF).

The RRF paper is impressive in how incredibly simple it is (the paper is only 2 pages): https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf
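For anyone who hasn't read it, the core idea is that each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in, with ranks starting at 1 and k = 60 as the constant the paper uses. Here's a minimal sketch of that, not any particular product's implementation; the function name `rrf_fuse` and the example document ids are made up for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is the value used in the RRF paper).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical hits from a BM25 index and a vector index.
bm25_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)
```

Note that only the ranks go in, never the raw BM25 or cosine scores, which is why it works without any score normalization.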

1. softwaredoug ◴[] No.42195796[source]
A warning that RRF is often not enough, as it can just drag a good solution down towards the worse solution :)

https://softwaredoug.com/blog/2024/11/03/rrf-is-not-enough
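A toy illustration of that failure mode (made-up document ids and relevance labels, not data from the post): one ranker puts both relevant documents on top, the other returns only irrelevant ones, and plain RRF interleaves the two so the fused top 2 ends up worse than the good ranker alone. The helper names `rrf_fuse` and `precision_at_k` are just for this sketch.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Plain Reciprocal Rank Fusion over best-first lists of doc ids."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties broken by doc id so the result is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


relevant = {"r1", "r2"}                    # assumed ground truth
good_ranker = ["r1", "r2", "x1", "x2"]     # both relevant docs on top
bad_ranker = ["y1", "y2", "y3", "y4"]      # nothing relevant at all

fused = rrf_fuse([good_ranker, bad_ranker])

print(precision_at_k(good_ranker, relevant, 2))  # good ranker alone
print(precision_at_k(fused, relevant, 2))        # after fusing with the bad one
```

Since RRF only sees ranks, the bad ranker's top hit counts exactly as much as the good ranker's top hit, and there is nothing in the formula to tell them apart.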

2. emschwartz ◴[] No.42196484[source]
Ah, that's great! Thanks for sharing that.

I had actually implemented full-text search + vector search using RRF, but I kept it disabled by default because it wasn't meaningfully improving my results. This seems like a good hypothesis as to why.