283 points by rrampage | 1 comment
hubraumhugo No.42193073
Given the recent advances in vector-based semantic search, what's the SOTA search stack that people are using for hybrid keyword + semantic search these days?
dmezzetti No.42193932
Excellent article on BM25!

Author of txtai [1] here. txtai implements a performant BM25 index in Python [2] using the arrays package, with the term frequency vectors stored in SQLite.
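To make that storage idea concrete, here is a minimal, hypothetical sketch of a BM25 index that packs each term's postings (document ids and term frequencies) with Python's standard `array` module and keeps them as BLOBs in SQLite. It is not txtai's implementation: the class name `TinyBM25`, the naive whitespace tokenizer, the Lucene-style IDF, and the k1 = 1.2 / b = 0.75 defaults are all assumptions for illustration.

```python
import math
import sqlite3
from array import array
from collections import Counter


class TinyBM25:
    """Hypothetical minimal BM25 index (illustration only, not txtai's code).

    Each term's postings are two packed arrays, document ids and term
    frequencies, stored as BLOBs in an in-memory SQLite table.
    """

    def __init__(self, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE terms (term TEXT PRIMARY KEY, ids BLOB, freqs BLOB)")
        self.lengths = []  # token count per document id

    def index(self, documents):
        """Build the index once from a list of strings (naive whitespace tokenizer)."""
        postings = {}
        for uid, text in enumerate(documents):
            tokens = text.lower().split()
            self.lengths.append(len(tokens))
            for term, tf in Counter(tokens).items():
                ids, freqs = postings.setdefault(term, (array("l"), array("i")))
                ids.append(uid)
                freqs.append(tf)
        # Serialize each term's packed arrays to bytes and store them as BLOBs
        self.db.executemany(
            "INSERT INTO terms VALUES (?, ?, ?)",
            ((term, ids.tobytes(), freqs.tobytes()) for term, (ids, freqs) in postings.items()),
        )

    def search(self, query, limit=3):
        """Return up to `limit` (doc_id, score) pairs, highest BM25 score first."""
        if not self.lengths:
            return []
        n = len(self.lengths)
        avgdl = sum(self.lengths) / n
        scores = Counter()
        for term in set(query.lower().split()):
            row = self.db.execute("SELECT ids, freqs FROM terms WHERE term = ?", (term,)).fetchone()
            if row is None:
                continue
            # Rehydrate the packed postings for this term
            ids, freqs = array("l"), array("i")
            ids.frombytes(row[0])
            freqs.frombytes(row[1])
            df = len(ids)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # Lucene-style IDF, always > 0
            for uid, tf in zip(ids, freqs):
                norm = self.k1 * (1 - self.b + self.b * self.lengths[uid] / avgdl)
                scores[uid] += idf * tf * (self.k1 + 1) / (tf + norm)
        return scores.most_common(limit)


docs = ["the quick brown fox", "the lazy dog", "quick quick fox jumps"]
index = TinyBM25()
index.index(docs)
results = index.search("quick fox")  # doc 2 ranks first (tf=2 for "quick"); doc 1 has no match
```

Packing postings into contiguous arrays keeps memory overhead far below a list of Python ints, and SQLite gives a simple persistent key-value lookup per term; a real index would also need incremental updates and deletes, which this sketch deliberately skips.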

txtai's hybrid index approach [3] supports both convex combination, when BM25 scores are normalized, and reciprocal rank fusion (RRF), when they aren't [4].
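The two fusion strategies fit in a few lines of Python. This is an illustrative sketch under assumptions, not the code in [4]: the function names, the min-max normalization, the equal 0.5 weight, treating a document missing from one list as scoring 0 there, and the RRF constant k = 60 (a commonly used default) are all choices made here.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {uid: 1.0 for uid in scores}
    return {uid: (s - lo) / (hi - lo) for uid, s in scores.items()}


def convex_combination(sparse, dense, weight=0.5):
    """weight * dense + (1 - weight) * sparse; both inputs assumed to be in [0, 1].

    A document missing from one result list contributes 0 for that list.
    """
    return {
        uid: weight * dense.get(uid, 0.0) + (1 - weight) * sparse.get(uid, 0.0)
        for uid in set(sparse) | set(dense)
    }


def rrf(*rankings, k=60):
    """Reciprocal rank fusion: each ranked list adds 1 / (k + rank), rank starting at 1."""
    fused = {}
    for ranking in rankings:
        for rank, uid in enumerate(ranking, start=1):
            fused[uid] = fused.get(uid, 0.0) + 1.0 / (k + rank)
    return fused


bm25 = {"a": 12.0, "b": 7.5, "c": 3.0}    # raw, unbounded BM25 scores
dense = {"a": 0.85, "c": 0.90, "d": 0.70}  # e.g. cosine similarities from a vector index

# Normalized scores: blend the values directly
combined = convex_combination(min_max(bm25), min_max(dense))

# Unnormalized scores: ignore magnitudes and fuse ranks only
fused = rrf(sorted(bm25, key=bm25.get, reverse=True), sorted(dense, key=dense.get, reverse=True))
```

RRF only looks at each document's position in each list, which is why it works when BM25 and vector scores live on different scales. A convex combination uses the scores themselves, so it only makes sense once both sides are normalized to a comparable range.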

[1] https://github.com/neuml/txtai

[2] https://neuml.hashnode.dev/building-an-efficient-sparse-keyw...

[3] https://neuml.hashnode.dev/benefits-of-hybrid-search

[4] https://github.com/neuml/txtai/blob/master/src/python/txtai/...