Understanding the BM25 full text search algorithm

My opinion is people need to not focus on one stack. But be prepared to use tools best for each job. Elasticsearch for BM25 type things. Turbopuffer for simple and fast vector retrieval. Even redis to precompute results for certain queries. Or certain extremely dynamic attributes that change frequently like price. Combine all these in a scatter/gather approach.

I say that because almost always you have a layer outside the search stack(s) that ideally can just be a straightforward inference service for reranking that looks most like other ML infra.

You also almost always route queries to different backends based on an understanding of the users query. Routing “lookup by ID” to a different system than “fuzzy semantic search”. These are very different data structures. And search almost always covers very broad/different use cases.

I think it’s an anti pattern to just push all work to one system. Each system is ideal for different workloads. And their inference capabilities won’t ever keep pace with the general ML tooling that your ML engineers are used to. (I tried with Elasticsearch Learning to Rank and its a hopeless task.)

(That said, Vespa is probably the best 'single stack' that tries to solve a broad range of use-cases.)