
548 points by tifa2up
mediaman:
The point about synthetic query generation is good. We found users wrote very poor queries, so we initially had the LLM generate a synthetic query. But the results varied widely depending on the specific synthetic query it generated, so we had it create three variants (all in one LLM call, so you can prompt it to generate a wide variety instead of getting three near-duplicates back), ran the searches in parallel, and then used reciprocal rank fusion to combine the lists into a set of broadly strong performers. For the searches we use hybrid dense + sparse BM25, since dense alone doesn't work well for technical terms.
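
A minimal sketch of the fusion step, assuming each search returns an ordered list of document IDs; the doc names are illustrative, and the variant generation itself is just one prompt asking for N diverse rewrites:

    from collections import defaultdict

    def reciprocal_rank_fusion(result_lists, k=60):
        """Fuse ranked lists of doc IDs (best first), one per query variant."""
        scores = defaultdict(float)
        for results in result_lists:
            for rank, doc_id in enumerate(results, start=1):
                # k=60 is the constant from the original RRF paper; it keeps
                # any single list from dominating the fused ranking.
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # Example: three variant queries returned overlapping hit lists.
    fused = reciprocal_rank_fusion([
        ["doc_a", "doc_b", "doc_c"],
        ["doc_b", "doc_a", "doc_d"],
        ["doc_c", "doc_b", "doc_e"],
    ])
    # doc_b ranks high in all three lists, so it rises to the top.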

This, combined with a subsequent reranker, basically eliminated our issues with search.
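
One way to wire in that reranking step, assuming the sentence-transformers library and a stock cross-encoder; the thread doesn't say which reranker they use, so treat the model choice as a placeholder:

    from sentence_transformers import CrossEncoder

    # Placeholder model; swap in whatever reranker your stack uses.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def rerank(user_query, fused_ids, doc_texts, top_n=10):
        """Re-score the RRF candidates against the *original* user query."""
        pairs = [(user_query, doc_texts[doc_id]) for doc_id in fused_ids]
        scores = reranker.predict(pairs)
        ranked = sorted(zip(fused_ids, scores), key=lambda p: p[1], reverse=True)
        return [doc_id for doc_id, _ in ranked[:top_n]]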

alansaber:
Yep, that's all best practice. What I want to know is whether we could push performance further: routing the query to different embedding models or scoring strategies, or using multiple rerankers. It still feels like the process is missing something.
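
One hypothetical shape for that routing idea: pick the hybrid-search weights per query from a cheap heuristic (or a trained classifier). Everything here, the regex and the weights included, is illustrative, not something the thread specifies:

    import re

    def route_query(query):
        """Crude router: identifiers, paths, and error codes favor BM25;
        plain natural language favors the dense embedding side."""
        technical = bool(re.search(r"[_/.:]|[A-Z]{2,}|\d{3,}", query))
        if technical:
            return {"sparse_weight": 0.7, "dense_weight": 0.3}
        return {"sparse_weight": 0.3, "dense_weight": 0.7}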
tifa2up:
OP here. The way to improve it is to move away from single-shot semantic/keyword search toward an agentic system that can evaluate results and run follow-up queries.
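
A minimal sketch of that loop, with `search` and `llm` as stand-in callables since the thread doesn't name a stack: the LLM inspects the intermediate results and either stops or emits the next query.

    def agentic_retrieve(question, search, llm, max_rounds=3):
        """Search, judge, and refine until the results look sufficient."""
        query, context = question, []
        for _ in range(max_rounds):
            # e.g. the hybrid + RRF + rerank pipeline sketched above
            context.extend(search(query))
            verdict = llm(
                f"Question: {question}\nResults so far: {context}\n"
                "Reply DONE if the results answer the question; otherwise "
                "reply with a single follow-up search query."
            )
            if verdict.strip() == "DONE":
                break
            query = verdict.strip()  # refine and search again
        return context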