Production RAG: what I learned from processing 5M+ documents

(blog.abdellatif.io)

548 points by tifa2up | 2 comments | 20 Oct 25 15:55 UTC

mediaman [20 Oct 25 17:22 UTC] No.45646532

The point about synthetic query generation is good. We found that users wrote very poor queries, so we initially had the LLM generate a synthetic query. But the results could vary widely depending on which synthetic query it produced, so we had it create three variants (all in one LLM call, so that you can prompt it to generate a wide variety instead of getting three very similar ones back), run the searches in parallel, and then use reciprocal rank fusion to combine the lists into a set of broadly strong performers. For the searches we use hybrid dense + sparse (BM25) retrieval, since dense retrieval doesn't work well for technical terms.
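
A minimal sketch of the reciprocal rank fusion step, assuming each search returns a ranked list of document IDs. Each document scores the sum of 1 / (k + rank) over every list it appears in, so documents that rank well across many query variants float to the top. The arrangement here (one dense list and one BM25 list per variant, all fused at once), the doc IDs, and k=60 (a common default) are illustrative assumptions, not necessarily what the commenter runs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results: one dense and one BM25 list per synthetic query variant.
variant_results = [
    ["doc_a", "doc_b", "doc_c"],  # variant 1, dense
    ["doc_b", "doc_d", "doc_a"],  # variant 1, BM25
    ["doc_b", "doc_a", "doc_e"],  # variant 2, dense
    ["doc_d", "doc_b", "doc_f"],  # variant 2, BM25
    ["doc_a", "doc_b", "doc_g"],  # variant 3, dense
    ["doc_b", "doc_c", "doc_a"],  # variant 3, BM25
]

print(reciprocal_rank_fusion(variant_results)[:3])  # ['doc_b', 'doc_a', 'doc_d']
```

Because RRF only uses ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.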

This, combined with a subsequent reranker, basically eliminated all of our search issues.
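
The whole pipeline described above (one LLM call for three diverse variants, parallel searches, RRF, then a reranker) can be sketched end to end. Everything external is stubbed: `generate_query_variants`, `search`, and `rerank` are hypothetical placeholders for the LLM call, the hybrid dense + BM25 index, and a cross-encoder reranker, and the toy corpus is made up. Only the control flow is meant to carry over.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for a real index (hypothetical data).
CORPUS = {
    "d1": "reset the admin password from the settings page",
    "d2": "password reset email is not delivered troubleshooting",
    "d3": "configure sso login for the admin console",
    "d4": "billing invoices are generated monthly",
}


def generate_query_variants(user_query):
    # Hypothetical stand-in for ONE LLM call asked for three diverse rewrites.
    return [user_query, user_query + " settings page", user_query + " email troubleshooting"]


def search(query, top_k=3):
    # Hypothetical stand-in for hybrid dense + BM25 search: ranks by term overlap.
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in CORPUS.items()]
    scored = sorted((s, d) for s, d in scored if s > 0)
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def rrf(result_lists, k=60):
    # Reciprocal rank fusion over the per-variant result lists.
    scores = defaultdict(float)
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(user_query, doc_ids):
    # Hypothetical stand-in for a cross-encoder scoring each (query, doc) pair;
    # here: the fraction of the ORIGINAL query's terms present in the document.
    terms = set(user_query.lower().split())

    def score(doc_id):
        return len(terms & set(CORPUS[doc_id].split())) / len(terms)

    return sorted(doc_ids, key=lambda d: (-score(d), d))


def retrieve(user_query, final_k=2):
    variants = generate_query_variants(user_query)
    # Run the variant searches in parallel; pool.map keeps input order.
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        result_lists = list(pool.map(search, variants))
    candidates = rrf(result_lists)
    return rerank(user_query, candidates)[:final_k]


print(retrieve("admin password reset"))  # ['d1', 'd2']
```

Reranking against the user's original query, rather than the synthetic variants, is one reasonable choice; it keeps the final ordering anchored to what was actually asked.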

replies(4): >>45647148 >>45647160 >>45647255 >>45649007

siva7 [20 Oct 25 18:20 UTC] No.45647255

>>45646532

Boy, that should not be the concern of the end user (the developer) but of those implementing RAG solutions as a service at Amazon, Microsoft, OpenAI and so on.

replies(1): >>45648705

pamelafox [20 Oct 25 20:13 UTC] No.45648705

>>45647255

At Microsoft, that's all baked into Azure AI Search: hybrid search does BM25, vector search, and re-ranking, just by setting booleans to true. It also has a new Agentic retrieval feature that does the query rewriting and parallel search execution.
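
To make the "booleans" concrete, here is a sketch that builds a request body for the Azure AI Search docs/search POST endpoint. The field names (`search`, `vectorQueries` with `kind`/`vector`/`fields`/`k`, `queryType`, `semanticConfiguration`, `top`) are my reading of the public REST API and should be checked against the api-version you target. The function name, its boolean flags, the `embedding` field, the `default` semantic configuration, and `k=50` are hypothetical choices, not the sample app's actual settings. When both `search` and `vectorQueries` are present, the service runs a hybrid query and merges the two result sets (documented as RRF) before the semantic ranker re-ranks them.

```python
import json


def build_search_body(query_text, query_vector, *, use_text=True, use_vector=True,
                      use_semantic_ranker=True, vector_field="embedding",
                      semantic_config="default", top=5):
    """Build a JSON body for an Azure AI Search docs/search POST (sketch).

    vector_field and semantic_config are hypothetical and must match your
    own index definition.
    """
    body = {"top": top}
    if use_text:
        body["search"] = query_text  # keyword (BM25) leg
    if use_vector:
        body["vectorQueries"] = [{
            "kind": "vector",        # caller supplies a precomputed embedding
            "vector": query_vector,
            "fields": vector_field,
            "k": 50,                 # neighbours fed into fusion (hypothetical)
        }]
    if use_semantic_ranker:
        body["queryType"] = "semantic"  # re-rank the merged results
        body["semanticConfiguration"] = semantic_config
    return body


print(json.dumps(build_search_body("reset admin password", [0.1, 0.2, 0.3]), indent=2))
```

Turning off `use_vector` or `use_text` degrades the same request to keyword-only or vector-only search, which is handy for comparing retrieval modes on your own queries.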

Disclosure: I work at MS and help maintain our most popular open-source RAG template, so I follow the best practices closely: https://github.com/Azure-Samples/azure-search-openai-demo/

So few developers realize that you need more than just vector search that I still spend much of my talks emphasizing the FULL retrieval stack for RAG. It's also possible to build it on top of other DBs like Postgres, but it takes more effort.
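
Building the stack on a general-purpose database means supplying the keyword leg yourself, and its core is BM25 scoring. Below is a minimal pure-Python Okapi BM25 scorer, assuming whitespace tokenization, Lucene's default parameters (k1=1.2, b=0.75), and Lucene-style IDF. The example shows why it matters for the technical terms mentioned upthread: an exact identifier like `ef_search` is matched literally, with no embedding involved. The documents are made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every tokenized doc against the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many docs contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style IDF, always positive.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [text.lower().split() for text in [
    "set the HNSW ef_search parameter",
    "vector search for semantic similarity",
    "keyword search with BM25 ranking",
]]
scores = bm25_scores(["hnsw", "ef_search"], docs)
print(max(range(len(docs)), key=scores.__getitem__))  # 0
```

In a real deployment this scoring would live in the database or search engine, and its ranked output would be fused with the vector results (for example with RRF) rather than used alone.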

replies(5): >>45648904 >>45648985 >>45649659 >>45650931 >>45654119

osigurdson [20 Oct 25 21:33 UTC] No.45649659

>>45648705

Are you using Elasticsearch behind the scenes?

replies(1): >>45649840

pamelafox [20 Oct 25 21:47 UTC] No.45649840

>>45649659

I believe that Azure AI Search currently uses Lucene for BM25, hnswlib for vector search, and the Bing re-ranking model for semantic ranking. (So, no, it does not use Elasticsearch, though the features are similar.)