548 points tifa2up | 10 comments
1. leetharris ◴[] No.45646303[source]
Embedding-based RAG will always be just OK at best. It's useful for small parts of a chain or for tech demos, but in real-world use it will always falter.
replies(6): >>45646470 #>>45646482 #>>45646495 #>>45646758 #>>45646892 #>>45656450 #
2. sgt ◴[] No.45646470[source]
What do you recommend? Query generation?
3. esafak ◴[] No.45646482[source]
Compared with what?
replies(1): >>45647936 #
4. charcircuit ◴[] No.45646495[source]
Most of my ChatGPT queries use RAG (based on the query, ChatGPT decides whether it needs to search the web) to get up-to-date information about the world. In real life it's effective, and it's why every large provider supports it.
5. underlines ◴[] No.45646758[source]
RAG will be pronounced dead again and again, but it has its use cases. We moved to agentic search with RAG as one tool, while other retrieval strategies we added use real-time search against the sources, often skipping ingested and chunked sources entirely. Large context windows allow putting almost whole documents into one request.
6. phillipcarter ◴[] No.45646892[source]
Not necessarily? It's been the basis of one of the major ways people would query their data since 2023 on a product I worked on: https://www.honeycomb.io/blog/introducing-query-assistant

The difference is that this feature explicitly isn't designed to do a whole lot, which is still the best way to build most LLM-based products: keep the LLM's job small and sandwich it between non-LLM stuff.

7. leetharris ◴[] No.45647936[source]
Full-text agentic retrieval. Instead of cosine similarity on vectors, parse metadata through an agentic loop.

To give a real-world example: the way Claude Code works versus how Cursor's embeddings database works.

replies(1): >>45648797 #
8. lifty ◴[] No.45648797{3}[source]
How do you do that on 5 million documents?
replies(1): >>45655545 #
9. leetharris ◴[] No.45655545{4}[source]
People are usually not querying across 5 million documents in a single scope.

If you want something simple like "suggest similar tweets" across millions of items, then embeddings still work.
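
For concreteness, the embedding approach boils down to ranking by cosine similarity; the vectors below are hand-made stand-ins for real embedding-model output:

```python
# Toy embedding retrieval: represent each item as a vector and rank
# candidates by cosine similarity to the query vector.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

tweets = {
    "cats are great": [0.9, 0.1, 0.0],
    "dogs are great": [0.7, 0.3, 0.1],
    "tax law update":  [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # pretend embedding of a pet-related query
ranked = sorted(tweets, key=lambda t: cosine(query, tweets[t]), reverse=True)
print(ranked[0])
```

At scale this lookup is served by an approximate-nearest-neighbor index rather than a linear scan, but the ranking criterion is the same.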

But if you want something like "compare the documents across these three projects," you would use full-text metadata extraction: keywords, summaries, tables of contents, etc., to determine data about each document and each chunk.
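
A sketch of that metadata-first approach, where `extract_metadata` stands in for an LLM (or simpler) extraction pass and the documents and paths are invented for illustration:

```python
# Metadata-extraction retrieval sketch: instead of embedding chunks,
# extract structured facts (keywords, a short summary) per document,
# then filter on that metadata at query time.
import re
from collections import Counter

def extract_metadata(doc_id, text):
    # Stand-in for an extraction pass; here: top words as keywords.
    words = re.findall(r"[a-z]{4,}", text.lower())
    return {
        "id": doc_id,
        "keywords": [w for w, _ in Counter(words).most_common(3)],
        "summary": text[:60],
    }

docs = {
    "proj_a/design.md": "Design doc: the billing service retries failed charges.",
    "proj_b/design.md": "Design doc: the auth service issues session tokens.",
}

index = [extract_metadata(d, t) for d, t in docs.items()]

def find(keyword):
    return [m["id"] for m in index if keyword in m["keywords"]]

print(find("billing"))
```

The point is that the query-time filter runs over small, structured metadata rather than raw vectors, which makes cross-document comparisons tractable.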

10. DSingularity ◴[] No.45656450[source]
Super useful for grounding, which is often the only way to robustly protect against hallucinations.