Embedding-based RAG will always be just OK at best. It is useful for small parts of a chain or for tech demos, but in real-life use it will always falter.
replies(6):
The difference is that this feature explicitly isn't designed to do a whole lot, which is still the best way to build most LLM-based products: keep the LLM part small and sandwich it between non-LLM stuff.
To give a real-world example, compare the way Claude Code works with how Cursor's embedded database works.
If you want something as simple as "suggest similar tweets", or anything like it across millions of items, then embeddings still work.
But if you want something like "compare the documents across these three projects" then you would use full-text metadata extraction: keywords, summaries, tables of contents, etc., to determine data about each document and each chunk.
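For what it's worth, here is a rough sketch of what that metadata extraction could look like. Everything in it is an assumption for illustration: the `DocMeta` fields, the `extract_meta` and `shared_keywords` helpers, the tiny stopword list, and the three toy documents are all made up, and the "summary" is just the first sentence of the body. A real pipeline would more likely have an LLM write the summaries and keywords.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical, deliberately tiny stopword list for the sketch.
STOPWORDS = {"the", "and", "for", "with", "that", "this"}


@dataclass
class DocMeta:
    project: str
    title: str
    headings: list[str]   # stand-in for a table of contents
    keywords: list[str]   # most frequent body terms
    summary: str          # first sentence of the body


def extract_meta(project: str, title: str, text: str, top_k: int = 5) -> DocMeta:
    # Table of contents: treat markdown-style "# Heading" lines as headings.
    headings = [m.group(1).strip()
                for m in re.finditer(r"^#+\s+(.*)$", text, re.MULTILINE)]
    # Body: everything except the heading lines.
    body = re.sub(r"^#+\s+.*$", "", text, flags=re.MULTILINE)
    # Keywords: most frequent words of 3+ letters that are not stopwords.
    words = [w for w in re.findall(r"[a-z]{3,}", body.lower())
             if w not in STOPWORDS]
    keywords = [w for w, _ in Counter(words).most_common(top_k)]
    # Summary: first sentence of the first non-empty body line.
    first_line = next((ln.strip() for ln in body.splitlines() if ln.strip()), "")
    summary = re.split(r"(?<=[.!?])\s", first_line, maxsplit=1)[0]
    return DocMeta(project, title, headings, keywords, summary)


def shared_keywords(metas: list[DocMeta]) -> set[str]:
    # Pool keywords per project, then keep the ones every project has.
    by_project: dict[str, set[str]] = {}
    for m in metas:
        by_project.setdefault(m.project, set()).update(m.keywords)
    sets = list(by_project.values())
    return set.intersection(*sets) if sets else set()


# Toy documents, one per project (invented for this example).
docs = [
    ("alpha", "Billing design",
     "# Overview\nBilling retries use a queue. The queue drains hourly.\n"
     "# Retries\nRetries back off."),
    ("beta", "Ingest notes",
     "# Overview\nIngest workers read from a queue.\n"
     "# Scaling\nWorkers scale with queue depth."),
    ("gamma", "Alerting",
     "# Overview\nAlerts fire when the queue backs up.\n"
     "# Routing\nAlerts route to on-call."),
]

metas = [extract_meta(project, title, text) for project, title, text in docs]
shared = shared_keywords(metas)

for m in metas:
    print(m.project, m.headings, m.keywords, m.summary)
print("shared across projects:", shared)
```

The point is that a "compare across three projects" question gets answered from structured fields you can filter and intersect, not from nearest-neighbor lookups over chunks.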