Obviously LLMs are good at some semantic understanding of the prompt context and are useful, but the irony is hilarious
Building a good RAG pipeline these days takes a lot of manual optimizations. Most engineers intuitively start from naive RAG: throw everything in a vector database and hope that semantic search is powerful enough. This can work for use cases where accuracy isn’t too important and hallucinations are tolerable, but it doesn’t work for more difficult queries that involve multi-hop reasoning or more advanced domain understanding. Also, it’s impossible to debug it.
To address these limitations, many engineers find themselves adding extra layers like agent-based preprocessing, custom embeddings, reranking mechanisms, and hybrid search strategies. Much like the early days of machine learning when we manually crafted feature vectors to squeeze out marginal gains, building an effective RAG system often becomes an exercise in crafting engineering “hacks.”
Earlier this year, Microsoft seeded the idea of using Knowledge Graphs for RAG and published GraphRAG, i.e. RAG with Knowledge Graphs. We believe there is incredible potential in this idea, but existing implementations are naive in the way they create and explore the graph. That’s why we developed Fast GraphRAG with a new algorithmic approach using good old PageRank.
There are two main challenges when building a reliable RAG system:
(1) Data Noise: Real-world data is often messy. Customer support tickets, chat logs, and other conversational data can include a lot of irrelevant information. If you push noisy data into a vector database, you’re likely to get noisy results.
(2) Domain Specialization: For complex use cases, a RAG system must understand the domain-specific context. This requires creating representations that capture not just the words but the deeper relationships and structures within the data.
Our solution builds on these insights by incorporating knowledge graphs into the RAG pipeline. Knowledge graphs store entities and their relationships, and can help structure data in a way that enables more accurate and context-aware information retrieval. 12 years ago Google announced the knowledge graph we all know about [1]. It was a pioneering move. Now we have LLMs, meaning that people can finally do RAG on their own data with tools that can be as powerful as Google’s original idea.
Before we built this, Antonio was at Amazon, while Luca and Yuhang were finishing their PhDs at Oxford. We had been thinking about this problem for years and we always loved the parallel between PageRank and human memory [2]. We believe that searching for memories is incredibly similar to searching the web.
Here’s how it works:
- Entity and Relationship Extraction: Fast GraphRAG uses LLMs to extract entities and their relationships from your data and stores them in a graph format [3].
- Query Processing: When you make a query, Fast GraphRAG starts by finding the most relevant entities using vector search, then runs a personalized PageRank algorithm to determine the most important “memories” or pieces of information related to the query [4].
- Incremental Updates: Unlike other graph-based RAG systems, Fast GraphRAG natively supports incremental data insertions. This means you can continuously add new data without reprocessing the entire graph.
- Faster: These design choices make our algorithm faster and more affordable to run than other graph-based RAG systems because we eliminate the need for communities and clustering.
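To make the query-processing step above more concrete, here is a minimal sketch of personalized PageRank by power iteration. It is not Fast GraphRAG's actual implementation: the `personalized_pagerank` helper, the toy graph, the damping value, and the seed weights (standing in for entities returned by vector search) are all assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Rank nodes by power iteration, restarting at the seed nodes.

    edges: iterable of (source, target) pairs, treated as undirected.
    seeds: dict of seed node -> positive weight. Hypothetical stand-in for
           the vector-search scores of the entities matched by a query;
           every seed is assumed to appear in `edges`.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = list(neighbors)

    # The restart distribution puts all probability mass on the seeds.
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Each node splits its damped rank evenly among its neighbors.
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


# Toy entity graph, made up for illustration.
EDGES = [
    ("Scrooge", "Marley"),
    ("Scrooge", "Bob Cratchit"),
    ("Bob Cratchit", "Tiny Tim"),
    ("Scrooge", "Fred"),
    ("London", "Fred"),
]

ranks = personalized_pagerank(EDGES, seeds={"Tiny Tim": 1.0})
top = sorted(ranks, key=ranks.get, reverse=True)
```

Because the walk keeps restarting at the seeds, nodes close to them in the graph end up ranked above distant ones, which is the property that makes this a plausible way to pick query-relevant "memories".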
Suppose you’re analyzing a book and want to focus on character interactions, locations, and significant events:
    from fast_graphrag import GraphRAG

    # Describe what the graph builder should focus on.
    DOMAIN = "Analyze this story and identify the characters. Focus on how they interact with each other, the locations they explore, and their relationships."

    # Sample questions that hint at how the graph will be queried.
    EXAMPLE_QUERIES = [
        "What is the significance of Christmas Eve in A Christmas Carol?",
        "How does the setting of Victorian London contribute to the story's themes?",
        "Describe the chain of events that leads to Scrooge's transformation.",
        "How does Dickens use the different spirits (Past, Present, and Future) to guide Scrooge?",
        "Why does Dickens choose to divide the story into \"staves\" rather than chapters?"
    ]

    # Entity types to extract from the text.
    ENTITY_TYPES = ["Character", "Animal", "Place", "Object", "Activity", "Event"]

    grag = GraphRAG(
        working_dir="./book_example",
        domain=DOMAIN,
        example_queries="\n".join(EXAMPLE_QUERIES),
        entity_types=ENTITY_TYPES
    )

    # Insert the book, then query the resulting graph.
    with open("./book.txt") as f:
        grag.insert(f.read())

    print(grag.query("Who is Scrooge?").response)
This code creates a domain-specific knowledge graph based on your data, example queries, and specified entity types. Then you can query it in plain English while it automatically handles all the data fetching, entity extractions, co-reference resolutions, memory elections, etc. When you add new data, locking and checkpointing are handled for you as well.
This is the kind of infrastructure that GenAI apps need to handle large-scale real-world data. Our goal is to give you this infrastructure so that you can focus on what’s important: building great apps for your users without having to care about manually engineering a retrieval pipeline. In the managed service, we also have a suite of UI tools for you to explore and debug your knowledge graph.
We have a free hosted solution with up to 100 monthly requests. When you’re ready to grow, we have paid plans that scale with you. And of course you can self-host our open-source engine.
Give us a spin today at https://circlemind.co and see our code at https://github.com/circlemind-ai/fast-graphrag
We’d love feedback :)
[1] https://blog.google/products/search/introducing-knowledge-gr...
[2] Griffiths, T. L., Steyvers, M., & Firl, A. (2007). Google and the Mind: Predicting Fluency with PageRank. Psychological Science, 18(12), 1069–1076. http://www.jstor.org/stable/40064705
[3] Similarly to Microsoft’s GraphRAG: https://github.com/microsoft/graphrag
[4] Similarly to OSU’s HippoRAG: https://github.com/OSU-NLP-Group/HippoRAG
Navigating the graph, on the other hand, is the perfect task for PageRank.
You can check out our example at https://github.com/circlemind-ai/fast-graphrag/blob/main/exa...
Have you tried the sciphi triplex model for extraction? I’ve tried to do some extraction before, but got inconsistent results if I extracted the chunks multiple times consecutively.
But in general we found the best course of action is simply to label everything, because our customers will want those answers and RAG won’t really work at the scale of “all podcasts from the last 6 months: what is the trend in sentiment towards Hillary Clinton, and what are the top topics and entities mentioned nearby?”. So we take a more “brute force” approach :-)
(Like the whole thing in the context window, for instance?)
Is this approach just for cost savings or does it help get better answers and how so?
Could you share a specific example?
(1) FastGraphRAG lets the user make graph construction opinionated and specialized for a given domain and use case; this clears out the noise in the data and yields better results; (2) Unlike HippoRAG, FastGraphRAG initializes PageRank with a mixture of semantic retrieval and entity extractions; (3) HippoRAG is the outcome of an academic paper, and we saw the need for a more robust and production-ready implementation. Our repo is fully typed, includes tests, handles retries with Instructor, uses structured outputs, and so on.
Moving forward, we see our implementation diverge from HippoRAG more radically as we start to introduce new mechanisms such as weighted edges and negative PageRank to model repulsors.
Our use case: we have been looking at farming out this work (analyzing compliance documents, i.e. manufacturing paperwork) for our AI startup. However, we need to understand the scale this can operate at and the cost model before it can be useful to us.
We will have about 300K PDF documents per client and expect about a 10% change in that document set month to month, so any GraphRAG system has to handle documents at scale. We can use S3 as an ingestion mechanism, but we have to understand the cost and processing time needed for the system to be ready to use during:
1. initial loading
2. regular updates (for example, how do we delete data from the system?)
cool framework btw..
So we should really compare this to other RAG approaches. Compared to vector-database RAG, knowledge graphs have the advantage that they model the connections between datapoints. This is super important when asking questions that require reasoning across multiple pieces of information, i.e. multi-hop reasoning.
Also, the graph construction is essentially an exercise in cleaning data to extract the knowledge. Let me give you a practical example. Let's pretend we're indexing customer tickets to create an AI assistant. If we were to store the ticket data as it is, we would overwhelm the vector database with all the noise coming from the conversational nature of this data. With knowledge graphs, we extract only the relevant entities and relationships and store the distilled knowledge in our graph. At query time, we find the answer over a structured data model that contains only clean information.
Or how it comes close to large-context answer quality at lower cost on some specific examples.
It's helpful when a readme contains a demonstration or as I said above, a specific example.
I guess I’m getting old
The whole approach to representing the work, including the writing here, screams marketing, and the paid offering is the only thing made absolutely clear about it.
p.s. I absolutely understand why a knowledge graph is essential and THE right approach for RAG, particularly when vector DBs on their own are subpar. But so do many others, and the way the repo is presented gives absolutely no clue why yours is _something_ with respect to other/common-sense graph-RAG-somethings.
You see, there are hundreds of smart people out there who can easily come to the conclusion that data needs to be presented as knowledge in a graph-ontological way, with only the relevant subgraph fed into the context. Like, you could’ve said so much rather than asking .0084 cents or whatever for APIs as the headline of a presumably open repo.
FastGraphRAG is entirely free to use, even for commercial applications, and we’re happy to make it accessible to everyone. The managed service is how we sustain our business.
At a high-level:
(1) Domain: allows you to "talk to the graph constructor". If you care particularly about one aspect of your data, this is the place to say it. For reference, take a look at some of the example prompts on our website (https://circlemind.co/)
(2) Example Queries: if you know what class of questions users will ask, it'd be useful to give the system this information so that it will "keep these questions in mind" when designing the graph. If you don't know which kinds of questions, you can just put a couple of high-level questions that you think apply to your data.
(3) Entity Types: this has a very high impact on the final quality of the graph. Think of these as the types of entities that you want to extract from your data, e.g. person, place, event, etc
All of the above help construct the knowledge graph so that it is specifically designed for your use-case.
“You may not: use Output to develop models that compete with OpenAI” => they’re gonna learn from you and you can’t learn from them.
Glad we’re all so cool with longterm economic downfall of natural humans. Our grandkids might not be so glad about it!
(1) Self-hosting our open-source package (2) Using the free tier of the managed service, which includes 100 requests
If you wish to upgrade your plan, you can reach out to us at support [at] circlemind.co
Aider has been doing PageRank on the call graph of code repos since forever. All non trivial code has lots of graph structure to support PageRank. So it works really well to find the most relevant context in the project related to the currently active task.
Few learnings I've collected:
1. Lexical search with BM25 alone gives you very relevant results if you can do some work during ingestion time with an LLM.
2. Embeddings work well only when the size of the query is roughly on the same order of what you're actually storing in the embedding store.
3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.
So combining all 3 learnings, we landed on a knowledge decomposition and extraction step very similar to yours. But we stick a metaprompter to essentially auto-generate the domain / entity types.
LLMs are naively bad at identifying the correct level of granularity for the decomposed knowledge. One trick we found is to ask the LLM to output a mermaid.js mindmap to hierarchically break down the input into a tree. At the end of that output, ask the LLM to state which level is the appropriate root for a knowledge node.
Then the node is used to generate questions that could be answered from the knowledge contained in this node. We then index the text of these questions and also embed them.
You can directly match the user's query from these questions using purely BM25 and get good outputs. But a hybrid approach works even better, though not by that much.
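As a rough illustration of matching a user query against pre-generated questions with BM25, here is a small self-contained sketch. It is not the commenter's code: the `tokenize` and `bm25_scores` helpers, the Lucene-style IDF variant, the `k1`/`b` defaults, and the sample questions are all assumptions.

```python
import math
from collections import Counter

PUNCTUATION = ".,?!\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(PUNCTUATION).lower() for t in text.split())
    return [t for t in tokens if t]


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with BM25."""
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Number of documents each term appears in.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical questions generated from knowledge nodes at ingestion time.
QUESTIONS = [
    "How do I reset my password?",
    "What is the email for product support?",
    "How do I export my data to CSV?",
]

scores = bm25_scores("product support email", QUESTIONS)
best = QUESTIONS[scores.index(max(scores))]
```

A hybrid setup would combine these lexical scores with embedding similarity; how to weight the two is left open here.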
Not using LLMs at query time also means we can hierarchically walk down the root into deeper and deeper nodes, using the embedding similarity as a cost function for the traversal.
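One possible reading of that traversal is a greedy descent that, at each level, follows the child whose embedding is most similar to the query. The sketch below assumes that interpretation; the node layout, the `walk_down` helper, and the hand-made 2-D vectors are hypothetical, and a real system would use model-generated embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def walk_down(node, query_vec):
    """Greedily follow the most similar child until a leaf is reached."""
    path = [node["name"]]
    while node["children"]:
        node = max(node["children"], key=lambda c: cosine(c["vec"], query_vec))
        path.append(node["name"])
    return path


# Toy knowledge tree with made-up embeddings.
TREE = {
    "name": "root",
    "vec": [1.0, 1.0],
    "children": [
        {
            "name": "billing",
            "vec": [1.0, 0.1],
            "children": [
                {"name": "refunds", "vec": [0.9, 0.3], "children": []},
                {"name": "invoices", "vec": [1.0, 0.0], "children": []},
            ],
        },
        {"name": "support", "vec": [0.1, 1.0], "children": []},
    ],
}

path = walk_down(TREE, query_vec=[0.8, 0.4])
```

A greedy walk is cheap but can commit to the wrong branch early; keeping the top few children at each level (a beam) is a common way to trade cost for recall.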
https://arxiv.org/abs/2105.00110
The paper shows high efficiency compared to other centralities like PageRank. However, in some research using GraphBLAS, my coauthors and I found that TC was slower on a variety of sparse graphs than our sparse formulation of PR for graphs up to 1.8 billion edges, but that TC appears to scale better as graphs get larger and is likely more efficient in the trillion-edge realm.
https://fossies.org/linux/SuiteSparse/GraphBLAS/Doc/The_Grap...
Hope this can help!
This is honestly where I think LLMs really shine. This also gives you a very good idea of whether your documentation is deficient or not.
I've been wondering about that and am glad to hear it's working in the wild.
I'm now wondering if using a fine-tuned LLM (on the corpus) to gen the hypothetical answers and then use those for the rag flow would work even better.
Can you expand on what the LLM work here is and its purpose?
> 3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.
Interesting idea, going to add to our experiments. Thanks.
And unless tuned very well, vector search is not actually a whole lot better than a good old well tuned query. Putting everything together, the practice of turning structured data into unstructured data just so you can do vector search or prompt engineering on it, which I've seen teams do, feels a bit backwards. It kind of works but there are probably smarter ways to get the same results. Graph RAG is essentially about making use of structure of data. Whether that's through SQL joins or by querying some graph database doesn't really matter much.
There is probably some value in teaching LLMs how to query as well, or letting them interface with existing search/query APIs. And you can compensate for poor ranking with larger context sizes and simply fetch a few hundred or even more results with multiple queries. It's going to be a lot faster and cheaper than vector search to scale that.
That is, the same thing that Amazon did to Mongo will happen to you?
Do you think working in the open enables you to spend more time on engineering and less on sales and marketing?
I’m currently building a Q&A chatbot and facing challenges in addressing the following scenario:
When a user asks:
"What do you mean in your previous statement?"
How does your framework handle retrieving the correct small subset of "raw knowledge" and integrating it into the LLM for a relevant response?
Without relying on external frameworks, I’ve struggled with this issue - https://www.reddit.com/r/LocalLLaMA/comments/1gtzdid/d_optim...
I’d love to know how your framework solves this and whether it can streamline the process.
Thank you!
It would be very useful to be able to compare this method to other established RAG techniques.
Search for ReAct agents; you can build them using either LangGraph or Bedrock Agents.
Interestingly, going both ways (generating hypothetical answers for the query, and also generating hypothetical questions for the text chunk at ingestion) increases RAG performance in my experience.
Though LLM-based query processing is not always suitable for chat applications if inference time is a concern (like near-real-time customer support RAG), so ingestion-time hypothetical answer generation is more apt there.
I know benchmark datasets are not the be-all and end-all, but a halfway decent score and inference time would really help sell your framework (or help engineers make the choice).
In any case, very cool work. I built a lot of RAG pipelines as a freelance NLP engineer and I will try this out.
Without that it often failed when users asked something like "Can you expand point 2?" or "Give a detailed example of the above".
Current implementation (I have 3 indexes) is to provide Query + Past messages and ask an LLM to break it down into four fields: Overall ask, BM25-optimized question, Keywords, and Semantic-optimized question.
Perform RAG + Rerank and pass the top N passages after this along with the Overall ask in the second LLM call.
(Also readme says see examples folder but it's basically empty?)
What we find really effective is at content ingestion time, we prepend “decorator text” to the document or chunk. This incorporates various metadata about the document (title, author(s), publication date, etc).
Then at query time, we generate a contextual hypothetical document that matches the format of the decorator text.
We add hybrid search (BM25 and rerank) to that, also add filters (documents published between these dates, by this author, this type of content, etc). We have an LLM parameterize those filters and use them as part of our retrieval step.
This process works incredibly well for end users.
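The "decorator text" idea can be sketched in a few lines. This is only an illustration of the description above, not the commenter's pipeline: the `decorate` helper, the header layout, and the metadata field names are hypothetical.

```python
def decorate(chunk, metadata):
    """Prepend a metadata header ("decorator text") to a chunk before indexing."""
    header = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    return f"{header}\n\n{chunk}"


# Made-up document metadata, for illustration only.
doc = decorate(
    "Personalized PageRank biases the random walk toward seed nodes.",
    {"Title": "Graph retrieval notes", "Author": "J. Doe", "Published": "2024-11-01"},
)
```

At query time, the hypothetical document would be generated in the same `Key: value` header format so that it lands close to the decorated chunks in both lexical and embedding space.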
Now, what is your comment precisely about, cause I'm pretty sure what mine was?
That ought to be enough for anybody.
> would TC support that
TC is a purely structural algorithm, it counts triangles so it doesn't take any weights into consideration, but it does return a vector of normalized ranking from 0.0 to 1.0, which you could combine with an existing biasing strategy to boost results that have strong centrality.
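To show how a normalized structural score could be blended with an existing relevance score, here is a deliberately simplified sketch. It uses plain per-node triangle counts scaled to [0, 1] as a stand-in, which is not the triangle centrality formula from the linked paper; the `triangle_scores` and `boosted` helpers and the blending weight are assumptions.

```python
from itertools import combinations


def triangle_scores(edges):
    """Count triangles per node in an undirected graph and scale to [0, 1]."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # A node closes a triangle with every pair of its neighbors that are
    # themselves connected.
    counts = {
        node: sum(1 for u, v in combinations(sorted(nbrs), 2) if v in neighbors[u])
        for node, nbrs in neighbors.items()
    }
    top = max(counts.values()) or 1
    return {node: count / top for node, count in counts.items()}


def boosted(relevance, centrality, weight=0.2):
    """Blend a relevance score with a structural score (hypothetical weighting)."""
    return {
        node: (1 - weight) * score + weight * centrality.get(node, 0.0)
        for node, score in relevance.items()
    }


EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
structure = triangle_scores(EDGES)
final = boosted({"a": 0.5, "d": 0.5}, structure)
```

With equal relevance, the node that sits inside a triangle is ranked above the one that hangs off a single edge.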
>3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.
What sort of performance are you getting in production with this one? The other two are basically solved for performance and RAG in general if it is related to a known and pre-processed corpus but I am having trouble thinking of how you don't get a hit with #3.
Past the initial sensation, it is pretty linear that "something good at language" (an interface) be integrated with "something good at information retrieval" (the data). (Still sought what comes next, "something to give reliability to processing".)
Ha, that's brilliant. Thanks for sharing this!
Example: say you're searching an article and want to know the occupation of a person it mentions. Let's say the person, 'Sharon', is mentioned as having attended several physical chemistry conferences, but her occupation is never explicitly stated. There's a very good chance every single RAG approach will fail to return correct results: it will fail to connect 'occupation' with attending a conference and the type of conference, and so fail to infer 'chemist'. I could go on, but this sort of error is pervasive across all types of information when trying to retrieve with RAG. In the end, solutions like the above seem to just reinvent other query methods (SQL, PageRank, etc.) with extra steps... there's little point in vectorization at that point...
This is an interesting observation to me. I would have expected that, since LLMs evolved from autocomplete/autocorrect algorithms, correcting spelling mistakes would be one of their strong suits. Do you have examples of cases where they fail?
At this moment I would not trust AI to automatically make changes.
Your solution looks interesting and I would love to hear more about it. I haven't seen that many PageRank-based graph exploration tools.
I actually think even BERT could be overkill here -- I have a half-baked prototype of a keyword expansion system that should do the trick. The idea is to construct a data structure of keywords ahead of time (e.g. by data-mining some portion of Common Crawl), where each keyword has "neighbors" -- words that often appear together and (sometimes, but not always) signal relatedness. I didn't take the concept very far yet, but I give it better than even odds! (Especially if the resulting data structure is pruned by a half-decent LLM -- my initial attempts resulted in a lot of questionable "neighbors" -- though I had a fairly small dataset so it's likely I was largely looking at noise.)
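A toy version of such a keyword-neighbor structure might look like the following. It is a sketch of the general idea only, assuming document-level co-occurrence counts as the relatedness signal; the `build_neighbors` and `expand` helpers and the three sample documents are hypothetical, and no LLM pruning step is included.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_neighbors(documents, top_k=3):
    """Map each word to the words it most often co-occurs with in a document."""
    cooccurrence = defaultdict(Counter)
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            cooccurrence[a][b] += 1
            cooccurrence[b][a] += 1
    return {
        word: [neighbor for neighbor, _ in counts.most_common(top_k)]
        for word, counts in cooccurrence.items()
    }


def expand(query, neighbors):
    """Append each query term's neighbors, skipping duplicates."""
    terms = query.lower().split()
    extra = [n for t in terms for n in neighbors.get(t, []) if n not in terms]
    return terms + list(dict.fromkeys(extra))


# Tiny made-up corpus; a real one would be mined from a large crawl.
DOCS = [
    "graph database query",
    "graph pagerank ranking",
    "graph pagerank walk",
]

neighbors = build_neighbors(DOCS, top_k=1)
expanded = expand("pagerank", neighbors)
```

On a corpus this small most neighbors are noise, which matches the observation above that pruning matters.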
For live experiences like chat, we solved it with UX. As soon as you start typing the words of a question into the chat box, it does the FTS search and retrieves a set of documents that have word-matches, scored just using ES heuristics (eg: counting matching words etc)
These are presented as cards that expand when clicked. The user can see it's doing something.
While that's happening, we also issue a full HyDE flow in the background, with a placeholder loading shimmer that loads in the full answer.
So there is some dead-time of about 10 seconds or so while it generates the hypothetical answers. After that, a short ~1 sec interval to load up the knowledge nodes, and then it starts streaming the answer.
This approach tested well with UXR participants and maintains acceptable accuracy.
A lot of the time, when looking for specific facts from a knowledge base, just the card UX gets an answer immediately. Eg: "What's the email for product support?"
From what I can tell, at least given the examples, there is one global graph.
Thanks!
It can definitely be automated in my opinion, if you go with a supermajority workflow. Something that I've noticed with LLMs is it's very unlikely for all high-quality LLM models to be wrong at the same time. So if you go by a supermajority, the changes are almost certainly valid.
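A supermajority check of this kind is straightforward to sketch. The `supermajority` helper, the two-thirds default threshold, and the sample proposals are assumptions; the actual model calls are omitted and their outputs are represented as plain strings.

```python
from collections import Counter


def supermajority(proposals, threshold=2 / 3):
    """Return the proposal backed by at least `threshold` of voters, else None."""
    if not proposals:
        return None
    winner, votes = Counter(proposals).most_common(1)[0]
    return winner if votes / len(proposals) >= threshold else None


# Hypothetical corrections proposed by four different models for one typo.
accepted = supermajority(["receive", "receive", "receive", "recieve"])
rejected = supermajority(["receive", "relieve", "recipe"])
```

When no proposal clears the threshold, the change is simply not applied, which keeps a human in the loop for the contested cases.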
Having said all of that, I still believe we are not addressing the root cause of bad searches, which is "garbage in, garbage out". I strongly believe the true calling for LLMs will be to help us curate and manage data at scale.
For something like uploading a big folder of documents, I agree with the OP: it's pretty straightforward, and naive in-memory processing with out-of-the-box embeddings, LLMs, retrieval, and untuned DBs goes far. I expect most vector-supporting DBaaS and LLMaaS providers to be offering this in the new year. OpenAI, Claude, and friends are already going in this direction, leaving the RAG techniques opaque for now.
(Something folks may not appreciate, and I think is important about what's being done here, is the incremental update aspect.)
- View the current repository map using `/map`
- Force a refresh of the repository map using `/map-refresh`
If you want to save the repository map to a file for inspection, you can use [1]
aider --show-repo-map
[0] https://aider.chat/docs/usage/commands.html
[1] https://aider.chat/docs/config/options.html#--show-repo-map
I assume there's got to be, but I don't have the capacity these days to root around and find it, and I'm genuinely worried about missing out on some really cool shit.
Initially I generated categories by asking an LLM with a long prompt (https://github.com/itissid/Drop-PoT/blob/main/src/drop_backe...), but I like your idea better!
My next iteration to solve this problem – I never got to it – was gonna be to generate the most appropriate categories based on user's personal interest, weather, time of day and non PII data and fine-tune a retrieval and a ranking engine to generate categories for each content piece personalized to them.
In case this thread helps someone else, some errors with --show-repo-map can be solved by setting the environment variable PYTHONIOENCODING=utf-8.
That would explain the empty output.