https://github.com/OpenSPG/KAG/blob/master/kag/builder/promp...
All you’re doing here is “front-loading” AI: instead of running slow and expensive LLMs at query time, you run them at index time.
It’s a method for data augmentation or, in database lingo, index building. You use LLMs to add context to chunks that doesn’t exist on either the word level (searchable by BM25) or the semantic level (searchable by embeddings).
A simple version of this would be to ask an LLM:
“List all questions this chunk is answering.” [0]
But you can do the same thing for time frames, objects, styles, emotions — whatever you need a “handle” for to later retrieve via BM25 or semantic similarity.
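As a minimal sketch (the complete() call stands in for whatever LLM client you use; the angles and field names are just illustrations), the index-time pass is a loop over your chunks with one LLM call per angle:

    # Index-time ("front-loaded") augmentation: one LLM call per chunk per angle.
    # complete() is a placeholder for your LLM client; the angles and field names
    # below are illustrative, not taken from any particular library.
    ANGLES = {
        "questions": "List all questions this chunk is answering.",
        "timeframe": "Name the time frame this chunk refers to, if any.",
        "entities":  "List the people, organizations and objects mentioned.",
    }

    def augment_chunk(chunk: str, complete) -> dict:
        record = {"text": chunk}  # keep the original; it stays searchable as before
        for field, instruction in ANGLES.items():
            record[field] = complete(f"{instruction}\n\nChunk:\n{chunk}")
        return record

    # Each record then gets indexed as usual: BM25 over every text field, plus
    # one embedding per field if you also want semantic handles.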
I dreamed of doing that back in 2020, but it would’ve been prohibitively expensive, because it requires passing your whole corpus through an LLM, possibly multiple times: once for each “angle”.
That being said, I recommend running any “Graph RAG” system you see here on HN over 1% or so of your data. Then look inside the database at all the text chunks, original and synthetic, that are now in your index.
I’ve done this for a consulting client who absolutely wanted “Graph RAG”. I found the result to be an absolute mess. That is because these systems are built to cover a broad range of applications and are not adapted at all to your problem domain.
So I prefer working backwards:
What kinds of queries do I need to handle? What does the prompt to my query-time LLM need to look like? What context will that LLM need? How can I have this context for each of my chunks, and be able to search it by keyword match or semantic similarity? And finally, how can I make an LLM return exactly that kind of context, with as few hallucinations and as little filler as possible, for each of my chunks?
This gives you a very lean, very efficient index that can do everything you want.
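Query time then stays cheap: plain hybrid retrieval over the original and synthetic fields. A rough sketch, assuming records shaped like the ones above, the rank_bm25 package for the keyword side, a placeholder embed() for the semantic side, and an arbitrary 50/50 weighting:

    import numpy as np
    from rank_bm25 import BM25Okapi

    def build_index(records, embed):
        # One searchable document per chunk: original text plus synthetic fields.
        docs = [" ".join(r.values()) for r in records]
        bm25 = BM25Okapi([d.lower().split() for d in docs])
        vecs = np.array([embed(d) for d in docs])
        return bm25, vecs

    def search(query, bm25, vecs, embed, k=5, alpha=0.5):
        kw = bm25.get_scores(query.lower().split())
        kw = kw / (kw.max() if kw.max() > 0 else 1.0)  # crude score normalization
        q = embed(query)
        sem = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
        return np.argsort(alpha * kw + (1 - alpha) * sem)[::-1][:k]  # top-k chunk indices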
[0] For a prompt, you’d add context and give the model “space to think”, especially when using a smaller model. Also, you’d instruct it to use a particular format, so you can parse out the part that you need. This “unfancy” approach lets you switch out models easily and compare them against each other without having to care about different APIs for “structured output”.
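For illustration only, such a prompt and its parser might look like this (the delimiters and wording are one arbitrary choice among many):

    # "Unfancy" structured output: give the model room to reason, then ask for a
    # clearly delimited answer and parse it with plain string handling. No
    # provider-specific structured-output API, so models are easy to swap and compare.
    PROMPT = """You are indexing a document for search.

    Chunk:
    {chunk}

    First, think step by step about what information the chunk contains.
    Then list every question this chunk answers, one per line, between a line
    reading ANSWER_START and a line reading ANSWER_END."""

    def parse_questions(raw: str) -> list[str]:
        body = raw.split("ANSWER_START")[-1].split("ANSWER_END")[0]
        return [line.strip() for line in body.splitlines() if line.strip()]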
Same findings here, re: legal text. Basic hybrid search performs better. In this use case the user knows what to look for, so the queries are specific. Graph RAG has the advantage when you need to integrate disparate sources into a holistic overview.