I see a lot of these KG tools pop up, but they never solve the first problem I have, which is actually constructing the KG itself.
I have heard good things about GraphRAG [1] (but what a stupid name). I have not had the time to try it properly, but it is supposed to build the knowledge graph itself somewhat transparently, using LLMs. That construction step is a big stumbling block. At least vector stores are easy to understand and trivial to build.
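For context, the core of what these tools do is prompt the model for (subject, predicate, object) triples per text chunk and merge them into a graph. A minimal sketch of that loop, not GraphRAG's actual implementation; the `llm()` call is a stub and the prompt/output format is my own:

```python
from collections import defaultdict

PROMPT = (
    "Extract knowledge-graph triples from the text below.\n"
    "Return one per line as: subject | predicate | object\n\n{chunk}"
)

def llm(prompt: str) -> str:
    # Stand-in for a real completion call (OpenAI, local model, ...).
    return "GraphRAG | developed_by | Microsoft\nGraphRAG | builds | knowledge graphs"

def parse_triples(raw: str):
    # Keep only well-formed three-part lines; LLM output is noisy.
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples

def build_graph(chunks):
    graph = defaultdict(list)  # subject -> [(predicate, object), ...]
    for chunk in chunks:
        for s, p, o in parse_triples(llm(PROMPT.format(chunk=chunk))):
            graph[s].append((p, o))
    return graph
```

The hard parts the real tools add on top are entity deduplication and consistent predicate naming across chunks, which is exactly where LLM extraction tends to drift.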
It looks like KAG can do this from the summary on GitHub, but I could not really find how to do it in the documentation.
I’ve heard of a few very large companies using Glean (https://www.glean.com/).
This is the route I’d take if I wanted to make a business around RAG.
It is trivial, completely devoid of any creativity, and most importantly quite difficult to google. It’s like they did not really think about it even for 5 seconds before uploading.
> if anything its too generic and multiple people who have the same idea now cannot use the name bc microsoft made the most noise about it.
Exactly! Anyway, I am not judging the software, which I have yet to try properly.
I have to agree. It’s actually quite a good summary of hacking with AI-related libraries these days. A lot of them get complex fast once you get slightly out of the intended path. I hope it’ll get better, but unfortunately it is where we are.
[1] https://github.com/microsoft/graphrag/tree/main/graphrag/pro...
https://github.com/getzep/graphiti
I’m one of the authors. Happy to answer any questions.
NLP is fast but requires a model that is trained on an ontology that works with your data. Once you have one, it’s a matter of simply feeding the model your bazillion CSVs and PDFs.
LLMs are slow but way easier to start with, as ontologies can be generated on the fly. This is a double-edged sword, however, as LLMs have a tendency to lose fidelity and consistency in edge naming.
I work in NLP, which is the approach most used in practice as it’s far more consistent and explainable on very large corpora. But the difficulty of starting a fresh ontology dead-ends many projects.
Don't have time to scan the source code myself, but are you using the OpenAI python library, so the server URL can easily be changed? Didn't see it exposed by your library, so hoping it can at least be overridden with an env var, so we could use local LLMs instead.
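For what it's worth, if they are on the official openai-python client (v1+), I believe it reads `OPENAI_BASE_URL` from the environment, so you can often redirect a library to a local OpenAI-compatible server even when it never exposes the parameter. A sketch, with an example endpoint (Ollama's default) that you'd swap for your own:

```python
import os

# The openai-python v1 client picks up OPENAI_BASE_URL and OPENAI_API_KEY
# from the environment when they are not passed explicitly.
os.environ["OPENAI_BASE_URL"] = "http://localhost:11434/v1"  # e.g. Ollama
os.environ["OPENAI_API_KEY"] = "not-used-by-local-servers"   # must be non-empty

# from openai import OpenAI
# client = OpenAI()  # inherits base_url and api_key from the env vars above
```

This only works if the library constructs the client without hard-coding `base_url`, which is worth checking in their source.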
This is a common issue I've seen from LLM projects that only kind of understand what is going on here and try to dress up their vector database with semantic edge information as something that has a formal name.
I’ve noticed this too and the ironic thing is that building the KG is the most critical part of making everything work.
> We recommend that you put this on a local fork as we really want the service to be as lightweight and simple as possible, as we see this as a good entry point for new developers.
Sadly, it seems like you're recommending forking the library instead of allowing people to use local LLMs. You were smart enough to lock the PR from any further conversation at least :)
https://neuml.hashnode.dev/advanced-rag-with-graph-path-trav...
So yes, there's a huge pile of tools and software for working with knowledge graphs, but to date populating the graph is still the realm of human experts.
Perhaps one needs to manually create a starting point, then ask the LLM to propose links to various documents or follow an existing one.
Sufficiently loopable traversal should create a KG.
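That loop might look like the sketch below, assuming a `propose()` callback that wraps the LLM and returns candidate edges for a node (all names here are hypothetical). A round cap and a no-new-edges check keep it from looping forever:

```python
def expand(seed_edges, propose, rounds=3):
    """Grow a set of (subject, predicate, object) edges from a manual seed."""
    edges = set(seed_edges)
    frontier = {s for s, _, _ in edges} | {o for _, _, o in edges}
    for _ in range(rounds):
        proposed = set()
        for node in frontier:
            proposed.update(propose(node))  # propose() wraps the LLM call
        new = proposed - edges
        if not new:
            break  # converged: the model has nothing novel to add
        edges |= new
        frontier = {o for _, _, o in new}  # only explore newly seen objects
    return edges
```

The round cap matters for exactly the reason raised below: each round feeds the previous round's output back in, so errors compound.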
This becomes a cyclical hallucination problem. The LLM hallucinates and creates an incorrect graph, which in turn generates even more incorrect knowledge.
We are working on this issue of reducing hallucination in knowledge graphs, and using an LLM is not at all the right way.
I’ve had good success with CIM for Utilities, using it about 15 years ago to build a network graph modelling the distribution and transmission networks, with sensor and event data added for monitoring and analysis.
Anywhere there is a technology-focussed consortium of vendors and users building standards, you will likely find a prebuilt graph. When RDF was “hot”, many of these groups spun out some attempt to model their domain.
In summary: if you need one, look for one. Maybe there’s one waiting for you, and you get to do less convincing and more doing.