230 points by taikon | 8 comments
isoprophlex No.42547133
Fancy, I think, but again no word on the actual work of turning a few bazillion CSV files and PDFs into a knowledge graph.

I see a lot of these KG tools pop up, but they never solve the first problem I have, which is actually constructing the KG itself.
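For the structured half of the problem (the CSV files), a first pass can be purely mechanical: pick a key column as the subject and turn every other non-empty column into a predicate/object pair. A minimal sketch, assuming a hypothetical schema (`id`, `name`, `type`, `works_for`). The PDFs are the hard part, since they need text extraction plus entity and relation extraction, which this does not attempt.

```python
import csv
import io

# Hypothetical CSV export; the column names are assumptions for illustration.
CSV_TEXT = """id,name,type,works_for
e1,Alice,Person,Acme
e2,Bob,Person,Acme
e3,Acme,Company,
"""

def rows_to_triples(csv_text):
    """Map each row to (subject, predicate, object) triples.

    The 'name' column becomes the subject; every other non-empty,
    non-key column becomes a predicate/object pair.
    """
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row["name"]
        for column, value in row.items():
            # Skip the key columns and empty cells (e.g. Acme has no employer).
            if column in ("id", "name") or not value:
                continue
            triples.add((subject, column, value))
    return triples

triples = rows_to_triples(CSV_TEXT)
for t in sorted(triples):
    print(t)
```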

1. kergonath No.42547488
> I see a lot of these KG tools pop up, but they never solve the first problem I have, which is actually constructing the KG itself.

I have heard good things about Graphrag [1] (but what a stupid name). I did not have the time to try it properly, but it is supposed to build the knowledge graph itself somewhat transparently, using LLMs. This is a big stumbling block. At least vector stores are easy to understand and trivial to build.

From the summary on GitHub, it looks like KAG can do this, but I could not find how to do it in the documentation.

[1] https://microsoft.github.io/graphrag/
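To make the "trivial to build" point about vector stores concrete: a brute-force vector store is a list of vectors plus a cosine-similarity scan. A minimal sketch, assuming the embeddings already come from some model (the three-dimensional vectors here are made up); production stores typically replace the linear scan with an approximate nearest-neighbor index.

```python
import math

class TinyVectorStore:
    """Brute-force in-memory vector store: add (id, vector) pairs, query by cosine similarity."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self._items.append((doc_id, list(vector)))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector, k=2):
        # Score every stored vector against the query and keep the k best.
        scored = [(self._cosine(vector, v), doc_id) for doc_id, v in self._items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]

store = TinyVectorStore()
store.add("doc-cats", [0.9, 0.1, 0.0])
store.add("doc-dogs", [0.8, 0.3, 0.0])
store.add("doc-tax", [0.0, 0.1, 0.95])
print(store.query([1.0, 0.2, 0.0], k=2))  # prints ['doc-cats', 'doc-dogs']
```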

2. isoprophlex No.42547518
Indeed, they seem to actually know/show how the sausage is made... but still, there is no fire-and-forget approach for any random dataset. Check out what you need to do if the default isn't working for you (scroll down to, e.g., the entity_extraction settings). There is so much complexity there to deal with that I'd just roll my own extraction pipeline from the start, rather than learning someone else's complex setup (that you have to tweak for each new use case).

https://microsoft.github.io/graphrag/config/yaml/
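Rolling your own pipeline mostly means owning three steps: chunking, extraction, and merging. A minimal sketch of that skeleton, assuming plain-text input; `toy_extract` is a hypothetical regex stand-in for the LLM call you would actually make, and the chunk size and overlap are arbitrary values, not GraphRAG's defaults.

```python
import re
from collections.abc import Callable, Iterable

Triple = tuple[str, str, str]

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (size/overlap are arbitrary)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def toy_extract(chunk: str) -> list[Triple]:
    """Hypothetical stand-in for an LLM call: a regex for '<Name> works for <Org>'."""
    pattern = r"([A-Z]\w+) works for ([A-Z]\w+)"
    return [(m.group(1), "works_for", m.group(2)) for m in re.finditer(pattern, chunk)]

def build_graph(documents: Iterable[str],
                extract: Callable[[str], list[Triple]]) -> set[Triple]:
    """Chunk every document, run the extractor on each chunk, and merge the triples.

    Lower-casing is a crude form of entity resolution; the set removes
    duplicates produced by overlapping chunks.
    """
    graph: set[Triple] = set()
    for doc in documents:
        for chunk in chunk_text(doc):
            for s, p, o in extract(chunk):
                graph.add((s.strip().lower(), p.strip().lower(), o.strip().lower()))
    return graph

docs = ["Alice works for Acme. Bob works for Acme.", "Carol works for Initech."]
graph = build_graph(docs, toy_extract)
print(sorted(graph))
```

Overlap reduces, but does not eliminate, the chance that a fact is cut at a chunk boundary; a truncated match can still yield a bogus entity, which is one reason real pipelines need a proper entity-resolution step.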

3. swyx No.42547785
why stupid? it uses a Graph in RAG. graphrag. if anything it's too generic, and multiple people who have the same idea now cannot use the name bc microsoft made the most noise about it.
4. washadjeffmad No.42548744
It's the sound Bobcat Goldthwait would have made if he'd voiced the Aflac duck.
5. kergonath No.42549276
> why stupid? it uses a Graph in RAG. graphrag.

It is trivial, completely devoid of any creativity, and most importantly quite difficult to google. It’s like they did not really think about it even for 5 seconds before uploading.

> if anything it's too generic, and multiple people who have the same idea now cannot use the name bc microsoft made the most noise about it.

Exactly! Anyway, I am not judging the software, which I have yet to try properly.

6. kergonath No.42549293
> I'd just roll my own extraction pipeline from the start, rather than learning someone else's complex setup

I have to agree. It’s actually quite a good summary of hacking with AI-related libraries these days: a lot of them get complex fast once you stray slightly from the intended path. I hope it’ll get better, but unfortunately that is where we are.

7. veggieroll No.42549804
IMO, as with most other out-of-the-box LLM frameworks, the value is in looking at their prompts [1] and then doing it yourself.

[1] https://github.com/microsoft/graphrag/tree/main/graphrag/pro...
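In that spirit, "doing it yourself" can be as small as a prompt template plus a strict parser for whatever output format the prompt asks for. The prompt wording and the `subject | relation | object` line format below are assumptions for illustration, not GraphRAG's actual prompts, and the model response is made up.

```python
# Hypothetical extraction prompt; tune the wording for your own data.
EXTRACTION_PROMPT = """Extract factual relationships from the text below.
Return one relationship per line in the form: subject | relation | object
Return nothing else.

Text:
{text}
"""

def build_prompt(text: str) -> str:
    """Fill the template with one chunk of source text."""
    return EXTRACTION_PROMPT.format(text=text)

def parse_triples(model_output: str) -> list[tuple[str, str, str]]:
    """Parse 'subject | relation | object' lines, skipping anything malformed."""
    triples = []
    for line in model_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

# A made-up response, standing in for what a model might return.
sample_output = """Sure! Here are the relationships:
Alice | works_for | Acme
Acme | headquartered_in | Berlin
this line is malformed
"""
print(parse_triples(sample_output))
# prints [('Alice', 'works_for', 'Acme'), ('Acme', 'headquartered_in', 'Berlin')]
```

Keeping the parser strict (drop anything that does not match) is a deliberate choice: models often add chatty preambles, and silently skipping them is safer than guessing at their structure.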

8. TrueDuality No.42550262
GraphRAG isn't quite a knowledge graph. It is a graph of document snippets with semantic relations, but it does not do fact extraction, nor can you do any reasoning over the structure itself.

This is a common issue I've seen in LLM projects that only kind of understand what is going on here and try to turn their vector database with semantic edge information into something that has a formal name.
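The distinction drawn here can be shown in a few lines: once facts are explicit `(subject, relation, object)` triples, a rule such as "located_in is transitive" derives facts nobody wrote down, whereas a graph whose edges only record that two snippets are similar has no relation label to apply such a rule to. A minimal sketch with made-up triples:

```python
from collections import defaultdict

# Fact-level graph: explicit (subject, relation, object) triples (made-up data).
facts = {
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "located_in", "France"),
    ("France", "located_in", "Europe"),
}

def infer_transitive(triples, relation):
    """Return the transitive closure of one relation, e.g. located_in."""
    edges = defaultdict(set)
    for s, r, o in triples:
        if r == relation:
            edges[s].add(o)
    closure = set()
    for start in list(edges):
        # Depth-first walk from each subject; 'seen' guards against cycles.
        stack = list(edges[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, relation, node))
            stack.extend(edges.get(node, ()))
    return closure

closure = infer_transitive(facts, "located_in")
print(("Eiffel Tower", "located_in", "Europe") in closure)  # prints True
```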