
230 points taikon | 3 comments
isoprophlex ◴[] No.42547133[source]
Fancy, I think, but again no word on the actual work of turning a few bazillion CSV files and PDFs into a knowledge graph.

I see a lot of these KG tools pop up, but they never solve the first problem I have, which is actually constructing the KG itself.

kergonath ◴[] No.42547488[source]
> I see a lot of these KG tools pop up, but they never solve the first problem I have, which is actually constructing the KG itself.

I have heard good things about GraphRAG [1] (but what a stupid name). I have not had the time to try it properly, but it is supposed to build the knowledge graph itself somewhat transparently, using LLMs. Building the graph is a big stumbling block; at least vector stores are easy to understand and trivial to build.

From the summary on GitHub, it looks like KAG can do this, but I could not really find how to do it in the documentation.

[1] https://microsoft.github.io/graphrag/
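
To illustrate the "trivial to build" point about vector stores, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `TinyVectorStore` is a hypothetical name, not part of GraphRAG or KAG.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real setup calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical store: keeps (text, vector) pairs and ranks them by similarity."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

The whole thing is a list plus a similarity function, which is roughly why vector stores feel easy next to graph construction: nothing here has to decide what counts as an entity or a relationship.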

1. isoprophlex ◴[] No.42547518[source]
Indeed, they seem to actually know and show how the sausage is made... but still, there is no fire-and-forget approach for any random dataset. Check out what you need to do if the default isn't working for you (scroll down to e.g. the entity_extraction settings). There is so much complexity to deal with there that I'd just roll my own extraction pipeline from the start, rather than learning someone else's complex setup (that you have to tweak for each new use case).

https://microsoft.github.io/graphrag/config/yaml/
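
For the structured half of the problem (the CSV files), a roll-your-own pipeline can start very small. The sketch below is only an illustration under stated assumptions: each row describes one entity, the column named by `subject_col` identifies it, and every other non-empty cell becomes an edge. The function names and sample data are hypothetical; PDFs and free text would still need an LLM or NLP extraction step on top.

```python
import csv
import io
from collections import defaultdict


def rows_to_triples(csv_text, subject_col):
    """Turn each CSV row into (subject, column_name, value) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row[subject_col].strip()
        for column, value in row.items():
            # Skip the subject column, overflow fields (key None), and empty cells.
            if column is None or column == subject_col:
                continue
            if not value or not value.strip():
                continue
            triples.append((subject, column, value.strip()))
    return triples


def build_graph(triples):
    """Group triples into an adjacency mapping: subject -> [(relation, object), ...]."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


# Made-up sample data, for illustration only.
SAMPLE = "name,employer,city\nAda,Analytical Engines Ltd,London\nGrace,,Arlington\n"
graph = build_graph(rows_to_triples(SAMPLE, "name"))
```

The per-dataset tweaking does not go away with this approach (someone still has to pick `subject_col` and decide which columns are relations), but it lives in a few lines of your own code instead of in someone else's configuration schema.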

2. kergonath ◴[] No.42549293[source]
> I'd just roll my own extraction pipeline from the start, rather than learning someone else's complex setup

I have to agree. It’s actually quite a good summary of hacking with AI-related libraries these days. A lot of them get complex fast once you get slightly out of the intended path. I hope it’ll get better, but unfortunately it is where we are.

3. veggieroll ◴[] No.42549804[source]
IMO, as with most other out-of-the-box LLM frameworks, the value is in looking at their prompts [1] and then doing it yourself.

[1] https://github.com/microsoft/graphrag/tree/main/graphrag/pro...
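
A rough sketch of what "doing it yourself" can look like: one prompt template plus a strict parser for the reply. The prompt text here is a hypothetical example, not GraphRAG's actual prompt, and `llm` is assumed to be any callable that takes a prompt string and returns the model's reply as text. `fake_llm` is a canned stand-in so the sketch runs without an API key.

```python
# Hypothetical prompt; real frameworks use longer prompts with examples.
PROMPT_TEMPLATE = (
    "Extract relationships from the text below.\n"
    "Return one per line as: subject | relation | object\n\n"
    "Text:\n{text}\n"
)


def parse_triples(response):
    """Keep only lines that split into exactly three non-empty fields."""
    triples = []
    for line in response.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples


def extract(text, llm):
    """Format the prompt, call the model, and parse its reply into triples."""
    return parse_triples(llm(PROMPT_TEMPLATE.format(text=text)))


def fake_llm(prompt):
    """Canned reply standing in for a real model call (assumption for the demo)."""
    return "Ada | works at | Analytical Engines Ltd\nnot a triple\nAda | lives in | London\n"


triples = extract("Ada works at Analytical Engines Ltd and lives in London.", fake_llm)
```

Parsing defensively matters because model output drifts from the requested format; dropping malformed lines is the simplest policy, though logging them is probably wiser on real data.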