
176 points marv1nnnnn | 1 comment
ricardobeat ◴[] No.44000370[source]
I'm a little disappointed. Was excited to try this, and it seemed to work initially. But then I gave it a real website to scrape, and it always hangs after only parsing ~10 out of 50+ pages, before even getting to the compression step.

Then I decided to try and switch to the local mode, and after ~ an hour figuring out how to build a markdown version of the docs I needed, hit the "object has no attribute 'generate_from_text'" error, as someone else also reported [1].

So I cloned the source and started to look around, and the method really doesn't exist, even though it's called from main.py. A comment above it says "Assuming LLMMinGenerator has a method to process raw text" and I immediately feel the waft of vibe coding... this is all a mirage. I saw a long README and assumed it was real, but that was probably written by an LLM as well. Would have been obvious by the 'IntegratedKnowledgeManifest_SKF' and 'GenerationTimestamp' keys in the 'SKF format' definition - the former makes no sense, and neither has any reason to be this verbose when the goal is compression.

replies(1): >>44001033 #
1. marv1nnnnn ◴[] No.44001033[source]
I just fixed the local version. My bad, I totally missed it. If you're still interested, you could try it again. About the scraping step: this project uses crawl4ai to scrape. Suppose the starting URL is https://xxx/yy/; it will only scrape pages under https://xxx/yy/*. You could post it as a GitHub issue and I'll try to fix it.
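
To make the scoping rule concrete, here is a tiny sketch of the prefix check (plain Python for illustration only; this is not the project's code or the crawl4ai API, and the function name and URLs are made up):

    # Illustration: a crawl started at https://xxx/yy/ only follows links
    # whose URL falls under that path prefix. Names here are hypothetical.
    def in_scope(url: str, base: str) -> bool:
        base = base if base.endswith("/") else base + "/"
        return url == base or url.startswith(base)

    base = "https://docs.example.com/guide/"
    print(in_scope("https://docs.example.com/guide/install/", base))  # True
    print(in_scope("https://docs.example.com/api/reference/", base))  # False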