28 points eigenvalue | 2 comments

I was inspired by a recent tweet by Andrej Karpathy, as well as my own experience copying and pasting a bunch of HTML docs into Claude yesterday and bemoaning how long-winded and poorly formatted they were.

I’m trying to decide if I should make it into a full-fledged service and completely automate the process of generating the distilled documentation.

Problem is that it would cost a lot in API tokens and wouldn’t generate any revenue (plus it would have to be updated as documentation changes significantly). Maybe Anthropic wants to fund it as a public good? Let me know!

darkteflon ◴[] No.43369057[source]
It’s a cool idea. I’ve wasted a lot of time over the past few months futzing around with beautifulsoup, Playwright and others I forget, or cloning entire repos and trying to figure out exactly which incantations for which build tools are going to get me the built docs I need, all in service of setting them up for retrieval and use by LLMs. Some projects (e.g. Godot, Blender, Django) make it very easy. Others do not (Dagster is giving me headaches at the moment).

I would probably prefer unmodified plain text/md versions (with the heavy lifting done by, e.g., docling or unstructured) to LLM summaries, though, since I’d rather produce my own distillations.
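
To make the "plain text" step concrete, here is a minimal sketch using only the Python standard library. It is not how docling or unstructured work, and the names `TextExtractor`, `html_to_text`, `SKIP_TAGS` and `BLOCK_TAGS` are made up for illustration; the tag lists are assumptions that would need tuning per docs site.

```python
from html.parser import HTMLParser

# Assumption: these tags hold page chrome or code we don't want in the text.
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}
# Assumption: these tags should start and end a line in the output.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "pre", "br", "tr"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML page, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    page = "<body><nav>Menu</nav><h1>Title</h1><p>Hello   <b>world</b>.</p></body>"
    print(html_to_text(page))
```

Real docs pages need more than this (tables, code blocks, cross-links), which is exactly the non-trivial part.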

I would pay for that kind of thing. I think the intersection between ethical scraping and making things machine-readable is fertile ground. For a lot of companies it’s something that can be of great value, but is also non-trivial to do well and unlikely to be a core competency in-house.

replies(2): >>43369146 #>>43369207 #
1. Noumenon72 ◴[] No.43369207[source]
Dagster's docs must be LLM-readable somehow because their LLM that lets you ask questions about their docs is the best RAG experience I've had yet.
replies(1): >>43369364 #
2. darkteflon ◴[] No.43369364[source]
That’s good to know - haven’t tried it, tbh, since those docs chatbots are usually so poor, but will definitely check it out. Good stopgap solution.