iandanforth ◴[] No.43995844[source]
I applaud this effort; however, the "Does it work?" section answers the wrong question. Anyone can write a trivial doc compressor and show a graph saying "The compressed version is smaller!"

For this to "work", you need a metric showing that AIs perform as well, or nearly as well, as they do with the uncompressed documentation across a wide range of tasks.
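
Concretely, the measurement I mean looks something like the sketch below. Everything in it is hypothetical (`ask_llm`, the task list, and the file names are placeholders, not anything from the project): run the same tasks with the full docs and with the compressed file in context, then report both pass rates.

```python
# Hypothetical evaluation sketch: same tasks, two contexts, compare pass rates.
# `ask_llm`, `Task`, and the file names are placeholders, not real project code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # returns True if the model's answer is acceptable

def ask_llm(prompt: str, context: str) -> str:
    raise NotImplementedError("call whatever model/agent you are evaluating here")

def pass_rate(tasks: list[Task], context: str, trials: int = 5) -> float:
    # Repeat each task a few times because the outputs are stochastic.
    passed = sum(
        task.check(ask_llm(task.prompt, context))
        for task in tasks
        for _ in range(trials)
    )
    return passed / (len(tasks) * trials)

# "It works" would mean these two numbers end up close:
# full_rate = pass_rate(tasks, open("full_docs.md").read())
# min_rate  = pass_rate(tasks, open("llm-min.txt").read())
```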

replies(5): >>43996061 #>>43996217 #>>43996319 #>>43996840 #>>44003395 #
marv1nnnnn ◴[] No.43996061[source]
I totally agree with your criticism. To be honest, it's hard even for me to evaluate. What I did was select several packages that current LLMs fail to handle (they're in the sample folder: `crawl4ai`, `google-genai`, and `svelte`) and try some tricky prompts to see if it works. But even that evaluation is hard, because the LLM could hallucinate. I would say it works most of the time, but there are always a few runs that fail to deliver. I actually prepared a comparison of cursor vs cursor + internet vs cursor + context7 vs cursor + llm-min.txt, but the results were stochastic, so I didn't include it here. I'll consider adding it to the repo as well.
replies(5): >>43996846 #>>43997120 #>>43997327 #>>44002248 #>>44002383 #
timhigins ◴[] No.43997327[source]
> the LLM could hallucinate

The job of any context retrieval system is to retrieve the relevant info for the task so the LLM doesn't hallucinate. Maybe build a benchmark based on lesser-known external libraries, with test cases that check the output is correct (or with a mocking layer to verify that the LLM-generated code calls roughly the right functions).
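
For the mocking idea, a rough sketch of what such a check could look like; `somelib` and the expected call names are made up for illustration, not a real library:

```python
# Run the LLM-generated snippet against a MagicMock standing in for the
# library, then check that roughly the right functions were called.
import sys
from unittest.mock import MagicMock

def calls_expected_functions(generated_code: str, expected_calls: set[str]) -> bool:
    fake_lib = MagicMock()
    sys.modules["somelib"] = fake_lib  # `somelib` is a placeholder module name
    try:
        exec(generated_code, {"__name__": "__main__"})
    except Exception:
        return False  # the generated code didn't even run
    finally:
        del sys.modules["somelib"]
    called = {name for name, *_ in fake_lib.mock_calls}
    return expected_calls <= called

# Example: did the snippet call somelib.connect and somelib.fetch?
# calls_expected_functions(llm_output, {"connect", "fetch"})
```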

replies(1): >>44000925 #
marv1nnnnn ◴[] No.44000925[source]
Thanks for the feedback. This will be my next step. Personally, I feel it's hard to design those test cases by myself.