
176 points by marv1nnnnn | 3 comments
1. thegeomaster No.43995997
What is absolutely essential to present here, but is missing, is a rigorous evaluation of task-completion effectiveness for an agent using this format versus the original format. It has to be done on a new library that is guaranteed not to be present in the training set.
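
For concreteness, a bare-bones version of that eval might look like the sketch below. Everything in it is a hypothetical placeholder (run_agent, the file names, the task format), not anything from the actual project:

    import json

    def run_agent(context: str, prompt: str) -> str:
        # Hypothetical: wire this to whatever agent/LLM is being tested.
        raise NotImplementedError

    def evaluate(doc_path: str, tasks: list[dict]) -> float:
        # Fraction of tasks solved when the agent gets doc_path as context.
        docs = open(doc_path).read()
        passed = sum(
            run_agent(context=docs, prompt=t["prompt"]).strip() == t["expected"]
            for t in tasks
        )
        return passed / len(tasks)

    # The tasks must target a library released after the models' training
    # cutoff, so neither condition can lean on memorized documentation.
    tasks = json.load(open("unseen_library_tasks.json"))
    print("original docs:  ", evaluate("docs_original.txt", tasks))
    print("compressed docs:", evaluate("docs_compressed.txt", tasks))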

As it stands, there is nothing demonstrating that this lossy compression doesn't destroy essential information that an LLM would need.

I also have a gut feeling that the average LLM will actually have more trouble with the dense format plus the instructions to decode it than with a huge human-readable file. Remember, LLMs are trained on internet content, which contains terabytes of textual technical documentation but 0 bytes of this ad-hoc format.
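
To make "ad-hoc format" concrete, here is a made-up contrast (the dense line is my own invention for illustration, not the author's actual syntax):

    Ordinary docs (abundant in training data):
        connect(host, port, timeout=30) - opens a TCP connection,
        raising ConnectionError on failure.

    Dense ad-hoc encoding (illustrative only):
        F|connect|str,int,int=30|->Conn|E:ConnectionError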

I am happy to be proven wrong on both points (LLMs are also very unpredictable!), but the burden of proof for an extravagant scheme like this lies solely on the author.

replies(1): >>43996208
2. marv1nnnnn No.43996208
Agreed. This approach actually wouldn't even be possible without the advent of reasoning LLMs. In my tests, reasoning LLMs performed much better than non-reasoning LLMs at interpreting the compressed file. Those models are really good at understanding abstraction.
replies(1): >>43996569
3. thegeomaster No.43996569
My point still stands: the reasoning tokens consumed interpreting the abstracted llms.txt could have been spent on solving the problem at hand.
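
That overhead is measurable if the API reports reasoning-token usage. Another hypothetical sketch, where complete_with_usage stands in for whatever client call returns the answer plus a reasoning-token count:

    def complete_with_usage(context: str, prompt: str) -> tuple[str, int]:
        # Hypothetical: return (answer, reasoning_tokens) from your LLM client.
        raise NotImplementedError

    def avg_reasoning_tokens(doc_path: str, prompts: list[str]) -> float:
        docs = open(doc_path).read()
        counts = [complete_with_usage(docs, p)[1] for p in prompts]
        return sum(counts) / len(counts)

    # If the compressed condition burns many more reasoning tokens per task,
    # decoding the format is eating into the problem-solving budget.
    prompts = ["..."]  # the same task prompts, reused in both conditions
    print("original:  ", avg_reasoning_tokens("docs_original.txt", prompts))
    print("compressed:", avg_reasoning_tokens("docs_compressed.txt", prompts))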

Again, I'm not saying the solution doesn't work well (my intuition on LLMs has been wrong enough times), but it would be really helpful and reassuring to see some hard data.