
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points by tosh | 4 comments

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
quincepie No.45101219
I totally agree with the author. Sadly, I feel like that's not how the majority of LLM users tend to view LLMs. And it's definitely not how AI companies market them.

> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters

the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding on the user's part is what can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random or lossy facts that are almost guaranteed to impress.

> The way to solve this particular problem is to make a correct example available to it.

My question is: how much effort would it take to make a correct example available to the LLM before it can output quality, useful data? If the effort I put in is more than what I get back, then I feel it's best to write and reason through it myself.
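For a sense of the scale of that effort, here is a minimal sketch of what "making a correct example available" can look like in practice: pasting one known-good snippet into the prompt before asking for a variation. The model name, the example function, and the use of the OpenAI Python client are all illustrative assumptions on my part, not anything from the article.

    # Minimal sketch: put one known-correct example into the prompt
    # (few-shot prompting) before asking for something similar.
    # Model name and the example function are placeholders.
    from openai import OpenAI

    CORRECT_EXAMPLE = '''
    def slugify(title: str) -> str:
        """Known-good helper taken from my own codebase."""
        return "-".join(title.lower().split())
    '''

    prompt = (
        "Here is a correct, working example that follows my conventions:\n"
        f"{CORRECT_EXAMPLE}\n"
        "Write a companion function in the same style that also strips "
        "punctuation before slugifying."
    )

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)

If finding or writing that one example takes longer than producing the target output yourself, the conclusion above holds: just write it yourself.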

replies(7): >>45102038 #>>45102286 #>>45103159 #>>45103931 #>>45104349 #>>45105150 #>>45116121 #
1. theshrike79 No.45102038
> the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand

This is why simonw (the author) has his "pelican on a bike" test; it's not 100% accurate, but it's a good indicator.

I have a set of my own standard queries and problems (no character counting or algebra crap) that I feed to new LLMs I'm testing.

None of the questions exist outside of my own Obsidian notes, so they can't be gamed by LLM authors. I've tested multiple different LLMs with them, so I have a "feeling" for what the answers should look like. And I personally know the correct answers, so I can validate them immediately.
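A sketch of what that kind of private benchmark loop could look like, assuming the questions and expected answers are exported from the note into a local JSON file and the models are served locally through Ollama. The file name, the model names, and the side-by-side printout are all placeholders:

    # Hypothetical harness: run a private question set against several
    # locally served models and print answers next to the known-good ones.
    # File format, model names and truncation length are placeholders.
    import json
    import requests

    def ask(model: str, question: str) -> str:
        # Query a local Ollama server; nothing leaves the machine.
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": question, "stream": False},
            timeout=300,
        )
        r.raise_for_status()
        return r.json()["response"]

    # questions.json: [{"q": "...", "expected": "..."}, ...]
    with open("questions.json") as f:
        questions = json.load(f)

    for model in ["llama3.1", "mistral"]:
        for item in questions:
            print(f"--- {model} ---")
            print("Q:       ", item["q"])
            print("Expected:", item["expected"])
            print("Got:     ", ask(model, item["q"])[:300])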

replies(1): >>45102192 #
2. barapa No.45102192
They are training on your queries. So they may have some exposure to them going forward.
replies(2): >>45102416 #>>45103144 #
3. keysdev No.45102416
Not if you ollama pull the model to your own machine.
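For anyone who hasn't tried it, the local workflow is roughly the following. This is a sketch using the ollama Python client; the model name is only an example, and the client library is my own assumption (the raw HTTP API on localhost:11434 works just as well). The queries only ever hit the local server, so they can't end up in anyone's training data.

    # Sketch: pull a model once, then query it entirely locally.
    # Assumes the ollama daemon is running and the Python client
    # ("pip install ollama") is installed; model name is a placeholder.
    import ollama

    ollama.pull("llama3.1")  # one-time download to the local machine

    response = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user", "content": "Explain what a lossy encyclopedia is."}],
    )
    print(response["message"]["content"])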
4. franktankbank No.45103144
Even if your queries are hidden behind a locally running model, you need some humility about the fact that your queries are probably not unique. For this reason I have a very difficult time believing that a basic LLM will be able to properly reason about complex topics; it can regurgitate to whatever level it's been trained. That doesn't make it less useful, though. But in the edge cases, how do we know the query it's ingesting gets trained against a suitable answer? Wouldn't this constitute over-fitting in those cases, and be terribly self-reinforcing?