←back to thread

An LLM is a lossy encyclopedia

(simonwillison.net)
509 points tosh | 1 comments | | HN request time: 0s | source

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
Show context
latexr ◴[] No.45101170[source]
A lossy encyclopaedia should be missing information and be obvious about it, not making it up without your knowledge and changing the answer every time.

When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.

Furthermore, an encyclopaedia is something you can reference and learn from without a goal, it allows you to peruse information you have no concept of. Not so with LLMs, which you have to query to get an answer.

replies(10): >>45101190 #>>45101267 #>>45101510 #>>45101793 #>>45101924 #>>45102219 #>>45102694 #>>45104357 #>>45108609 #>>45112011 #
gjm11 ◴[] No.45102219[source]
Lossy compression does make things up. We call them compression artefacts.

In compressed audio these can be things like clicks and boings and echoes and pre-echoes. In compressed images they can be ripply effects near edges, banding in smoothly varying regions, but there are also things like https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres... where one digit is replaced with a nice clean version of a different digit, which is pretty on-the-nose for the LLM failure mode you're talking about.

Compression artefacts generally affect small parts of the image or audio or video rather than replacing the whole thing -- but in the analogy, "the whole thing" is an encyclopaedia and the artefacts are affecting little bits of that.

Of course the analogy isn't exact. That would be why S.W. opens his post by saying "Since I love collecting questionable analogies for LLMs,".

replies(3): >>45102280 #>>45102368 #>>45103467 #
latexr ◴[] No.45102368[source]
I feel like my comment is pretty clear that a compression artefact is not the same thing as making the whole thing up.

> Of course the analogy isn't exact.

And I don’t expect it to be, which is something I’ve made clear several times before, including on this very thread.

https://news.ycombinator.com/item?id=45101679

replies(1): >>45111845 #
1. tsunamifury ◴[] No.45111845[source]
More disagreeing with no meaningful value to the conversation. This is you. Constantly.