
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points by tosh | 1 comment

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
latexr No.45101170
A lossy encyclopaedia should be missing information and be obvious about it, rather than making things up without your knowledge and changing the answer every time.

When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.

Furthermore, an encyclopaedia is something you can consult and learn from without a specific goal; it lets you peruse information you had no concept of. Not so with LLMs, which you have to query to get an answer.

replies(10): >>45101190 #>>45101267 #>>45101510 #>>45101793 #>>45101924 #>>45102219 #>>45102694 #>>45104357 #>>45108609 #>>45112011 #
simonw No.45101190
I think you are missing the point of the analogy: a lossy encyclopedia is obviously a bad idea, because encyclopedias are meant to be reliable places to look up facts.
replies(5): >>45101249 #>>45101251 #>>45102590 #>>45105765 #>>45105785 #
latexr No.45101249
And my point is that “lossy” does not mean “unreliable”. LLMs aren’t reliable sources of facts, no argument there, but a true lossy encyclopaedia might be. Lossy algorithms don’t just make up and change information; they remove it from places where it might not make a difference to the whole. A lossy encyclopaedia might be one where, for example, you remove the images plus the grammatical and phonetic information. Eventually you might compress the information until the entry for “dog” only reads “four legged creature” (correct, but not terribly helpful), but you wouldn’t get “space mollusk”.
replies(1): >>45101265 #
simonw No.45101265
I don't think a "true lossy encyclopedia" is a thing that has ever existed.
replies(3): >>45101606 #>>45105757 #>>45107425 #
prerok No.45107425
Since sibling comments all seem to have concentrated on idealistic good intent, I would also like to point out a different side of things.

I grew up under socialism. Since we transitioned to democracy, I have learned that I had to unlearn some things. Our encyclopedias were not inaccurate, but they were not complete. It's like lying through omission. And as the old saying goes, half-truths are worse than lies.

Whether this would be deemed a lossy encyclopedia, I don't know. What I am certain of, however, is that it was accurate but omitted important additional facts.

And that is what I see in LLMs as well. Overall, they're accurate, except in cases where an additional fact would alter the conclusion. So either the model could not find arguments involving that fact, or it chose to ignore them to give an answer, and could be prompted into taking them into account, or whatever.

What I do know is that today's LLMs give me the same heebie-jeebies that rereading those encyclopedias of my youth gives me.