
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points tosh | 1 comments

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
latexr ◴[] No.45101170[source]
A lossy encyclopaedia should be missing information and be obvious about it, not make it up without your knowledge and change the answer every time.

When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.

Furthermore, an encyclopaedia is something you can reference and learn from without a goal; it allows you to peruse information you have no concept of. Not so with LLMs, which you have to query to get an answer.

replies(10): >>45101190 #>>45101267 #>>45101510 #>>45101793 #>>45101924 #>>45102219 #>>45102694 #>>45104357 #>>45108609 #>>45112011 #
simonw ◴[] No.45101190[source]
I think you are missing the point of the analogy: a lossy encyclopedia is obviously a bad idea, because encyclopedias are meant to be reliable places to look up facts.
replies(5): >>45101249 #>>45101251 #>>45102590 #>>45105765 #>>45105785 #
latexr ◴[] No.45101249[source]
And my point is that “lossy” does not mean “unreliable”. LLMs aren’t reliable sources of facts, no argument there, but a true lossy encyclopaedia might be. Lossy algorithms don’t just make up and change information, they remove it from places where it might not make a difference to the whole. A lossy encyclopaedia might be one where, for example, you remove the images plus grammatical and phonetic information. Eventually you might compress the information to the point where the entry for “dog” only reads “four-legged creature” (which is correct but not terribly helpful), but you wouldn’t get “space mollusk”.
replies(1): >>45101265 #
simonw ◴[] No.45101265[source]
I don't think a "true lossy encyclopedia" is a thing that has ever existed.
replies(3): >>45101606 #>>45105757 #>>45107425 #
latexr ◴[] No.45101606{3}[source]
One could argue that’s what a pocket encyclopaedia (those exist) is. But even if we say they don’t, when you make up a term by mushing two existing words together it helps if the term makes sense. Otherwise, why even use the existing words? You called it a “lossy encyclopedia” and not a “spaghetti ice cream” for a reason, presumably so the term evokes an image or concept in the mind of the reader. If it’s bringing up a different image than what you intended, perhaps it’s not a good term.

I remember you being surprised when the term “vibe coding” deviated from its original intention (I know you didn’t come up with it). But frankly I was surprised at your surprise—it was entirely predictable and obvious how the term was going to be used. The concept I’m attempting to communicate to you is that when you make up a term you have to think not only of the thing in your head but also of the image it conjures up in other people’s minds. Communication is a two-way street.

replies(1): >>45103398 #
1. nyeah ◴[] No.45103398{4}[source]
I think you're saying that "pocket encyclopedia" is one definition of "lossy encyclopedia" that may occur to people (or that may get marketed on purpose). But that's a very poor definition of LLMs. And so the danger is that people may lock onto a wildly misleading definition. Am I getting the point?