
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points tosh | 1 comments

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
latexr ◴[] No.45101170[source]
A lossy encyclopaedia should be missing information and be obvious about it, not making it up without your knowledge and changing the answer every time.

When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.

Furthermore, an encyclopaedia is something you can reference and learn from without a goal; it allows you to peruse information you have no concept of. Not so with LLMs, which you have to query to get an answer.

replies(10): >>45101190 #>>45101267 #>>45101510 #>>45101793 #>>45101924 #>>45102219 #>>45102694 #>>45104357 #>>45108609 #>>45112011 #
simonw ◴[] No.45101190[source]
I think you are missing the point of the analogy: a lossy encyclopedia is obviously a bad idea, because encyclopedias are meant to be reliable places to look up facts.
replies(5): >>45101249 #>>45101251 #>>45102590 #>>45105765 #>>45105785 #
latexr ◴[] No.45101249[source]
And my point is that “lossy” does not mean “unreliable”. LLMs aren’t reliable sources of facts, no argument there, but a true lossy encyclopaedia might be. Lossy algorithms don’t just make up and change information, they remove it from places where it might not make a difference to the whole. A lossy encyclopaedia might be one where, for example, you remove the images plus grammatical and phonetic information. Eventually you might compress the information to the point where the entry for “dog” only reads “four-legged creature”, which is correct but not terribly helpful, but you wouldn’t get “space mollusk”.
replies(1): >>45101265 #
simonw ◴[] No.45101265[source]
I don't think a "true lossy encyclopedia" is a thing that has ever existed.
replies(3): >>45101606 #>>45105757 #>>45107425 #
1. ianburrell ◴[] No.45105757{3}[source]
All encyclopedias are lossy. They curate the info they include, only choosing important topics. Wikipedia is lossy. They delete whole articles for irrelevance. They edit changes to make them more concise. They require sources for facts. All good things, but Wikipedia is a subset of human knowledge.