An LLM is a lossy encyclopedia

(simonwillison.net)

509 points tosh | 2 comments | 29 Aug 25 09:40 UTC

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)

latexr ◴[02 Sep 25 10:27 UTC] No.45101170[source]▶

>>45062046 (OP) #

A lossy encyclopaedia should be missing information and be obvious about it, not making it up without your knowledge and changing the answer every time.

When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.

Furthermore, an encyclopaedia is something you can reference and learn from without a goal; it lets you peruse information you have no concept of. Not so with LLMs, which you have to query to get an answer.

replies(10): >>45101190 #>>45101267 #>>45101510 #>>45101793 #>>45101924 #>>45102219 #>>45102694 #>>45104357 #>>45108609 #>>45112011 #

1. gf000 ◴[02 Sep 25 11:58 UTC] No.45101924[source]▶

>>45101170 #

I don't think there is a singular "should" that fits every use case.

E.g. a Bloom filter also doesn't "know" what it knows.
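
For readers who have not met the structure, here is a minimal sketch of a Bloom filter, assuming a toy bit-array size and a salted SHA-256 hashing scheme (the class name, method names, and parameters are hypothetical, not taken from any library). It illustrates the property being referred to: a "yes" only means "possibly present", and the filter itself cannot tell a true positive from a false one.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter()
for word in ["lamp", "milky way", "dirt"]:
    bf.add(word)

# Added items always report True; items never added usually report False,
# but may report True, and the filter has no way to flag which is which.
print(bf.might_contain("lamp"))
```

The original items are never stored, only the bits they set, so the structure is lossy by design and a false positive looks identical to a genuine hit.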

replies(1): >>45101968 #

2. latexr ◴[02 Sep 25 12:03 UTC] No.45101968[source]▶

>>45101924 (TP) #

I don’t understand the point you’re trying to make. The given example confused me further, since nothing in my argument is concerned with the tool “knowing” anything; that has no relation to the idea I’m expressing.

I do understand and agree with a different point you’re making somewhere else in this thread, but it doesn’t seem related to what you’re saying here.

https://news.ycombinator.com/item?id=45101946
