
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points | by tosh

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
latexr No.45101170
A lossy encyclopaedia should be missing information and be obvious about it, not make things up without your knowledge and change the answer every time.

When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.
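
To make that concrete, here's a minimal sketch using Pillow (the filenames are hypothetical): recompressing an image at a brutally low quality visibly degrades it, but the result is still recognisably the same picture.

    # Sketch of the lossy-compression point above, using Pillow.
    # Assumes an input file "lamp.jpg" exists.
    from PIL import Image, ImageChops

    original = Image.open("lamp.jpg").convert("RGB")
    original.save("lamp_lossy.jpg", quality=5)  # aggressive lossy compression
    degraded = Image.open("lamp_lossy.jpg").convert("RGB")

    # The per-pixel difference shows degradation spread across the
    # same underlying picture, not a fabricated new subject.
    diff = ImageChops.difference(original, degraded)
    print(diff.getbbox())  # bounding box of changed pixels (roughly the whole image)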

Furthermore, an encyclopaedia is something you can reference and learn from without a goal; it lets you peruse information you had no concept of. Not so with LLMs, which you have to query to get an answer.

replies(10): >>45101190 #>>45101267 #>>45101510 #>>45101793 #>>45101924 #>>45102219 #>>45102694 #>>45104357 #>>45108609 #>>45112011 #
TacticalCoder No.45101267
> You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.

Oh but it's much worse than that: because most LLMs aren't deterministic in the way they operate [1], you can get a pristine image of a different pile of dirt every single time you ask.

[1] There are models where, given the same model + prompt + seed, you're at least guaranteed to get the same output every single time. FWIW I use LLMs, but I can't integrate them into anything I produce when their output isn't deterministic.
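
As a sketch of what that footnote looks like in practice, here is the OpenAI Python client with a pinned seed (the model name and seed value are arbitrary choices; note the API only promises best-effort determinism, which you can track via system_fingerprint):

    # Best-effort reproducibility: pin model + prompt + seed and use
    # temperature=0. Determinism is not guaranteed -- compare
    # system_fingerprint across calls to detect backend changes.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary model choice for illustration
        messages=[{"role": "user", "content": "Describe this lamp."}],
        seed=42,
        temperature=0,
    )
    print(response.system_fingerprint)  # changes when the backend changes
    print(response.choices[0].message.content)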

replies(2): >>45101776 #>>45102863 #
ACCount37 No.45102863
"Deterministic" is overrated.

Computers are deterministic. Most of the time. If you really don't think about all the times they aren't. But if you leave CPU-land and go out into the real world, you don't have the privilege of working with deterministic systems at all.

Engineering with LLMs is closer to "designing a robust industrial process that's going to be performed by unskilled minimum wage workers" than it is to "writing a software algorithm". It's still an engineering problem - but of the kind that requires an entirely different frame of mind to tackle.
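
One concrete shape that frame of mind takes is validate-and-retry: never trust a single run of an unreliable step, check its output against an explicit contract, and bound the retries. A hedged sketch (call_llm is a hypothetical stand-in for any LLM call):

    # "Industrial process" thinking: treat the LLM as an unreliable
    # worker whose output must pass inspection before it's used.
    import json

    def call_llm(prompt: str) -> str:
        raise NotImplementedError  # placeholder for a real API call

    def extract_price(description: str, max_attempts: int = 3) -> float:
        prompt = f'Return ONLY a JSON object {{"price": <number>}} for: {description}'
        for attempt in range(max_attempts):
            raw = call_llm(prompt)
            try:
                price = float(json.loads(raw)["price"])
                if price >= 0:  # domain check, not just a parse check
                    return price
            except (json.JSONDecodeError, KeyError, TypeError, ValueError):
                pass  # malformed output: fall through and retry
        raise RuntimeError("unreliable step failed validation after retries")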

replies(1): >>45103760 #
latexr No.45103760
And one major issue is that LLMs are largely being sold and understood more like reliable algorithms than what they really are.

If everyone understood the distinction and their limitations, they wouldn’t be enjoying this level of hype, or leading to teen suicides and people giving themselves centuries-old psychiatric illnesses. If you “go out into the real world” you learn that people do not understand LLMs aren’t deterministic, and that they shouldn’t blindly accept their outputs.

https://archive.ph/rdL9W

https://archive.ph/20241023235325/https://www.nytimes.com/20...

https://archive.ph/20250808145022/https://www.404media.co/gu...

replies(1): >>45103951 #
ACCount37 No.45103951
It's nothing new. LLMs are unreliable, but in the same ways humans are.
replies(2): >>45104099 #>>45105453 #
latexr No.45104099
But LLM output is not being treated the same as human output, and that comparison is both tired and harmful. People routinely act like “this is true because ChatGPT said so” when they wouldn’t do the same for any random human.

LLMs aren’t being sold as unreliable. On the contrary, they are being sold as the tool that will replace everyone and do a better job at a fraction of the price.

replies(1): >>45104283 #
ACCount37 No.45104283
That comparison is more useful than the alternatives. Anthropomorphic framing is one of the best framings we have for understanding what properties LLMs have.

"LLM is like an overconfident human" certainly beats both "LLM is like a computer program" and "LLM is like a machine god". It's not perfect, but it's the best fit at 2 words or less.