An LLM is a lossy encyclopedia

(simonwillison.net)
509 points by tosh | 4 comments

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
quincepie No.45101219
I totally agree with the author. Sadly, I feel like that's not how the majority of LLM users tend to view LLMs. And it's definitely not how AI companies market them.

> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters

The problem is that in order to develop an intuition for which questions an LLM can usefully answer, the user needs to know at least something about the topic beforehand. I believe it's this lack of initial understanding that leads people to take LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and present random or lossy facts that are almost guaranteed to impress.

> The way to solve this particular problem is to make a correct example available to it.

My question is how much effort it would take to make a correct example available to the LLM before it can output quality, useful data. If the effort I put in exceeds what I get in return, then I feel it's best to write and reason through it myself.

replies(7): >>45102038 #>>45102286 #>>45103159 #>>45103931 #>>45104349 #>>45105150 #>>45116121 #
cj No.45103159
> the user will at least need to know something about the topic beforehand.

I used ChatGPT 5 over the weekend to double-check dosing guidelines for a specific medication: "Provide dosage guidelines for medication [insert here]"

It spat back dosing guidelines that were an order of magnitude wrong (it suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right," and it quickly corrected itself and provided the correct dosing guidelines.

These are the kind of innocent errors that can be dangerous if users trust it blindly.

The main challenge is that LLMs aren't able to gauge confidence in their answers, so they can't adjust how confidently they communicate information back to you. It's like compressing a photo and the photographer wrongly saying "here's the best quality image I have!" Do you trust the photographer at their word, or do you challenge them to find a better-quality image?

replies(12): >>45103322 #>>45103346 #>>45103459 #>>45103642 #>>45106112 #>>45106634 #>>45108321 #>>45108605 #>>45109136 #>>45110008 #>>45110773 #>>45112140 #
1. blehn No.45103346
Perhaps the absolute worst use-case for an LLM.
replies(2): >>45109770 #>>45110264 #
2. redundantly No.45109770
And one that likely happens often.
3. dragontamer No.45110264
My mom was looking up church times in the Philippines. Google AI was wrong pretty much every time.

Why is an LLM unable to read a table of church times across a sampling of ~5 Filipino churches?

Google LLM (Gemini??) was clearly finding the correct page. I just grabbed my mom's phone after another bad mass time and clicked on the hyperlink. The LLM was seemingly unable to parse the table at all.
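(Editorial aside: the contrast is stark because extracting rows from an HTML table is mechanically trivial for a deterministic parser. A minimal sketch using Python's stdlib `html.parser`; the church names and times below are hypothetical placeholders, not the actual page the commenter saw.)

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collects cell text from <table> rows in an HTML document."""
    def __init__(self):
        super().__init__()
        self.rows = []      # list of rows, each a list of cell strings
        self._row = None    # row currently being built, if inside <tr>
        self._cell = None   # text fragments of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        # Only record text while inside a cell; ignore whitespace between tags.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical schedule table standing in for the page the LLM failed to parse.
html = """
<table>
  <tr><th>Church</th><th>Sunday Mass</th></tr>
  <tr><td>St. Example Parish</td><td>7:00 AM</td></tr>
  <tr><td>Our Lady of Placeholder</td><td>9:30 AM</td></tr>
</table>
"""
p = TableParser()
p.feed(html)
# p.rows now holds the header row plus one row per parish.
```

The point is not that the LLM should run this code, but that the information was sitting in an unambiguous structure a few lines of parsing can recover exactly, while the model's lossy reconstruction got it wrong.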

replies(1): >>45135214 #
4. etherealG No.45135214
Because Google's search and LLM teams are different teams with different incentives. Search is the cash cow they keep squeezing for more revenue at the expense of quality, since at least 2018; court documents revealed they degraded results on purpose to keep people searching more, so they could show more ads and earn more revenue. The Google AI embedded in search has the same goal: keep you clicking on ads. My guess is Gemini doesn't have the bad parts of enshittification yet, but it will come. If you think hallucinations are bad now, just wait until tech companies start tuning them up on purpose to get you to make more prompts so they can inject more ads!