
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points by tosh | 3 comments

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
quincepie ◴[] No.45101219[source]
I totally agree with the author. Sadly, I feel like that's not how the majority of LLM users tend to view LLMs. And it's definitely not how AI companies market them.

> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters

the problem is that in order to develop an intuition for which questions LLMs can usefully answer, the user needs to know at least something about the topic beforehand. I believe it's this lack of initial understanding on the user's side that leads to LLM output being taken as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random or lossy facts that are almost guaranteed to impress.

> The way to solve this particular problem is to make a correct example available to it.

My question is how much effort it would take to make a correct example available to the LLM before it can output quality, useful data. If the effort I put in is more than what I get in return, then I feel it's best to reason through and write it myself.
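
For what it's worth, "making a correct example available" can be as cheap as pasting a known-good reference into the prompt. A minimal sketch in Python using the openai client - the model name, the reference snippet, and the task are placeholder assumptions for illustration, nothing from the article:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # A known-good reference for the model to imitate (hypothetical example).
    correct_example = '''
    def slugify(title: str) -> str:
        """Lowercase a title and join the words with hyphens."""
        return "-".join(title.lower().split())
    '''

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat-capable model
        messages=[
            {"role": "system",
             "content": "Follow the conventions of the reference example exactly."},
            {"role": "user",
             "content": f"Reference example:\n{correct_example}\n\n"
                        "Now write a variant that also strips punctuation."},
        ],
    )
    print(response.choices[0].message.content)

Whether hunting down (or writing) that reference costs more than just doing the task yourself is exactly the tradeoff above.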

replies(7): >>45102038 #>>45102286 #>>45103159 #>>45103931 #>>45104349 #>>45105150 #>>45116121 #
cj ◴[] No.45103159[source]
> the user will at least need to know something about the topic beforehand.

I used ChatGPT 5 over the weekend to double-check dosing guidelines for a specific medication: "Provide dosage guidelines for medication [insert here]"

It spit back dosing guidelines that were an order of magnitude wrong (suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right" and it quickly corrected itself and provided the correct dosing guidelines.
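
(For the unit math - 1 mg is 1,000 mcg, so the suggestion was a factor of ten too low. A trivial sanity check, using the numbers from this anecdote purely as an illustration:)

    # Numbers from the anecdote above, used only to illustrate the unit math.
    suggested_mcg = 100
    correct_mg = 1
    correct_mcg = correct_mg * 1000            # 1 mg = 1,000 mcg

    error_factor = correct_mcg / suggested_mcg
    print(f"Suggested dose is {error_factor:.0f}x too low")   # -> 10x, one order of magnitude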

These are the kind of innocent errors that can be dangerous if users trust it blindly.

The main challenge is that LLMs aren't able to gauge confidence in their answers, so they can't adjust how confidently they communicate information back to you. It's like compressing a photo, with the photographer wrongly saying "here's the best quality image I have!" - do you take the photographer at their word, or do you challenge them to find a better quality image?

replies(12): >>45103322 #>>45103346 #>>45103459 #>>45103642 #>>45106112 #>>45106634 #>>45108321 #>>45108605 #>>45109136 #>>45110008 #>>45110773 #>>45112140 #
BeetleB ◴[] No.45108605[source]
> I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication.

This use case is bad by several degrees.

Consider an alternative: using Google to search for it and relying on its AI-generated answer. That would be one degree less bad, but still bad.

What about using Google and clicking on one of the top results? Maybe healthline.com? This usage would reduce the badness by one further degree, but still be bad.

I could go on and on, but for this use case, unless it's some generic drug (ibuprofen or something), the only correct approach is going to the manufacturer's website, making sure you're looking at the exact same medication (not some newer version or a variant), and reading the dosage guidelines there.

No, not Mayo Clinic or any other site (unless it's a pretty generic medicine).

This is just not a good example to highlight the problems of using an LLM. You're likely not that much worse off than using Google.

replies(1): >>45108739 #
cj ◴[] No.45108739[source]
The compound I was researching was [edit: removed].

Problem is, it's not FDA-approved - it's only prescribed off-label by compounding pharmacies. It's an experimental compound with no official guidelines.

The first result on Google for "[edit: removed] dosing guidelines" is a random Word document hosted by a telehealth clinic. Not exactly the most reliable source.

Edit: Jeesh, what’s with the downvotes?

replies(2): >>45109196 #>>45114986 #
1. BeetleB ◴[] No.45109196[source]
> Experimental compound with no official guidelines.

> The first result on Google for "GHK-Cu dosing guidelines" is a random word document hosted by a Telehealth clinic. Not exactly the most reliable source.

You're making my point even more. When going off-label with an unapproved drug, you probably should not trust anything on the Internet. And if there is a reliable source out there, it's very much on you to be able to discern what is and isn't reliable. Who cares that the LLM is wrong when much of the Internet is likely wrong too?

BTW, I'm not advocating that LLMs are good for stuff like this. But a better example would be asking the LLM "In my state, is X taxable?"

The Google AI summary was completely wrong (and the helpful link it cited as a reference was correct, and in complete disagreement with the summary). But aside from the AI summary, pretty much every link in the Google search results was correct. That's a good case for not relying on an LLM: information that is widely and easily available is wrong in the LLM.

replies(1): >>45109231 #
2. cj ◴[] No.45109231[source]
> You're making my point even more

What exactly is your point?

Is your point that I should be smarter and shouldn’t have asked ChatGPT the question?

If that's your point, understood, but I don't think you can assume the average ChatGPT user will have the discernment to determine when using an LLM is and isn't appropriate.

FWIW I agree with you. But "you shouldn't ask ChatGPT that question" is a weak argument if you care about contextualizing and broadening your point beyond me and my specific anecdote.

replies(1): >>45110220 #
3. BeetleB ◴[] No.45110220[source]
My point is that if you're trying to demonstrate how unreliable LLMs are, this is a poor example, because the alternatives are almost equally poor.

> If that's your point, understood, but I don't think you can assume the average ChatGPT user will have the discernment to determine when using an LLM is and isn't appropriate.

I agree that the average user will not, but they also won't have the ability to determine that the answer from the top (few) Google links is invalid. All you've shown is that the LLM is as bad as Google search results.

Put another way, if you invoke this as a reason one should not rely on LLMs (in general), then it follows one should not rely on Google either (in general).