
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points by tosh | 2 comments

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
quincepie ◴[] No.45101219[source]
I totally agree with the author. Sadly, I feel like that's not how the majority of LLM users tend to view LLMs. And it's definitely not how AI companies market them.

> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters

The problem is that in order to develop an intuition for which questions an LLM can usefully answer, the user needs to know at least something about the topic beforehand. I believe it's this lack of initial understanding on the user's part that can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and present random or lossy facts that are almost guaranteed to impress.

> The way to solve this particular problem is to make a correct example available to it.

My question is how much effort it would take to make a correct example available to the LLM before it can output quality, useful data. If the effort I put in is more than what I get in return, then I feel it's best to write and reason through it myself.
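
To make that trade-off concrete, here is a minimal sketch of what "making a correct example available" can look like in practice, assuming the openai Python SDK; the model name, the known-good snippet, and the follow-up task are all placeholder assumptions, not a prescribed workflow:

    # Sketch: paste one known-good example into the prompt so the model
    # imitates it instead of reconstructing the pattern from lossy memory.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # A correct, working example we already trust (placeholder content).
    known_good = '''
    import datetime

    def parse_iso_date(s: str) -> datetime.date:
        return datetime.date.fromisoformat(s)
    '''

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Follow the style and conventions of the example exactly."},
            {"role": "user",
             "content": "Here is a correct, working example:\n"
                        + known_good
                        + "\nNow write an analogous parser for ISO 8601 timestamps."},
        ],
    )
    print(response.choices[0].message.content)

The up-front cost is exactly the effort of finding or writing that known-good example, which is the trade-off in question.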

replies(7): >>45102038 #>>45102286 #>>45103159 #>>45103931 #>>45104349 #>>45105150 #>>45116121 #
cj ◴[] No.45103159[source]
> the user will at least need to know something about the topic beforehand.

I used ChatGPT 5 over the weekend to double-check dosing guidelines for a specific medication: "Provide dosage guidelines for medication [insert here]"

It spat back dosing guidelines that were an order of magnitude off (suggesting 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right", and it quickly corrected itself and provided the correct dosing guidelines.

These are the kind of innocent errors that can be dangerous if users trust it blindly.

The main challenge is that LLMs aren't able to gauge confidence in their answers, so they can't adjust how confidently they communicate information back to you. It's like compressing a photo and the photographer wrongly saying "here's the best quality image I have!" - do you trust the photographer at their word, or do you challenge them to find a better quality image?
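
There are partial workarounds: some APIs expose per-token log-probabilities, which can be read as a crude and poorly calibrated confidence signal. A rough sketch, assuming the openai Python SDK (the model name and the 0.5 threshold are arbitrary placeholders):

    # Sketch: flag low-probability tokens in the answer as a hint (not a
    # guarantee) that the model may be guessing, e.g. at a number like "100mcg".
    import math
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": "Provide dosage guidelines for medication X."}],
        logprobs=True,  # ask for per-token log-probabilities
    )

    for tok in response.choices[0].logprobs.content:
        p = math.exp(tok.logprob)
        if p < 0.5:  # arbitrary threshold, for illustration only
            print(f"low-confidence token: {tok.token!r} (p={p:.2f})")

Even then, a wrong dosage can be emitted with high token probability, so this is a hint to double-check, not a real confidence measure.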

replies(12): >>45103322 #>>45103346 #>>45103459 #>>45103642 #>>45106112 #>>45106634 #>>45108321 #>>45108605 #>>45109136 #>>45110008 #>>45110773 #>>45112140 #
zehaeva ◴[] No.45103642[source]
What if you had told it again that you didn't think that was right? Would it have stuck to its guns and said "oh, no, I am right here", or would it have backed down with "Oh, silly me, you're right, here's the real dosage!" and given you something wrong again?

I do agree that to get full use out of an LLM you should have some familiarity with what you're asking about. If you didn't already have a sense of what a typical dosage is, why wouldn't 100mcg seem right?

replies(1): >>45103827 #
cj ◴[] No.45103827[source]
I replied in the same thread, "Are you sure? That sounds like a low dose." It stuck to the (correct) recommendation in its second response, but added a few use cases for higher doses. So it seems like it stuck to its guns for the most part.

For things like this, it would definitely be better for it to act more like a search engine and direct me to trustworthy sources rather than try to provide the information directly.
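
The closest approximation today is to ask for that behavior in the prompt. A hedged sketch, again assuming the openai Python SDK, with the system prompt and model name as illustrative placeholders; note that cited sources can themselves be hallucinated and still need checking:

    # Sketch: ask for pointers to authoritative sources instead of direct facts.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM = (
        "For medical or safety-critical questions, do not state facts directly. "
        "Instead, name the authoritative sources the user should consult "
        "(e.g. the FDA label or official prescribing information) and say "
        "what to look for in them."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user",
             "content": "Provide dosage guidelines for medication X."},
        ],
    )
    print(response.choices[0].message.content)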

replies(1): >>45108294 #
stevedotcarter ◴[] No.45108294[source]
I noticed this recently when I saw someone post an AI-generated map of Europe that was all wrong. I tried the same thing and asked ChatGPT to generate a map of Ireland, and it was wrong too. Then I asked it to find me some accurate maps of Ireland, and instead of generating one it gave me images and links to proper websites.

I'll definitely remember to put "generate" vs "find" in my prompts depending on what I'm looking for. Not quite sure how you would train the model to know which answer is more suitable.