
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points by tosh | 3 comments

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
quincepie No.45101219
I totally agree with the author. Sadly, I feel like that's not how the majority of LLM users tend to view LLMs. And it's definitely not how AI companies are marketing them.

> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters

The problem is that in order to develop an intuition for which questions LLMs can answer, the user will at least need to know something about the topic beforehand. I believe this lack of initial understanding on the user's part is what can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random or lossy facts that are almost guaranteed to impress.

> The way to solve this particular problem is to make a correct example available to it.

My question is: how much effort would it take to make a correct example available to the LLM before it can output useful, high-quality data? If the effort I put in is more than what I get in return, then I feel it's best to write it and reason it through myself.

cj No.45103159
> the user will at least need to know something about the topic beforehand.

I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication. "Provide dosage guidelines for medication [insert here]"

It spit back dosing guidelines that were an order of magnitude wrong (suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right" and it quickly corrected itself and provided the correct dosing guidelines.

These are the kind of innocent errors that can be dangerous if users trust it blindly.

The main challenge is that LLMs aren't able to gauge confidence in their answers, so they can't adjust how confidently they communicate information back to you. It's like a photographer who has compressed a photo wrongly saying "here's the best quality image I have!" Do you take the photographer at their word, or do you challenge them to find a better quality image?

BeetleB No.45108605
> I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication.

This use case is bad by several degrees.

Consider an alternative: Using Google to search for it and relying on its AI generated answer. This usage would be bad by one degree less, but still bad.

What about using Google and clicking on one of the top results? Maybe healthline.com? This usage would reduce the badness by one further degree, but still be bad.

I could go on and on, but for this use case, unless it's some generic drug (ibuprofen or something), the only correct approach is to go to the manufacturer's website, make sure you're looking at the exact same medication (not some newer version or a variant), and check the dosage guidelines there.

No, not Mayo Clinic or any other site (unless it's a pretty generic medicine).

This is just not a good example to highlight the problems of using an LLM. You're likely not that much worse off than using Google.

cj No.45108739
The compound I was researching was [edit: removed].

The problem is that it's not FDA-approved; it's only prescribed off-label through compounding pharmacies. It's an experimental compound with no official guidelines.

The first result on Google for "[edit: removed] dosing guidelines" is a random Word document hosted by a telehealth clinic. Not exactly the most reliable source.

Edit: Jeesh, what’s with the downvotes?

nonameiguess No.45114986
I think this actually points at a different problem, a problem with LLM users, but only to the extent that it's a general problem with people asking questions of any source they consider an authority. No LLM, nor any other source on or off the Internet, can give you reliable dosage guidelines for copper peptides, because this information is not known to humans. There is some answer to the question of what response you might expect and how it varies by dose, but since the clinical trials have never been conducted, it's not an answer anyone actually has. Marketing and popular misconceptions about AI lead people to expect it to conjure facts out of thin air, perhaps by reasoning from first principles with a highly honed model of human physiology.

It's an uncomfortable position to be in, trying to biohack your way to a more youthful appearance using treatments that have never been studied in human trials, but that's the reality you're facing. Whatever guidelines you manage to find, whether directly from the telehealth clinic or from a language model that read the Internet and ingested that document along with maybe a few other sources, are generally extrapolated from early rodent studies. All that's being extrapolated is the dose the researchers actually gave the rats, allometrically scaled from rat body to human body. What effect that dose actually had, and how it may or may not translate to humans, is not usually part of the consideration. To at least some extent, it can't be if the compound was never trialed in humans.

You're basically just taking a dose that at least didn't kill the rats and scaling it up to human size. Take that and it probably won't kill you. What it might actually do can't be answered: not by doctors, not by an LLM, not by Wikipedia, not by anecdotes from past biohackers who tried it on themselves. This is not a failure of information retrieval or compression. You're just asking for information that is not known to anyone, so no one can give it to you.

If there's a problem here specific to LLMs, it's that they'll generally give you an answer anyway and will not in any way quantify the extent to which it is probably bullshit and why.

cj No.45115509
> a problem with LLM users

I think the flaw here is placing blame on users rather than the service provider.

HN is cutting LLM companies slack because we understand the technical limitations making it hard for the LLM to just say “I don’t know”.

In any other universe, we would be blaming the service rather than the user.

Why don’t we fix LLMs so they don’t spit out garbage when they don’t know the answer? Have we given up on that thought?

BeetleB No.45117082
> In any other universe, we would be blaming the service rather than the user.

I think the key question is "How is this service being advertised?"

Perhaps the HN crowd gives it a lot of slack because they ignore the advertising. Or, like me, they aren't even aware of how it's being marketed. We know the limitations, and adapt appropriately.

I guess where we differ is on whether the tool is broken or not (hence your use of the word "fix"). For me, it's not broken at all. What may be broken is the messaging. I don't want them to modify the tool to say "I don't know", because I'm fairly sure that if they do, it will break a number of people's use cases. If they want to add a post-processor that filters output before it reaches the user, and give me an option to disable that post-processor, then I'm fine with it. But don't handicap the tool in the name of accuracy!

cj No.45118595
The point you were making elsewhere in the thread was that "this is a bad use case for LLMs" ... "Don't use LLMs for dosing guidelines." ... "Using dosing guidelines is a bad example for demonstrating how reliable or unreliable LLMs are", etc etc etc.

You're blaming the user for having a bad experience as a result of not using the service "correctly".

I think the tool is absolutely broken, considering all of the people saying dosing guidelines are an "incorrect" use of LLMs. (While I agree it's not a good use, I strongly dislike how you're blaming the user for using it incorrectly; that's completely out of touch with reality.)

We can't just cover up the shortfalls of LLMs by saying things like "Oh sorry, that's not a good use case, you're stupid if you use the tool for that purpose".

I really hope the HN crowd stops making excuses for why it's okay that LLMs don't perform well on tasks they're commonly asked to do.

> But don't handicap the tool in the name of accuracy!

If you're taking the position that it's the user's fault for asking LLMs a question it won't be good at answering, then you can't simultaneously advocate for not censoring the model. If it's the user's responsibility to know how to use ChatGPT "correctly", the tool (at a minimum) should help guide you away from using it in ways it's not intended for.

If LLMs were only used by smarter-than-average HN-crowd techies, I'd agree. But we're talking about a technology used by middle school kids. I don't think it's reasonable to expect middle schoolers to know what they should and shouldn't ask LLMs for help with.

BeetleB No.45130660
> You're blaming the user for having a bad experience as a result of not using the service "correctly".

Definitely. Just as I used to blame people for misusing search engines in the pre-LLM era. Or for using Wikipedia to get non-factual information. Or for using a library as a place to meet with friends and have lunch (in a non-private area).

If you're going to try to use a knife as a hammer, yes, I will fault you.

I do expect that if someone plans to use a tool, they do own the responsibility of learning how to use it.

> If you're taking the position that it's the user's fault for asking LLMs a question it won't be good at answering, then you can't simultaneously advocate for not censoring the model. If it's the user's responsibility to know how to use ChatGPT "correctly", the tool (at a minimum) should help guide you away from using it in ways it's not intended for.

Documentation, manuals, training videos, etc.

Yes, I am perhaps a greybeard. And while I do like that many modern parts of computing are designed to be easy to use without any training, I am against stating that this is a minimum standard that all tools have to meet.

Software is the only part of engineering where "self-explanatory" seems to be common. You don't buy a board game hoping it will just be self-evident how to play. You don't buy a pressure cooker hoping it will just be safe to use without learning how to use it.

So yes, I do expect users should learn how to use the tools they use.