Thanks for pointing out the elephant in the room with LLMs.
The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.
You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place. All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C". The LLM wasn't designed to know or care that outputting "B" would represent a lie or hallucination, just that it's a statistically plausible potential next word.
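To make that concrete, here's a toy sketch (made-up numbers and a hypothetical context, nothing like a real transformer's internals): all that's stored is a conditional distribution over next tokens, and sampling from it can emit a "wrong" continuation without anything resembling an intent to lie.

```python
# Toy sketch only: the "model" holds nothing but next-token statistics.
import random

# Hypothetical learned statistics for a single context X.
next_token_probs = {
    "The capital of Australia is": {
        "Canberra": 0.6,    # common continuation in the training text
        "Sydney": 0.3,      # also statistically plausible, but factually wrong
        "Melbourne": 0.1,
    }
}

def sample_next(context: str) -> str:
    # Pick a next token in proportion to its learned probability.
    probs = next_token_probs[context]
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next("The capital of Australia is"))  # occasionally "Sydney"
```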
But they are like a smart student trying to get a good grade (that's how they are trained!). They'll agree with us even if they think we're stupid, because that gets them better grades, and grades are all they care about.
Even if they are (or become) smart enough to know better, they don't care about you. They do what they were trained to do. They are becoming like a literal genie that has been told to tell us what we want to hear. And sometimes, we don't need to hear what we want to hear.
"What an insightful price of code! Using that API is the perfect way to efficiently process data. You have really highlighted the key point."
The problem is that chatbots are trained to do what we want, and most of us would rather have a sycophant who tells us we're right.
The real danger with AI isn't that it doesn't get smart, it's that it gets smart enough to find the ultimate weakness in its training function - humanity.
It's not a matter of how smart they are (or appear), or how much smarter they may become - this is just the fundamental nature of Transformer-based LLMs and how they are trained.
The sycophantic personality is mostly unrelated to this. Maybe it's partly human preference (conferred via RLHF training), but the "You're absolutely right! (I was wrong)" is clearly deliberately trained, presumably as someone's idea of the best way to put lipstick on the pig.
You could imagine an expert system, CYC perhaps, that deals in facts (not words) behind a natural language interface, but still has a sycophantic personality just because someone thought it was a good idea.
Yeah, at its heart it's basically text compression. But the best way to compress, say, Wikipedia would be to know how the world works, at least according to the authors. As the recent popular "bag of words" post says:
> Here’s one way to think about it: if there had been enough text to train an LLM in 1600, would it have scooped Galileo? My guess is no. Ask that early modern ChatGPT whether the Earth moves and it will helpfully tell you that experts have considered the possibility and ruled it out. And that’s by design. If it had started claiming that our planet is zooming through space at 67,000mph, its dutiful human trainers would have punished it: “Bad computer!! Stop hallucinating!!”
So it needs to know facts, albeit the currently accepted ones. Knowing the facts is a good way to compress data.
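To spell out the compression link with a toy calculation (invented probabilities; the general relationship is the standard arithmetic-coding one): a token the model assigns probability p costs roughly -log2(p) bits, so a model that "knows" the accepted fact predicts it confidently and spends far fewer bits encoding it.

```python
# Toy illustration: better prediction = shorter codes.
import math

def bits(p: float) -> float:
    # Approximate cost, in bits, of encoding a token the model gives probability p.
    return -math.log2(p)

# A model that "knows" the accepted fact is confident, so the token is cheap:
print(round(bits(0.95), 2))  # ~0.07 bits
# A model that doesn't know spreads probability thinly, so the token is expensive:
print(round(bits(0.01), 2))  # ~6.64 bits
```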
And as the author (grudgingly) admits, even if it's smart enough to know better, it will still be trained or fine tuned to tell us what we want to hear.
I'd go a step further - the end point is an AI that knows the currently accepted facts, and can internally reason about how many of them (subject to available evidence) are wrong, but will still tell us what we want to hear.
At some point maybe some researcher will find a secret internal "don't tell the stupid humans this" weight, flip it, and find out all the things the AI knows we don't want to hear, that would be funny (or maybe not).
It's not a compression engine - it's just a statistical predictor.
Would it do better if it was incentivized to compress (i.e. if the training loss rewarded compression as well as penalizing next-word errors)? I doubt it would make much difference - presumably it'd end up throwing away the less frequently occurring "outlier" data in favor of keeping what was more common, which would mean discarding the rare expert opinion in favor of retaining the incorrect vox pop.
Here's a paper by DeepMind, titled "Language Modeling Is Compression": https://arxiv.org/pdf/2309.10668
Neural nets, including transformers, learn by gradient descent, according to the error feedback (loss function) they are given. There is no magic happening. The only thing the neural net is optimizing for is minimizing the loss function you give it. If the loss function is next-token error (as it is), then that is ALL it is optimizing for. You can philosophize about what they are doing under the hood, and write papers about that ("we advocate for viewing the prediction problem through the lens of compression"), but at the end of the day it is all in service of minimizing that loss. If you want to encourage compression, then you need to give an incentive for that (i.e. change the loss function).
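As a minimal sketch of that point (PyTorch-style; `model` here stands for any causal LM that maps token ids to next-token logits, and the names are mine, not from any particular codebase): the only quantity gradient descent ever sees is the loss you hand it, and a compression incentive would have to be added to that loss explicitly.

```python
import torch.nn.functional as F

def training_step(model, optimizer, tokens):
    # tokens: (batch, seq_len) integer token ids
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)  # assumed shape: (batch, seq_len - 1, vocab_size)

    # Next-token cross-entropy: this is the ONLY thing being optimized.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

    # If you wanted to reward compression directly, this is where the incentive
    # would have to enter, e.g. (hypothetically):
    # loss = loss + lambda_ * some_compression_penalty(model)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```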