
A non-anthropomorphized view of LLMs

(addxorrol.blogspot.com)
475 points | zdw
barrkel ◴[] No.44485012[source]
The problem with viewing LLMs as just sequence generators, and malbehaviour as bad sequences, is that it simplifies too much. LLMs have hidden state not necessarily directly reflected in the tokens being produced and it is possible for LLMs to output tokens in opposition to this hidden state to achieve longer term outcomes (or predictions, if you prefer).
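To make "hidden state" concrete: with an open-weights model you can pull those internal activations out directly. A rough sketch (assuming the HuggingFace transformers API and GPT-2 weights; the specific model doesn't matter):

    # Illustrative sketch only: assumes HuggingFace transformers + GPT-2.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # One activation tensor per layer, shape (batch, seq_len, hidden_dim).
    print(len(out.hidden_states), out.hidden_states[-1].shape)

    # What becomes visible is a single token taken from the final logits.
    print(tok.decode(out.logits[0, -1].argmax().item()))

None of those activation tensors ever appear in the output; only the selected token does.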

Is it too anthropomorphic to say that this is a lie? To say that the hidden state and its long term predictions amount to a kind of goal? Maybe it is. But we then need a bunch of new words which have almost 1:1 correspondence to concepts from human agency and behavior to describe the processes that LLMs simulate to minimize prediction loss.

Reasoning by analogy is always shaky, so coining new terms probably wouldn't be a bad idea. But it would also amount to impenetrable jargon, and it would be an uphill struggle to promulgate.

Instead, we use the anthropomorphic terminology, and then find ways to classify LLM behavior in human concept space. They are very defective humans, so it's still a bit misleading, but at least jargon is reduced.

replies(7): >>44485190 #>>44485198 #>>44485223 #>>44486284 #>>44487390 #>>44489939 #>>44490075 #
tdullien ◴[] No.44490075[source]
Author of the original article here. What hidden state are you referring to? For most LLMs the context is the state, and there is no "hidden" state. Could you explain what you mean? (Apologies if I'm missing something obvious.)
replies(3): >>44490361 #>>44496337 #>>44504559 #
barrkel ◴[] No.44504559[source]
Yes, the context (along with the model weights) is the source data from which the hidden state is calculated, in much the same way that input and CPU ticks (along with program code) determine the values that variables take in a deterministic program.
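To push the program analogy (a toy sketch, nothing to do with any particular LLM): the internal variable below is fully determined by the input and the tick count, yet the output only ever shows a projection of it.

    # Toy analogy only: deterministic internal state that never prints directly.
    def run(program_input: str, ticks: int) -> str:
        acc = 0                                                # internal state
        digits = []
        for t in range(ticks):
            acc = (acc * 31 + len(program_input) + t) % 1000   # deterministic update
            digits.append(str(acc % 10))                       # only a projection escapes
        return "".join(digits)

    print(run("hello", 5))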

There's loads of state in the LLM that doesn't come out in the tokens it selects. The tokens are just the very top layer, and even then, you get to see just one selection from the possible tokens.

If you wish to anthropomorphize, that state - the set of activations, all the calculations that add up to the logits determining the probability of each candidate token, the whole lot of it - is what the model is "thinking". But all you get to see is one selected token.
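For instance (again a sketch assuming HuggingFace transformers and GPT-2; the point is the shape of the computation, not this particular model):

    # Illustrative sketch: the logits define a distribution; you observe one sample.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # scores over the whole vocabulary

    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, 5)
    for p, i in zip(top.values, top.indices):    # candidates that never surface
        print(f"{tok.decode(i.item())!r}: {p.item():.3f}")

    print("emitted:", tok.decode(torch.multinomial(probs, 1).item()))

Everything upstream of that last line is invisible in normal use; only the emitted token corresponds to what the user observes.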

Then, during autoregression, we run the program again, but with one more tick of the CPU clock. Variables get updated a bit more. The chosen token from the previous pass conditions the next prediction - the hidden state evolves its thinking one more step.
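A minimal sketch of that tick (same assumption as above: transformers + GPT-2, with greedy selection for simplicity):

    # Illustrative sketch of the autoregressive "tick".
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    ids = tok("The capital of France is", return_tensors="pt").input_ids

    for _ in range(5):                                # five "clock ticks"
        with torch.no_grad():
            out = model(ids, output_hidden_states=True)
        next_id = out.logits[0, -1].argmax()          # greedy choice of next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
        # out.hidden_states was recomputed over the longer sequence;
        # none of it surfaces, only the appended token id does.

    print(tok.decode(ids[0]))

Each pass recomputes and extends all of that internal state; the only externally visible change per tick is one more token.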

If you just look at the tokens being selected, you're missing this machinery. And the machinery is there: it's a program being ticked along by generating tokens autoregressively. It has state that doesn't show up directly in the tokens; it only informs which tokens to select. And the tokens it selects don't necessarily reflect the correspondences with perceived reality that the model is maintaining in that state. That's what I meant when I talked about a lie.

We need a vocabulary to talk about this machinery. The machinery is learned, and it runs programs, effectively, that help the LLM reduce loss when predicting tokens. Since the tokens it's predicting come from human minds, the programs it's running are (broken, lossy, not very good) simulations of processes that seem to run inside human minds.

The simulations are pretty decent at producing grammatically correct text, at emulating tone and style, and so on. They're okay-ish at representing concepts. They're poor at representing very specific facts. But the overall point is that they are simulations, and they have some analogous correspondence with human behaviour, such that the words we use to describe human behaviour are useful and practical.

They're not true; I'm not claiming that. But they're useful for talking about these weird defective minds we call LLMs.