
A non-anthropomorphized view of LLMs

(addxorrol.blogspot.com)
475 points by zdw | 1 comment | source
barrkel ◴[] No.44485012[source]
The problem with viewing LLMs as just sequence generators, and misbehaviour as just bad sequences, is that it simplifies too much. LLMs have hidden state not necessarily directly reflected in the tokens being produced, and it is possible for LLMs to output tokens in opposition to this hidden state to achieve longer-term outcomes (or predictions, if you prefer).

Is it too anthropomorphic to say that this is a lie? To say that the hidden state and its long-term predictions amount to a kind of goal? Maybe it is. But then we need a bunch of new words, with almost 1:1 correspondence to concepts from human agency and behavior, to describe the processes that LLMs simulate to minimize prediction loss.

Reasoning by analogy is always shaky, so coining that new vocabulary probably wouldn't be such a bad idea. But it would also amount to impenetrable jargon, and it would be an uphill struggle to promulgate.

Instead, we use the anthropomorphic terminology, and then find ways to classify LLM behavior in human concept space. They make for very defective humans, so it's still a bit misleading, but at least the jargon is reduced.

replies(7): >>44485190 #>>44485198 #>>44485223 #>>44486284 #>>44487390 #>>44489939 #>>44490075 #
gugagore ◴[] No.44485190[source]
I'm not sure what you mean by "hidden state". If you set aside chain of thought, memories, system prompts, etc. and the interfaces that don't show them, there is no hidden state.

These LLMs are almost always, to my knowledge, autoregressive models, not recurrent models (Mamba is a notable exception).

replies(3): >>44485271 #>>44485298 #>>44485311 #
barrkel ◴[] No.44485271[source]
Hidden state in the form of attention head activations, intermediate layer activations, and so on. Logically, in autoregression these are recalculated every time you run the sequence to predict the next token. The point is that the entire NN state isn't output with each token. A lot of hidden state goes into selecting that token, and the token isn't a full representation of that information.
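
To put rough numbers on it, here's a toy sketch (made-up shapes and a fake forward pass, nothing resembling a real transformer) of how much internal activation gets computed, and then discarded, for every single token that comes out:

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB, DIM, LAYERS = 50, 64, 4
    E = rng.normal(size=(VOCAB, DIM))                    # toy embedding / readout
    W = [rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(LAYERS)]

    def forward(tokens):
        # Fake forward pass: returns next-token logits plus every
        # intermediate activation computed along the way.
        x = E[tokens]                                    # (seq, DIM)
        acts = [x]
        for w in W:
            x = np.tanh(x @ w)                           # one "layer"
            acts.append(x)
        return x[-1] @ E.T, acts                         # logits, internal state

    logits, acts = forward([3, 1, 4, 1, 5])
    print(sum(a.size for a in acts), "internal numbers ->",
          int(np.argmax(logits)), "(one token id survives)")
    # All of those activations are thrown away (or kept only in a cache
    # that is itself a pure function of the tokens) before the next step.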
replies(2): >>44485334 #>>44485360 #
gugagore ◴[] No.44485334[source]
That's not what "state" means, typically. The "state of mind" you're in affects the words you say in response to something.

Intermediate activations aren't "state". The tokens that have already been generated, along with the fixed weights, are the only data that affect the next token.

replies(2): >>44485915 #>>44488490 #
barrkel ◴[] No.44488490[source]
Sure it's state. It logically evolves step by step as tokens are generated. It encapsulates the LLM's understanding of the text so far, so it can predict the next token. That it is merely a fixed function of other data isn't interesting or useful to say.

All deterministic programs are fixed functions of program code, inputs and computation steps, but we don't say that they don't have state. It's not a useful distinction for communicating among humans.

replies(1): >>44488841 #
gugagore ◴[] No.44488841[source]
I'll say it once more: I think it is useful to distinguish between autoregressive and recurrent architectures. A clear way to make that distinction is to agree that the recurrent architecture has hidden state, while the autoregressive one does not. A recurrent model has some point in a space that "encapsulates its understanding". This space is "hidden" in the sense that it doesn't correspond to text tokens or any other output. This space is "state" in the sense that it is sufficient to summarize the history of the inputs for the sake of predicting the next output.

When you use "hidden state" the way you are using it, I am left wondering how you make a distinction between autoregressive and recurrent architectures.
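
Concretely, the distinction I have in mind looks something like this toy numpy sketch (made-up update rules, not a real model): the recurrent generator threads a hidden vector h between steps, while the autoregressive one has nothing to thread and rebuilds its context from the visible tokens on every step.

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB, DIM = 50, 16
    E = rng.normal(size=(VOCAB, DIM))          # toy embedding / readout matrix

    def recurrent_generate(prompt, n_new):
        h = np.zeros(DIM)                      # hidden state: persists, never emitted
        for t in prompt:
            h = np.tanh(0.9 * h + E[t])        # consume the prompt, updating h
        tokens = list(prompt)
        for _ in range(n_new):
            t = int(np.argmax(E @ h))          # next token read out of h alone
            tokens.append(t)
            h = np.tanh(0.9 * h + E[t])
        return tokens

    def autoregressive_generate(prompt, n_new):
        tokens = list(prompt)
        for _ in range(n_new):
            ctx = np.tanh(E[tokens].mean(axis=0))   # rebuilt from visible tokens only
            tokens.append(int(np.argmax(E @ ctx)))
        return tokens

    print(recurrent_generate([3, 1, 4], 5))
    print(autoregressive_generate([3, 1, 4], 5))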

replies(2): >>44488974 #>>44489098 #
gugagore ◴[] No.44489098[source]
I'll also point out what I think is the most important part of your original message:

> LLMs have hidden state not necessarily directly reflected in the tokens being produced, and it is possible for LLMs to output tokens in opposition to this hidden state to achieve longer-term outcomes (or predictions, if you prefer).

But what does it mean for an LLM to output a token in opposition to its hidden state? If there's a longer-term goal, it either needs to be verbalized in the output stream, or somehow reconstructed from the prompt on each token.

There's some work (a link would be great) that disentangles whether chain of thought helps because it gives the model more FLOPs to work with, or because it makes its subgoals explicit: e.g., outputting "Okay, let's reason through this step by step..." versus just "...". What they find is that even placeholder tokens like "..." can help.

That seems to imply some notion of evolving hidden state! I see how that comes in!

But crucially, in autoregressive models, this state isn't persisted across time. Each token is generated afresh, based only on the visible history. The model's internal (hidden) layers are certainly rich and structured and "non-verbal".

But any nefarious intention or conclusion has to be arrived at on every forward pass.
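
As a toy restatement of that constraint (the stand-in function below is obviously not a real LLM): the only input to each step is the visible token history, so whatever the model "intends" has to be recomputable from those tokens alone, every time.

    import numpy as np

    rng = np.random.default_rng(0)
    E = rng.normal(size=(50, 16))                  # toy embeddings

    def next_token_logits(tokens):
        # Stand-in for a full forward pass: a pure function of the
        # visible tokens, with nothing remembered between calls.
        return E @ np.tanh(E[tokens].mean(axis=0))

    def generate(prompt, n_new):
        tokens = list(prompt)
        for _ in range(n_new):
            logits = next_token_logits(tokens)     # recomputed from scratch each step
            tokens.append(int(np.argmax(logits)))  # only the chosen token persists
        return tokens

    # Same visible history in, same outputs out: there is nowhere else
    # for a hidden plan to live between steps.
    assert generate([3, 1, 4], 5) == generate([3, 1, 4], 5)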

replies(2): >>44489774 #>>44504493 #
barrkel ◴[] No.44504493[source]
The LLM can predict that it may lie, and when it sees tokens that run contrary to reality as it "understands" it, it may predict that the lie continues. It doesn't necessarily need to predict that it will reveal the lie. You can, after all, stop autoregressively producing tokens at any point, and the LLM may elect to produce an end-of-sequence token without ever revealing the lie.

Goals, such as they are, are essentially programs, or simulations, that the LLM runs to help it predict (generate) future tokens.

Anyway, the whole original article is a rejection of anthropomorphism. I think the anthropomorphism is useful, but you still need to think of LLMs as deeply defective minds. And I totally reject the idea that they have intrinsic moral weight or consciousness or anything close to that.