
A non-anthropomorphized view of LLMs

(addxorrol.blogspot.com)
475 points by zdw | 5 comments
barrkel ◴[] No.44485012[source]
The problem with viewing LLMs as just sequence generators, and misbehaviour as bad sequences, is that it simplifies too much. LLMs have hidden state that is not necessarily reflected directly in the tokens being produced, and it is possible for LLMs to output tokens in opposition to this hidden state to achieve longer-term outcomes (or predictions, if you prefer).

Is it too anthropomorphic to say that this is a lie? To say that the hidden state and its long-term predictions amount to a kind of goal? Maybe it is. But then we need a bunch of new words, with almost 1:1 correspondence to concepts from human agency and behavior, to describe the processes that LLMs simulate to minimize prediction loss.

Reasoning by analogy is always shaky. Coining new terms probably wouldn't be so bad in itself, but it would amount to impenetrable jargon, and it would be an uphill struggle to promulgate.

Instead, we use the anthropomorphic terminology, and then find ways to classify LLM behavior in human concept space. LLMs make for very defective humans, so it's still a bit misleading, but at least the jargon is reduced.
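
To make "hidden state" concrete: the minimal sketch below, assuming the Hugging Face transformers API with GPT-2 as a small stand-in for a frontier model, shows the per-layer activations and key/value cache that a single forward pass produces but never emits as tokens. Whether that counts as state beyond the context is exactly what the replies dispute.

    # A minimal sketch, assuming the Hugging Face transformers API and GPT-2
    # as a small stand-in for a frontier model. Each forward pass produces
    # per-layer activations and a key/value cache that never show up in the
    # token stream itself.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The hidden state is", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True, use_cache=True)

    print(out.logits.shape)             # (1, seq_len, vocab_size): the only part that surfaces as tokens
    print(len(out.hidden_states))       # embeddings plus one activation tensor per layer
    print(out.hidden_states[-1].shape)  # (1, seq_len, hidden_dim): never emitted
    # out.past_key_values caches keys/values that condition every subsequent token.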

replies(7): >>44485190 #>>44485198 #>>44485223 #>>44486284 #>>44487390 #>>44489939 #>>44490075 #
tdullien ◴[] No.44490075[source]
Author of the original article here. What hidden state are you referring to? For most LLMs the context is the state, and there is no "hidden" state. Could you explain what you mean? (Apologies if I'm simply not seeing it.)
replies(3): >>44490361 #>>44496337 #>>44504559 #
1. lukeschlather ◴[] No.44490361[source]
Yes, strictly speaking, the model itself is stateless, but for frontier models there are roughly 600B parameters' worth of state machine defining which token to pick next. And that state machine is both incomprehensibly large and comparable in magnitude to a human brain. (Probably; I'll grant it's possible it's smaller, but it's still quite large.)

I think my issue with the "don't anthropomorphize" position is that it's unclear to me that the main difference between a human and an LLM isn't simply the inability for the LLM to rewrite its own model weights on the fly. (And I say "simply", but there's obviously nothing simple about it; it might even be possible already with current hardware, we just don't know how to do it.)

Even if we decide it is clearly different, this is still an incredibly large and dynamic system. "Stateless" or not, there's an incredible amount of state that is not comprehensible to me.
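
In concrete terms, the dichotomy being discussed looks like the sketch below: a minimal greedy-decoding loop, again assuming the Hugging Face transformers API with GPT-2 as a stand-in. The weights stay frozen throughout; the growing token sequence is the only thing that changes from step to step.

    # A minimal greedy-decoding sketch, assuming the Hugging Face transformers
    # API and GPT-2 as a stand-in. The weights are frozen; only the context grows.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()  # nothing rewrites the weights "on the fly"

    ids = tok("The main difference is", return_tensors="pt").input_ids
    for _ in range(20):
        with torch.no_grad():
            logits = model(ids).logits[:, -1, :]              # scores for the next token only
        next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)               # the context is the only mutable state

    print(tok.decode(ids[0]))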

replies(3): >>44490546 #>>44490762 #>>44491161 #
2. tdullien ◴[] No.44490546[source]
Fair, there is a lot that is incomprehensible to all of us. I wouldn't call it "state" as it's fixed, but that is a rather subtle point.

That said, would you anthropomorphize a meteorological simulation just because it contains lots and lots of constants that you don't understand well?

I'm pretty sure that recurrent dynamical systems quite quickly become universal computers, yet we treat the ones that generate human language differently from all the others, and I don't quite see what justifies the distinction.

replies(1): >>44495813 #
3. jazzyjackson ◴[] No.44490762[source]
FWIW the number of parameters in an LLM is in the same ballpark as the number of neurons in a human brain (roughly 80B), but neurons are not weights; each is kind of a neural net unto itself: stateful, adaptive, self-modifying, with a good variety of neurotransmitters (and their chemical analogs) in play beyond just voltage.

It's fun to think about just how fantastic a brain is, and how much wattage and data-center scale we're throwing around trying to approximate its behavior. Mega-efficient and mega-dense. I'm bearish on AGI simply from an internetworking standpoint: the speed of light is hard to beat, and until you can fit 80 billion interconnected cores in half a cubic foot you're just not going to get close to the responsiveness of reacting to the world in real time the way biology manages to. But that's a whole other matter. I just wanted to pick apart the idea that the magnitude of the parameter count is a meaningful comparison :)
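
As a rough back-of-envelope on that last point, using the figures above plus the commonly cited estimate of roughly 100 trillion synapses (an assumption not in the comment, and order-of-magnitude only):

    # Back-of-envelope only; the 600B figure is the one floated upthread and the
    # synapse estimate (~1e14) is a commonly cited order-of-magnitude number.
    params = 600e9       # parameters in a large frontier model (claimed upthread)
    neurons = 80e9       # neurons in a human brain (rough figure from the comment)
    synapses = 1e14      # commonly cited estimate, order of magnitude only

    print(params / neurons)   # ~7.5 parameters per neuron
    print(params / synapses)  # ~0.006 parameters per synapse
    # If a synapse is closer to the right unit of comparison than a neuron,
    # the parameter count falls short by a couple of orders of magnitude.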

4. jibal ◴[] No.44491161[source]
> it's unclear to me that the main difference between a human and an LLM isn't simply the inability for the LLM to rewrite its own model weights on the fly.

This is "simply" an acknowledgement of extreme ignorance of how human brains work.

5. lukeschlather ◴[] No.44495813[source]
Meteorological simulations don't contain detailed state machines that are intended to encode how a human would behave in a specific situation.

And if it were just language, I would say, sure, maybe this is more limited. But it seems like tensors can do a lot more than that. Poorly, for now, but that may primarily be a hardware limitation. Or it might be something about the way they work, though not something terribly different from what they are already doing.

Also, I might talk about a meteorological simulation in terms of whatever it was intended to simulate.