An actual "thinking machine" would be constantly running computations on its accumulated experience in order to improve its future output and/or further compress its sensory history.
An LLM is doing exactly nothing while waiting for the next prompt.
I think the thing you were looking for was more along the lines of a persistent autonomous agent.
I see thinking as less about "timing" and more about a "process"
What this post seems to be describing is more about where attention is paid and what neurons fire for various stimuli
Frankly this objection seems very weak
This is currently done with multiple LLMs and multiple calls, not within a single run of one model's input/output
Another example: feed in a single token, or gibberish, and today's models are more than happy to spit out fantastic numbers of tokens. They really only stop because we watch for the stop words they are trained to generate, and we perform the actual stopping action ourselves
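To make that concrete, here's a toy sketch of the external stopping loop (everything here is made up: fake_next_token stands in for a real forward pass, EOS_ID is a hypothetical stop-token id). The model just keeps emitting tokens; it's the harness watching for the stop id that actually halts generation.

    import random

    EOS_ID = 2                    # hypothetical end-of-sequence token id
    VOCAB = list(range(10))       # toy vocabulary

    def fake_next_token(token_ids):
        """Stand-in for a real forward pass: just picks any token, sometimes EOS."""
        return random.choice(VOCAB)

    def generate(token_ids, max_new_tokens=50):
        for _ in range(max_new_tokens):
            next_id = fake_next_token(token_ids)
            token_ids.append(next_id)
            if next_id == EOS_ID:  # the model merely emits the stop token;
                break              # this loop is what actually does the stopping
        return token_ids

    print(generate([5, 7]))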
it’s fine though, this was as productive as i expected
Still, what current LLMs are doing with their fixed rules is only a very limited form of reasoning, since they apply just a fixed N steps of rule application to generate each word. People are looking to techniques such as "group of experts" prompting to improve reasoning: step-wise, generate multiple responses, then evaluate them and proceed to the next step.
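Roughly, the step-wise idea is: sample several candidate next steps, score them with an evaluator, keep the best, repeat. A toy sketch, where llm() and score() are hypothetical stand-ins for real model calls:

    import random

    def llm(state, n=3):
        """Stand-in: pretend to sample n candidate next steps for this state."""
        return [f"{state} [candidate {i}]" for i in range(n)]

    def score(candidate):
        """Stand-in for an evaluator prompt/model rating a candidate."""
        return random.random()

    def solve(prompt, steps=4, n=3):
        state = prompt
        for _ in range(steps):
            candidates = llm(state, n)          # expand: several possible next steps
            state = max(candidates, key=score)  # evaluate: keep the best-rated one
        return state

    print(solve("Problem: ..."))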
It's an interesting window on people's intuitions -- this pattern of reasoning feels surprising and alien to someone who imbibed Hofstadter and Dennett, etc., as a teen in the 80s.
(TBC, the surprise was not that people weren't sure they "think" or are "conscious", it's that they were sure they aren't, on the basis that the program is not running continually.)
I'm listing things that current LLMs cannot do (or things they do that thinking entities would not) to argue they are so simple they are far from anything that resembles thinking
> it’s fine though, this was as productive as i expected
A product of your replies lowering in quality and becoming more argumentative, so I will discontinue now
Current LLMs have none of that - they are just the fixed set of rules, further limited by also having a fixed number of steps of rule application.
An LLM has no innate traits such as curiosity or boredom to trigger exploration, and anyways no online/incremental learning mechanism to benefit from it even if it did.
The effect is as if you had multiple people playing a game where they each extend a sentence by taking turns adding a word to it, but there is zero continuity from one word to the next because each person is starting from scratch when it is their turn.
What do you mean? They get to access their previous hidden states in the next greedy decode using attention; it is not simply starting from scratch. They can access exactly what they were thinking when they put out the previous word, not just reason from the word itself.
But that's exactly what I'm saying - the model has access to what it was thinking when it generated the previous words; it does not start from scratch. Even without the KV cache, the model regenerates that earlier "thinking" from the previous words, so when generating the next word it can still look back at what it was thinking when it produced them. Does that make sense? I'm not great at talking about this stuff in words
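Maybe a toy sketch is clearer than words (numpy only, single attention head, random weights - nothing like a real model). The point is just that keys/values from earlier decode steps are kept around, so each new token's query attends to everything computed for the previous words rather than starting from scratch:

    import numpy as np

    d = 8
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    k_cache, v_cache = [], []          # "memory" of earlier decode steps

    def attend(x_new):
        """Process one new token embedding, reusing cached keys/values."""
        q = x_new @ Wq
        k_cache.append(x_new @ Wk)     # only the new position's K/V get computed
        v_cache.append(x_new @ Wv)
        K, V = np.stack(k_cache), np.stack(v_cache)
        weights = softmax(q @ K.T / np.sqrt(d))
        return weights @ V             # mixes information from all previous steps

    for step in range(4):              # pretend these are successive decoded tokens
        print(step, attend(rng.normal(size=d))[:3].round(2))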
There will be some overlap in what the model is now "thinking" (and has calculated from scratch) since the new prompt is one possible continuation of the previous one, but other things it was previously "thinking" will no longer be there.
E.g. say the prompt was "the man", and output probabilities include "in" and "ran", reflecting the model thinking of potential continuations such as "the man in the corner" and "the man ran for mayor". Suppose the word sampled was "ran", so now the new prompt is "the man ran". Possible continuations can no longer include refining who the subject is, since the new word "ran" implies the continuation must now be an action.
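In toy form (the probabilities here are made up purely for illustration), sampling one continuation prunes what can be "thought" next:

    import random

    next_word = {
        "the man":     {"in": 0.4, "ran": 0.6},
        "the man ran": {"for": 0.5, "his": 0.5},
    }

    prompt = "the man"
    choice = random.choices(list(next_word[prompt]),
                            weights=list(next_word[prompt].values()))[0]
    prompt = f"{prompt} {choice}"
    print(prompt)   # e.g. "the man ran": continuations about *who* the man is
                    # are no longer reachable from this prefix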
Some work is saved, via the KV cache, in processing the new prompt, but only work (self-attention among the common part of the two prompts) that would not change if recalculated. What the model is thinking has changed, and will continue to change depending on the next sampled continuation ("the man ran for mayor", "the man ran for cover", "the man ran his bath", etc).
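Continuing the earlier numpy sketch: recomputing keys/values for the shared prefix from scratch gives exactly the same result as the cache, which is why the cache only saves work rather than carrying over anything that would come out differently.

    import numpy as np

    d = 8
    rng = np.random.default_rng(1)
    Wk = rng.normal(size=(d, d))
    prefix = rng.normal(size=(5, d))      # embeddings for the shared prompt words

    cached_K   = prefix @ Wk                              # built up incrementally while decoding
    recomputed = np.stack([tok @ Wk for tok in prefix])   # reprocessed from scratch

    print(np.allclose(cached_K, recomputed))   # True: identical either way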