Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
(transformer-circuits.pub)
168 points
1wheel
| 1 comments |
21 May 24 15:15 UTC
wwarner | 21 May 24 15:38 UTC | No. 40429827 | >>40429540 (OP)
Huge. The activation scan, which looks for the nodes that change the most when the model is prompted with the words "Golden Gate Bridge" and later with an image of the same bridge, is eerily reminiscent of a brain scan under similar prompts...