slacker news
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub)
168 points by 1wheel | 1 comment | 21 May 24 15:15 UTC
bilsbie | 21 May 24 17:46 UTC | No. 40431498 | >>40429540 (OP)
How are they handling attention in their approach? That's going to completely change which features are looked at.
tel | 22 May 24 11:49 UTC | No. 40439840 | >>40431498
They target the residual stream. Also, they may be using a more general definition of “feature” than you are. Consider reading their work on superposition.
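To make “target the residual stream” a bit more concrete, here is a minimal sketch assuming a standard sparse-autoencoder (dictionary learning) setup, which is the kind of decomposition the linked paper trains on residual-stream activations. A “feature” here is a learned direction, and there can be more features than the stream has dimensions, which is where superposition comes in. The helper names (`sae_forward`, `sae_loss`), the toy 2-dimensional stream with 3 features, and all weights are hypothetical and for illustration only. The decoder-norm-weighted L1 penalty is one common formulation; the paper's exact training details (normalization, coefficients, sizes) may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # stored as a list of rows


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector,
                w_dec: Matrix, b_dec: Vector) -> Tuple[Vector, Vector]:
    """Encode a residual-stream vector x into feature activations f, then reconstruct x.

    w_enc has one row per feature (n_features x d_model);
    w_dec has one column per feature (d_model x n_features).
    """
    # Feature activations: ReLU of an affine map, one nonnegative value per feature.
    f = [max(0.0, z + b) for z, b in zip(matvec(w_enc, x), b_enc)]
    # Reconstruction: decoder bias plus the activation-weighted decoder columns.
    x_hat = list(b_dec)
    for i, fi in enumerate(f):
        for j in range(len(x_hat)):
            x_hat[j] += fi * w_dec[j][i]
    return f, x_hat


def sae_loss(x: Vector, f: Vector, x_hat: Vector,
             w_dec: Matrix, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 sparsity penalty in which each
    activation is weighted by the L2 norm of its decoder column (assumed formulation)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    col_norms = [math.sqrt(sum(row[i] ** 2 for row in w_dec)) for i in range(len(f))]
    sparsity = sum(fi * n for fi, n in zip(f, col_norms))
    return recon + l1_coeff * sparsity


# Hypothetical toy example: a 2-dimensional "residual stream" with 3 features,
# i.e. more feature directions than dimensions.
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, -0.6], [0.0, 1.0, 0.8]]
b_dec = [0.0, 0.0]

x = [2.0, 0.0]
f, x_hat = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
print(f, x_hat)  # → [2.0, 0.0, 0.0] [2.0, 0.0]
print(sae_loss(x, f, x_hat, w_dec, l1_coeff=0.5))  # → 1.0
```

In this toy case only the feature aligned with `x` fires and the reconstruction is exact. In a trained model the dictionary is learned from many activations, so neither holds exactly, and attention's effect shows up only insofar as it writes into the residual stream being decomposed.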