Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
(transformer-circuits.pub)
168 points
1wheel
| 1 comment |
21 May 24 15:15 UTC
1. feverzsj [21 May 24 15:23 UTC] No. 40429632
>>40429540 (OP)
So they built a system by trying out thousands of combinations to find the one that gives the best result, but they don't understand what's actually going on inside it.