slacker news
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
(transformer-circuits.pub)
168 points by 1wheel | 1 comment | 21 May 24 15:15 UTC
gautomdas | 22 May 24 02:36 UTC | No. 40436795
>>40429540 (OP)
I've really been enjoying their series on mech interp (mechanistic interpretability). Does anyone have any other good recs?
kromem | 22 May 24 04:35 UTC | No. 40437371
>>40436795
The Othello-GPT and Chess-GPT lines of work.
They were the first research that clued me into what Anthropic's work today ended up demonstrating.