Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
(transformer-circuits.pub)
168 points
1wheel
| 2 comments |
21 May 24 15:15 UTC
1.
gdiamos
22 May 24 08:07 UTC
No. 40438502
>>40429540 (OP)
It looks like Anthropic is now leading the charge on safety.
replies(1):
>>40456745
2.
maherbeg
23 May 24 16:30 UTC
No. 40456745
>>40438502 (TP)
They always were, given that it is part of their mission.