/top/
/new/
/best/
/ask/
/show/
/job/
^
slacker news
login
about
←back to thread
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
(transformer-circuits.pub)
168 points
1wheel
| 1 comments |
21 May 24 15:15 UTC
|
HN request time: 1.041s
|
source
1.
null_point
◴[
21 May 24 15:31 UTC
]
No.
40429743
[source]
▶
>>40429540 (OP)
#
Strategic timing for the release of this paper. As of last week OpenAI looks weak in their commitment to _AI Safety_, losing key members of their Super Alignment team.
ID:
GO
↑