Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
(transformer-circuits.pub)
168 points | 1wheel | 1 comment | 21 May 24 15:15 UTC
pagekicker [21 May 24 21:54 UTC] No. 40434535 >>40429540 (OP)
The article doesn't explain how users can exploit these features through a UI or a prompt. Does anyone have any insight into how to do so?
replies(1): >>40435715
1. CephalopodMD [21 May 24 23:52 UTC] No. 40435715 >>40434535
They explicitly aren't releasing any tools to do this with their models for safety reasons. But you could probably do it from scratch with one of the open models by following their methodology.
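As a rough illustration of what "doing it from scratch" might involve, here is a minimal Python sketch of the steering step only: encode an activation with a sparse autoencoder, clamp one feature to a chosen value, decode, and add back the reconstruction error so that only that feature's contribution changes. It assumes you have already trained a sparse autoencoder on an open model's activations. The weights, sizes, and function names below are hypothetical toy values for illustration, not anything taken from the article or from a real library.

```python
# Toy sketch of feature steering with a sparse autoencoder (SAE).
# All weights and dimensions are made up; a real SAE would be trained
# on activations collected from an open model.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def encode(x, w_enc, b_enc):
    # Feature activations: ReLU(W_enc @ x + b_enc).
    return relu([a + b for a, b in zip(matvec(w_enc, x), b_enc)])

def decode(f, w_dec, b_dec):
    # w_dec holds one row per feature: that feature's direction in model space.
    out = list(b_dec)
    for fi, direction in zip(f, w_dec):
        for j, d in enumerate(direction):
            out[j] += fi * d
    return out

def steer(x, w_enc, b_enc, w_dec, b_dec, feature, value):
    # Clamp one feature, keeping the part of x the SAE fails to reconstruct.
    f = encode(x, w_enc, b_enc)
    error = [a - b for a, b in zip(x, decode(f, w_dec, b_dec))]
    f[feature] = value
    return [a + b for a, b in zip(decode(f, w_dec, b_dec), error)]

# Hypothetical toy SAE: 3-dimensional activations, 4 features.
W_ENC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
B_ENC = [0.0, 0.0, -0.1, 0.0]
W_DEC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
B_DEC = [0.0, 0.0, 0.0]

x = [0.2, -0.3, 0.4]
steered = steer(x, W_ENC, B_ENC, W_DEC, B_DEC, feature=2, value=5.0)
```

In a real setup the steered vector would be written back into the model's forward pass at the layer the autoencoder was trained on, which this sketch leaves out.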