(transformer-circuits.pub)

168 points 1wheel | 1 comment | 21 May 24 15:15 UTC

wwarner [21 May 24 15:38 UTC] No.40429827

Huge. The activation scan, which finds the nodes that change the most when the model is prompted with the words "Golden Gate Bridge" and later with an image of the same bridge, is eerily reminiscent of a brain scan under similar prompts...

1. [21 May 24 15:47 UTC] No.40429981

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet