(transformer-circuits.pub)

168 points 1wheel | 1 comments | 21 May 24 15:15 UTC

gautomdas ◴[22 May 24 02:36 UTC] No.40436795[source]▶

I've really been enjoying their series on mech interp. Does anyone have any other good recs?

1. kromem ◴[22 May 24 04:35 UTC] No.40437371[source]▶

The Othello-GPT and Chess-GPT lines of work.

It was the first research that clued me in to what Anthropic's work today ended up demonstrating.

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet