(transformer-circuits.pub)

168 points 1wheel | 1 comments | 21 May 24 15:15 UTC | HN request time: 0s | source

1. null_point ◴[21 May 24 15:31 UTC] No.40429743[source]▶

Strategic timing for the release of this paper. As of last week OpenAI looks weak in their commitment to _AI Safety_, losing key members of their Super Alignment team.

↑

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet