
168 points by 1wheel | 3 comments
kromem No.40437406
Great work as usual.

I was pretty upset seeing the superalignment team dissolve at OpenAI, but as is typical for the AI space, the news of one day was quickly eclipsed by the next day.

Anthropic are really killing it right now, and it's very refreshing seeing their commitment to publishing novel findings.

I hope this finally serves as the nail in the coffin for the "it's just fancy autocomplete" and "it doesn't understand what it's saying, bro" rhetoric.

replies(3): >>40437912 >>40441380 >>40445446
jwilber No.40437912
Love Anthropic research. Great visuals from Olah, Carter, and Pearce, as well.

I don’t think this paper does much to settle your final point, “it doesn’t understand what it’s saying”, though our understanding of the models certainly has improved.

replies(1): >>40438375
1. kromem No.40438375
They demonstrated conceptual features that are consistent across different languages and media (text vs. images), and that, when manipulated, steer the output toward the abstract concept regardless of the prompt.
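
To make that concrete: the "feature steering" in the paper amounts to adding a learned direction to the model's residual-stream activations during generation. A minimal sketch, assuming a GPT-2-style HuggingFace model and a pre-extracted unit feature vector (the hook point, scale, and function names are my own illustration, not Anthropic's code):

    import torch

    def steer_with_feature(model, tokenizer, prompt, feature_dir, layer, scale=5.0):
        """Generate while adding a concept/feature vector to one layer's residual
        stream. `feature_dir` is assumed to be a unit vector extracted beforehand
        (e.g. a sparse-autoencoder decoder row); the hook point is GPT-2-style."""
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            # hidden: (batch, seq, d_model); nudge every position toward the concept
            hidden = hidden + scale * feature_dir.to(hidden.dtype)
            if isinstance(output, tuple):
                return (hidden,) + output[1:]
            return hidden

        handle = model.transformer.h[layer].register_forward_hook(hook)
        try:
            ids = tokenizer(prompt, return_tensors="pt").input_ids
            out = model.generate(ids, max_new_tokens=50)
        finally:
            handle.remove()
        return tokenizer.decode(out[0], skip_special_tokens=True)

Clamp a feature high enough and the output drifts toward that concept ("Golden Gate Claude" style) no matter what the prompt was.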

What kind of evidentiary threshold would you want if that's not sufficient?

replies(1): >>40444944
2. jwilber No.40444944
My point is that you presented this as a rebuttal to those claiming models don’t understand themselves. Your interpretation seems to assign intelligence to the algorithms.

While this research allows us to interpret larger models in an amazing way, it doesn’t mean the models themselves ‘understand’ anything.

You can use this on much smaller-scale models as well, as they showed 8 months ago. Does that research tell us how models understand themselves? Or does it help us understand how the models work?
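
(For anyone who hasn't read the earlier paper: the dictionary-learning step in both the toy-model work and this one is essentially a sparse autoencoder trained on a layer's activations. A minimal sketch; the dimensions and L1 coefficient here are illustrative, not theirs:)

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Decompose d_model-dimensional activations into n_features sparse features."""
        def __init__(self, d_model=512, n_features=4096):
            super().__init__()
            self.encoder = nn.Linear(d_model, n_features)
            self.decoder = nn.Linear(n_features, d_model)

        def forward(self, acts):
            features = torch.relu(self.encoder(acts))  # sparse, non-negative feature activations
            recon = self.decoder(features)
            return recon, features

    def sae_loss(acts, recon, features, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that pushes most features to zero.
        return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().sum(dim=-1).mean()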

replies(1): >>40446963
3. kromem No.40446963
"Understand themselves" is a very different thing than "understand what they are saying."

Which exactly are we talking about here?

Because no, the research doesn't say much about the former, but yes, it says a lot about the latter, especially on top of the many earlier papers demonstrating world modeling in smaller toy models.
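
The world-modeling results in those toy models (e.g. board-state recovery in Othello-playing transformers) generally come from fitting linear probes on intermediate activations. A rough sketch of that setup, with placeholder data and dimensions of my own choosing:

    import torch
    import torch.nn as nn

    def train_probe(acts, labels, n_classes=3, epochs=100, lr=1e-3):
        """Fit a linear probe that tries to read a board square's state from
        hidden activations. `acts` (n_positions, d_model) and `labels`
        (n_positions,) are assumed to come from running a game-playing
        transformer over recorded games; both are placeholders here."""
        probe = nn.Linear(acts.shape[1], n_classes)
        opt = torch.optim.Adam(probe.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(probe(acts), labels)
            loss.backward()
            opt.step()
        # Training accuracy only; a real evaluation would use held-out positions.
        accuracy = (probe(acts).argmax(dim=-1) == labels).float().mean()
        return probe, accuracy.item()

If a simple linear map can read the board state out of the activations, that's evidence the model is representing the game rather than just memorizing move statistics.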