168 points by 1wheel | 13 comments
1. kromem No.40437406
Great work as usual.

I was pretty upset seeing the superalignment team dissolve at OpenAI, but as is typical for the AI space, the news of one day was quickly eclipsed by the next day's.

Anthropic are really killing it right now, and it's very refreshing seeing their commitment to publishing novel findings.

I hope this finally serves as the nail in the coffin for the "it's just fancy autocomplete" and "it doesn't understand what it's saying, bro" rhetoric.

replies(3): >>40437912 #>>40441380 #>>40445446 #
2. jwilber No.40437912
Love Anthropic's research. Great visuals from Olah, Carter, and Pearce, as well.

I don't think this paper does much to address your final point, the "it doesn't understand what it's saying" rhetoric, though our understanding of the models certainly has improved.

replies(1): >>40438375 #
3. kromem No.40438375
They were able to demonstrate conceptual vectors that are consistent across different languages and different media (text vs. images), and that, when manipulated, cause the abstract concept to show up in the output regardless of the prompt.

What kind of evidentiary threshold would you want if that's not sufficient?
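
(For anyone wondering what "manipulating" one of those conceptual vectors looks like mechanically, here is a rough sketch of the generic activation-steering idea in PyTorch. The tiny linear layer stands in for one transformer block and the random unit vector for a learned concept direction, e.g. a sparse-autoencoder feature; it only shows the shape of the intervention, not Anthropic's actual setup.)

    import torch
    import torch.nn as nn

    # Stand-ins for the sketch: one linear layer in place of a transformer
    # block, and a random unit vector in place of a learned concept direction.
    d_model = 64
    block = nn.Linear(d_model, d_model)
    concept = torch.randn(d_model)
    concept = concept / concept.norm()

    def steer(module, inputs, output, scale=8.0):
        # Returning a value from a forward hook replaces the block's output,
        # nudging the activations toward the concept direction.
        return output + scale * concept

    handle = block.register_forward_hook(steer)
    steered = block(torch.randn(1, d_model))  # output now carries the concept
    handle.remove()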

replies(1): >>40444944 #
4. Workaccount2 No.40441380
> on the "it's just fancy autocomplete" and "it doesn't understand what it's saying, bro" rhetoric.

No matter what, there will always be a group of people saying that. The power and drive of the brain to convince itself that it is woven of magical energy on a divine substrate shouldn't be underestimated. Especially when media plays so hard into that idea (the robots that lose the war because they cannot overcome love, etc.), because brains really love being told they are right.

I am almost certain that the first conscious silicon (or whatever material) will be subjected to immense suffering until a new generation that can accept the human brain's banality can move things forward.

replies(1): >>40446059 #
5. jwilber No.40444944
My point is that you claimed this as a rebuttal to those saying models don't understand themselves. Your interpretation seems to assign intelligence to the algorithms.

While this research allows us to interpret larger models in an amazing way, it doesn’t mean the models themselves ‘understand’ anything.

You can use this on much smaller scale models as well, as they showed 8 months ago. Does that research tell us about how models understand themselves? Or does it help us understand how the models work?

replies(1): >>40446963 #
6. astrange No.40445446
I think the research is good, but it's disappointing that they hype it by claiming it's going to help their basically entirely fictional "AI safety" project, as if the bits in their model are going to come alive and eat them.
replies(1): >>40446085 #
7. ben_w No.40446059
It tickles me somewhat to note that people using the phrase "stochastic parrot" are demonstrating the exact behaviour they dismiss in the LLMs.

> I am almost certain that the first conscious silicon (or whatever material) will be subjected to immense suffering until a new generation that can accept the human brain's banality can move things forward.

Indeed, though as we don't know what we're doing (and have 40 definitions of "consciousness" and no way to test for qualia), I would add that the first AI we make with these properties will likely suffer from every permutation of severe and mild mental health disorder that is logically possible, including many we have no word for because they would be incompatible with life if found in an organic brain.

8. ben_w No.40446085
We just had a pandemic made from a non-living virus that was basically trying to eat us. To riff off the quote:

The virus does not hate you, nor does it love you, but you are made of atoms which it can use for something else.

replies(1): >>40446435 #
9. astrange No.40446435
Non-living isn't a great way to describe a virus, because it certainly becomes part of a living system once it gets into your cells.

Models don't do that, though, unless you run them in a loop with tools they can call, and mostly they aren't run that way.

replies(1): >>40447305 #
10. kromem No.40446963
"Understand themselves" is a very different thing than "understand what they are saying."

Which exactly are we talking about here?

Because no, the research doesn't say much about the former, but yes, it says a lot about the latter, especially on top of the many, many earlier papers working in smaller toy models demonstrating world modeling.

11. ben_w No.40447305
> Models don't do that, though, unless you run them in a loop with tools they can call, and mostly they aren't run that way.

That's also a description of DNA and RNA. They're chemicals, not magic.

And there are loads of people all too eager to put any and every AI they find into such an environment[0], then connect it to a robot body[1], or connect it to the internet[2], just to see what happens. Or have an AI or algorithm design T-shirts[3] for them or trade stocks[4][5][6] for them, because they don't stop and think about how this might go wrong.

[0] https://community.openai.com/t/chaosgpt-an-ai-that-seeks-to-...

[1] https://www.microsoft.com/en-us/research/group/autonomous-sy...

[2] https://platform.openai.com/docs/api-reference

[3] https://www.theguardian.com/technology/2013/mar/02/amazon-wi...

[4] https://intellectia.ai/blog/chatgpt-for-stock-trading

[5] https://en.wikipedia.org/wiki/Algorithmic_trading

[6] https://en.wikipedia.org/wiki/2007–2008_financial_crisis

replies(1): >>40447420 #
12. astrange No.40447420
Those can certainly cause real problems. I just feel that to find the solutions to those problems, we have to start with real concrete issues and find the abstractions from there.

I don't think "AI safety" is the right abstraction because it came from the idea that AI would start off as an imaginary agent living in a computer that we'd teach stuff to. Whereas what we actually have is a giant pretrained blob that (unreliably) emits text when you run other text through it.

Constrained decoding (like forcing the answer to conform to JSON grammar) is an example of a real solution, and past that it's mostly the same as other software security.
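
(To make the constrained-decoding point concrete, here is a toy sketch: a made-up seven-token vocabulary and a random "model" stand in for real logits and a real JSON grammar, so everything named here is an assumption. Production systems use proper grammar-constrained samplers, but the masking step is the same in spirit.)

    import random

    # Toy vocabulary and a stand-in "model" that scores next tokens at random;
    # in a real system the scores come from the LLM and the constraint from a
    # JSON grammar.
    VOCAB = ['{', '"answer"', ':', '"yes"', '"no"', '}', 'banana']
    VALID_OUTPUTS = ['{"answer":"yes"}', '{"answer":"no"}']  # the "grammar"

    def model_scores(prefix):
        return {tok: random.random() for tok in VOCAB}

    def allowed(prefix, tok):
        # A token is legal only if the extended text can still become valid output.
        return any(v.startswith(prefix + tok) for v in VALID_OUTPUTS)

    def constrained_decode():
        out = ''
        while out not in VALID_OUTPUTS:
            scores = model_scores(out)
            legal = {t: s for t, s in scores.items() if allowed(out, t)}
            out += max(legal, key=legal.get)  # best legal token wins
        return out

    print(constrained_decode())  # always one of the two valid JSON strings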

replies(1): >>40447678 #
13. ben_w No.40447678
> I don't think "AI safety" is the right abstraction because it came from the idea that AI would start off as an imaginary agent living in a computer that we'd teach stuff to. Whereas what we actually have is a giant pretrained blob that (unreliably) emits text when you run other text through it.

I disagree; that's simply the behaviour of one of the best consumer-facing AIs, which gets all the air-time at the moment. (Weirdly, loads of people even here talk about AI like it's LLMs, even though diffusion-based image generators are also making significant progress and being targeted with lawsuits).

AI is automation — the point is to do stuff we don't want to do for whatever reason (including expense), but it does it a bit wrong. People have already died from automation that was carefully engineered but which still had mistakes; machine learning is all about letting a system engineer itself, even if you end up making a checkpoint where it's "good enough", shipping that, and telling people they don't need to train it any more… though they often will keep training it, because that's not actually hard.

We've also got plenty of agentic AI (though as that's a buzzword, bleh, lots of scammers there too), quite apart from the fact that it's very easy to use even an LLM (which is absolutely not designed or intended for this) as a general agent, just by putting it into a loop and telling it what it's supposed to be agentic about.
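
(A bare-bones sketch of that loop; call_llm and the two fake tools stand in for a real chat API and real capabilities, so every name here is hypothetical:)

    def call_llm(transcript: str) -> str:
        # A real version would send the transcript to a chat model and return
        # its next action; stubbed out so the sketch runs on its own.
        return 'FINISH: done'

    TOOLS = {
        'search': lambda arg: f'search results for {arg!r}',
        'write_file': lambda arg: f'wrote {arg!r}',
    }

    def run_agent(goal: str, max_steps: int = 10) -> str:
        transcript = f'Goal: {goal}\n'
        for _ in range(max_steps):
            action = call_llm(transcript)           # the model picks the next step
            if action.startswith('FINISH:'):
                return action[len('FINISH:'):].strip()
            tool, _, arg = action.partition(' ')    # e.g. "search tulip prices"
            observation = TOOLS.get(tool, lambda a: 'unknown tool')(arg)
            transcript += f'{action}\n{observation}\n'  # feed the result back in
        return 'gave up'

    print(run_agent('design a T-shirt'))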

Even with constrained decoding, so far as I can tell the promises are merely advertising, while the reality is that these things are only "pretty good": https://community.openai.com/t/how-to-get-100-valid-json-ans...

(But of course, this is a fast-moving area, so I may just be out of date even though that was only from a few months ago).

However, the "it's only pretty good" becomes "this isn't even possible" in certain domains; this is why, for example, ChatGPT has a disclaimer on the front about not trusting it — there's no way to know, in general, if it's just plain wrong. Which is fine when writing a newspaper column because the Gell-Mann amnesia effect says it was already like that… but not when it's being tasked with anything critical.

Hopefully nobody will use ChatGPT to plan an economy, but the point of automation is to do things for us, so some future AI will almost certainly get used that way. Just as a toy model (because it's late here and I'm tired), imagine if that future AI decides to drop everything and invest only in rice and tulips 0.001% of the time. After all, if it's just as smart as a human, and humans made that mistake…

But on the "what about humans" perspective, you can also look at the environment. I'd say there are no evil moustache-twirling villains who like polluting the world; of course there genuinely are people who do that "to own the libs", but they are not the main source of pollution in the world. Mostly it's people making decisions that seem sensible to them and yet which collectively damage the commons. There's plenty of reason to expect an AI to do something that "seems sensible" to its owner and yet damages the commons, even if the human is paying attention, which they're probably not doing, for the same reason 3M shareholders probably weren't looking very closely at what 3M was doing: "these people are maximising my dividend payments… why is my blood full of microplastics?"