OpenAI, Google and Anthropic are struggling to build more advanced AI

(www.bloomberg.com)

625 points lukebennett | 1 comments | 13 Nov 24 13:28 UTC

irrational ◴[14 Nov 24 18:03 UTC] No.42139106[source]▶

> The AGI bubble is bursting a little bit

I'm surprised that any of these companies consider what they are working on to be Artificial General Intelligences. I'm probably wrong, but my impression was AGI meant the AI is self aware like a human. An LLM hardly seems like something that will lead to self-awareness.

replies(18): >>42139138 #>>42139186 #>>42139243 #>>42139257 #>>42139286 #>>42139294 #>>42139338 #>>42139534 #>>42139569 #>>42139633 #>>42139782 #>>42139855 #>>42139950 #>>42139969 #>>42140128 #>>42140234 #>>42142661 #>>42157364 #

vundercind ◴[14 Nov 24 18:56 UTC] No.42139782[source]▶

>>42139106 #

I thought maybe they were on the right track until I read Attention Is All You Need.

Nah, at best we found a way to make one part of a collection of systems that will, together, do something like thinking. Thinking isn’t part of what this current approach does.

What’s most surprising about modern LLMs is that it turns out there is so much information statistically encoded in the structure of our writing that we can use only that structural information to build a fancy Plinko machine and not only will the output mimic recognizable grammar rules, but it will also sometimes seem to make actual sense, too—and the system doesn’t need to think or actually “understand” anything for us to, basically, usefully query that information that was always there in our corpus of literature, not in the plain meaning of the words, but in the structure of the writing.

replies(5): >>42139883 #>>42139888 #>>42139993 #>>42140508 #>>42140521 #

hackinthebochs ◴[14 Nov 24 19:04 UTC] No.42139888[source]▶

>>42139782 #

I see takes like this all the time and it's so confusing. Why does knowing how things work under the hood make you think it's not on the path towards AGI? What was lacking in the Attention paper that tells you AGI won't be built on LLMs? If it's the supposed statistical nature of LLMs (itself a questionable claim), why does statistics seem so deflating to you?

replies(4): >>42140161 #>>42141243 #>>42142441 #>>42145571 #

chongli ◴[14 Nov 24 21:08 UTC] No.42141243[source]▶

>>42139888 #

Because it can't apply any reasoning that hasn't already been done and written into its training set. As soon as you ask it novel questions it falls apart. The big LLM vendors like OpenAI are playing whack-a-mole on these novel questions when they go viral on social media, all in a desperate bid to hide this fatal flaw.

The Emperor has no clothes.

replies(1): >>42141420 #

hackinthebochs ◴[14 Nov 24 21:28 UTC] No.42141420[source]▶

>>42141243 #

>As soon as you ask it novel questions it falls apart.

What do you mean by novel? Almost all sentences it is prompted on are brand new and it mostly responds sensibly. Surely there's some generalization going on.

replies(1): >>42141945 #

chongli ◴[14 Nov 24 22:38 UTC] No.42141945[source]▶

>>42141420 #

Novel as in requiring novel reasoning to sort out. One of the classic ways to expose the issue is to take a common puzzle and introduce irrelevant details and perhaps trivialize the solution. LLMs pattern match on the general form of the puzzle and then wander down the garden path to an incorrect solution that no human would fall for.

The sort of generalization these things can do seems to mostly be the trivial sort: substitution.

replies(2): >>42142079 #>>42142154 #

moffkalast ◴[14 Nov 24 22:51 UTC] No.42142079[source]▶

>>42141945 #

Well, the problem with that approach is that LLMs are still both incredibly dumb and small, at least compared to the, what, 700T params of a human brain? You can't compare the two directly, especially when one has a massive recall advantage that skews the perception of that. But there is still some intelligence under there that's not just memorization. Not much, but some.

So if you present a novel problem it would need to be extremely simple, not something that you couldn't solve when drunk and half awake. Completely novel, but extremely simple. I think that's testable.

replies(1): >>42142156 #

chongli ◴[14 Nov 24 22:59 UTC] No.42142156[source]▶

>>42142079 #

It’s not fair to ask me to judge them based on their size. I’m judging them based on the claims of their vendors.

Anyway the novel problems I’m talking about are extremely simple. Basically they’re variations on the “farmer, 3 animals, and a rowboat” problem. People keep finding trivial modifications to the problem that fool the LLMs but wouldn’t fool a child. Then the vendors come along and patch the model to deal with them. This is what I mean by whack-a-mole.

Searle’s Chinese Room thought experiment tells us that enough games of whack-a-mole could eventually get us to a pretty good facsimile of reasoning without ever achieving the genuine article.

replies(1): >>42142295 #

moffkalast ◴[14 Nov 24 23:15 UTC] No.42142295[source]▶

>>42142156 #

Well, that's true and has been pretty glaring, but they've needed to do that in cases where models seem to fail to grasp some concept across the board and not in cases where they don't.

Like, every time an LLM gets something right we assume they've seen it somewhere in the training data, and every time they fail we presume they haven't. But that may not always be the case; it's just extremely hard to prove it one way or the other unless you search the entire dataset. Ironically, the larger the dataset, the more likely the model is generalizing, while also making it harder to prove if it's really so.

To give a human example, in a school setting you have teachers tasked with figuring out that exact thing for students. Sometimes people will read the question wrong with full understanding and fail, while other times they won't know anything and make it through with a lucky guess. If LLMs (and their vendors) have learned anything, it's that confidently bullshitting gets you very far, which makes it even harder to tell in cases where they aren't. Somehow it's also become ubiquitous to tune models to never even say "I don't know" because it boosts benchmark scores slightly.
