OpenAI, Google and Anthropic are struggling to build more advanced AI

(www.bloomberg.com)

625 points lukebennett | 3 comments | 13 Nov 24 13:28 UTC

Animats ◴[14 Nov 24 19:07 UTC] No.42139919[source]▶

"While the model was initially expected to significantly surpass previous versions of the technology behind ChatGPT, it fell short in key areas, particularly in answering coding questions outside its training data."

Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will. Search for unusual phrases in comments and for variable names. Often, something from Stack Overflow will match.
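
A minimal sketch of that search step, assuming the generated snippet is Python: it pulls out long identifiers and multi-word comments as candidate search queries. The function name `search_terms` and both thresholds are hypothetical choices for illustration, and the code does not query any search engine itself.

```python
import io
import keyword
import tokenize


def search_terms(source, min_name_len=8, min_comment_words=4):
    """Collect distinctive identifiers and comment phrases from Python source.

    Both thresholds are arbitrary assumptions: short names and short
    comments are too common to be useful as search queries.
    """
    names = set()
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Keep only long identifiers; short ones match everything.
            if len(tok.string) >= min_name_len:
                names.add(tok.string)
        elif tok.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding whitespace.
            text = tok.string.lstrip("#").strip()
            if len(text.split()) >= min_comment_words:
                comments.append(text)
    return sorted(names), comments


# Example input; the snippet itself is made up for illustration.
snippet = (
    "def resample_voxel_grid(volume):\n"
    "    # clamp to the scanner bounding box first\n"
    "    return volume\n"
)
names, comments = search_terms(snippet)
```

Pasting the returned identifiers and comment phrases, in quotes, into a web or code search is then a manual step.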

LLMs do search and copy/paste with idiom translation and some transliteration. That's good enough for a lot of common problems. Especially in the HTML/Javascript space, where people solve the same problems over and over. Or problems covered in textbooks and classes.

But it does not look like artificial general intelligence emerges from LLMs alone.

There's also the elephant in the room - the hallucination/lack of confidence metric problem. The curse of LLMs is that they return answers which are confident but wrong. "I don't know" is rarely seen. Until that's fixed, you can't trust LLMs to actually do much on their own. LLMs with a confidence metric would be much more useful than what we have now.

replies(4): >>42139986 #>>42140895 #>>42141067 #>>42143954 #

dmd ◴[14 Nov 24 19:13 UTC] No.42139986[source]▶

>>42139919 #

> Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will.

People who "follow" AI, as the latest fad they want to comment on and appear intelligent about, repeat things like this constantly, even though they're not actually true for anything but the most trivial hello-world types of problems.

I write code all day every day. I use Copilot and the like all day every day (for me, in the medical imaging software field), and all day every day it is incredibly useful and writes nearly exactly the code I would have written, but faster. And none of it appears anywhere else; I've checked.

replies(5): >>42140406 #>>42142508 #>>42142654 #>>42143451 #>>42145565 #

wokwokwok ◴[15 Nov 24 00:02 UTC] No.42142654[source]▶

>>42139986 #

> even though they're not actually true for anything but the most trivial hello-world types of problems.

Um.

All the parent post said was:

> then try to find similar code on the web, you usually will.

Not identical code. Similar code.

I think you're really stretching the domain of plausibility to suggest that any code you write is novel enough that you can't find 'similar' code on the internet.

To suggest that code generated from a corpus is not going to be 'similar' to the code from the corpus is just factually and unambiguously false.

Of course, it depends on what you interpret 'similar' to mean; but I think it's not unfair to say a lot of code is composed of smaller parts of code that is extremely similar to other examples of code on the internet.

Obviously you're not going to find an example similar to your entire code base; but if you're using, for example, Copilot, where you generate many small snippets of code... well...

replies(1): >>42142676 #

1. dmd ◴[15 Nov 24 00:06 UTC] No.42142676[source]▶

>>42142654 #

Ok, yes. There are other pieces of code on the internet that use a for loop or an if statement.

By that logic what you wrote was also composed that way. After all, you’ve used all words that have been used before! I bet even phrases like “that is extremely similar” and “generated from a corpus” and “unambiguously false”.

Again, I really find it hard to believe that anyone could make an argument like the one you’re making who has actually used these tools in their work for hundreds of hours, vs. for a couple minutes here or there with made up problems.

replies(1): >>42143823 #

2. wokwokwok ◴[15 Nov 24 03:53 UTC] No.42143823[source]▶

>>42142676 (TP) #

> I really find it hard to believe

What's true and what's not true is not related to what you personally believe.

It is factually and unambiguously false to state that generated code is, in general, not similar to other code from the corpus it is trained on.

> And none of it appears anywhere else; I've checked.

^ Even if this statement is not false (I'm skeptical, but whatever), in general it would be false for most users of Copilot.

None of it appears anywhere else? None of it? Really?

That's not true of the no-AI code base I'm working on.

It's very difficult to believe that it would be true of a code base heavily written by Copilot and the like.

It's probably not true, in general, for AI generated code bases.

We can have a different conversation about verbatim copied code, where an AI model generates a large body of verbatim copy from a training source. That's very unusual.

...but to say the generated code wouldn't even be similar? Come on.

That's literally what LLMs do.

replies(1): >>42146049 #

3. dmd ◴[15 Nov 24 11:44 UTC] No.42146049[source]▶

>>42143823 #

This is like having an argument about whether airplanes can fly with someone who has never been in, piloted, or even really seen an airplane but is very, very sure of their understanding of how they can’t possibly work.

Among other things: it writes new, useful code daily in our local DSL, which appears nowhere on the internet and in fact didn't exist a few months ago.
