
625 points lukebennett | 18 comments
Animats ◴[] No.42139919[source]
"While the model was initially expected to significantly surpass previous versions of the technology behind ChatGPT, it fell short in key areas, particularly in answering coding questions outside its training data."

Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will. Search for unusual phrases in comments and for variable names. Often, something from Stack Overflow will match.
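
To make that concrete, here's a rough sketch of that kind of check in Python. Nothing here is a specific tool; the regex heuristics, the "generated.py" filename, and the GitHub code-search URL are just illustrative:

    # Sketch: pull distinctive strings out of generated code and turn
    # them into code-search queries. Crude heuristics; tune per language.
    import re
    import urllib.parse

    source = open("generated.py").read()  # assumed: the generated code

    # Long comments and long snake_case identifiers make good fingerprints.
    comments = [c.strip() for c in re.findall(r"#\s*(.{15,})", source)]
    identifiers = re.findall(r"\b[a-z]+_[a-z_]{8,}\b", source)

    for phrase in sorted(set(comments[:5] + identifiers[:5])):
        q = urllib.parse.quote('"%s"' % phrase)
        print("https://github.com/search?q=%s&type=code" % q)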

LLMs do search and copy/paste with idiom translation and some transliteration. That's good enough for a lot of common problems. Especially in the HTML/Javascript space, where people solve the same problems over and over. Or problems covered in textbooks and classes.

But it does not look like artificial general intelligence emerges from LLMs alone.

There's also the elephant in the room: hallucination, i.e. the lack of a confidence metric. The curse of LLMs is that they return answers which are confident but wrong. "I don't know" is rarely seen. Until that's fixed, you can't trust LLMs to do much on their own. LLMs with a confidence metric would be much more useful than what we have now.
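
(A crude proxy for such a metric is already exposed by some APIs: token log-probabilities. A minimal sketch with the OpenAI Python client, assuming an API key is configured; the model name is a placeholder, and averaging logprobs is not a calibrated confidence measure:)

    # Sketch: mean token log-probability as a rough "confidence" signal.
    # Hallucinations can still score high; this is a proxy, not a fix.
    import math
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Who wrote TeX?"}],
        logprobs=True,
    )
    lps = [t.logprob for t in resp.choices[0].logprobs.content]
    avg_prob = math.exp(sum(lps) / len(lps))
    print(resp.choices[0].message.content, "avg token prob: %.2f" % avg_prob)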

replies(4): >>42139986 #>>42140895 #>>42141067 #>>42143954 #
1. dmd ◴[] No.42139986[source]
> Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will.

People who "follow" AI, as the latest fad they want to comment on and appear intelligent about, repeat things like this constantly, even though they're not actually true for anything but the most trivial hello-world types of problems.

I write code all day every day. I use Copilot and the like all day every day (in my case, in the medical imaging software field), and all day every day it is incredibly useful and writes nearly exactly the code I would have written, but faster. And none of it appears anywhere else; I've checked.

replies(5): >>42140406 #>>42142508 #>>42142654 #>>42143451 #>>42145565 #
2. ngai_aku ◴[] No.42140406[source]
You’re solving novel problems all day every day?
replies(2): >>42140436 #>>42144250 #
3. dmd ◴[] No.42140436[source]
Pretty much, yes. My job is pretty fun; it mostly entails things like "take this horrible file workflow some research assistant came up with while high 15 years ago and turn it into a newer horrible file format a NEW research assistant came up with (also while high) 3 years ago", and automating that in our data processing pipeline.
replies(3): >>42140978 #>>42141764 #>>42141794 #
4. Der_Einzige ◴[] No.42140978{3}[source]
Given WFH, the weed laws where tech workers live, and how quickly the body builds tolerance to cannabis, I estimate that 10% of all code written by west coast tech workers is done "while high", and that estimate is likely low.
replies(1): >>42141577 #
5. portaouflop ◴[] No.42141577{4}[source]
Do tech workers write better or worse code while high?
replies(1): >>42143325 #
6. delusional ◴[] No.42141764{3}[source]
If I understand that correctly, you're converting file formats? That's not exactly "novel".
replies(1): >>42142072 #
7. fireflash38 ◴[] No.42141794{3}[source]
If you've got a clearly defined input format and output format, then sure, it seems like a good candidate for heavy LLM use. But I don't know if that describes most people's work.
replies(1): >>42141811 #
8. dmd ◴[] No.42141811{4}[source]
If it were ever clearly defined or even consistent from input to input I would be overjoyed.
9. llm_trw ◴[] No.42142072{4}[source]
This is exactly the type of novel work that LLMs are good at. It's tedious and has annoying internal logic, but that logic is quite flat and there are a million examples to generalise from.

What they fail at is code with high cyclomatic complexity. Back in the Llama 2 finetune days I wrote a script that would break each node in the control flow graph down into its own prompt, literate-programming style, and the results were amazing for the time. Using the same prompts I'd get correct code in every language I tried.
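
Roughly, the idea looks like this (a simplified reconstruction, not the original script; the example function and prompt wording are made up):

    # Sketch: split a function into its branch/loop nodes and emit one
    # prompt per node, so each prompt stays at low cyclomatic complexity.
    import ast
    import textwrap

    SRC = """
    def classify(x):
        if x < 0:
            return "negative"
        for d in str(x):
            if d == "7":
                return "lucky"
        return "boring"
    """

    src = textwrap.dedent(SRC)
    tree = ast.parse(src)
    prompts = []
    for node in ast.walk(tree):
        # Nested branches get their own (overlapping) prompt too.
        if isinstance(node, (ast.If, ast.For, ast.While)):
            block = ast.get_source_segment(src, node)
            prompts.append("Reimplement this block in the target "
                           "language, preserving behaviour:\n" + block)

    for i, p in enumerate(prompts):
        print("--- prompt %d ---\n%s\n" % (i, p))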

10. tymscar ◴[] No.42142508[source]
How often did you check?
11. wokwokwok ◴[] No.42142654[source]
> even though they're not actually true for anything but the most trivial hello-world types of problems.

Um.

All the parent post said was:

> then try to find similar code on the web, you usually will.

Not identical code. Similar code.

I think you're really stretching the domain of plausibility to suggest that any code you write is novel enough that you can't find 'similar' code on the internet.

To suggest that code generated from a corpus is not going to be 'similar' to the code in that corpus is just factually and unambiguously false.

Of course, it depends on what you interpret 'similar' to mean; but I think it's not unfair to say a lot of code is composed of smaller parts that are extremely similar to other examples of code on the internet.

Obviously you're not going to find an example similar to your entire code base; but if you're using, for example, Copilot, where you generate many small snippets of code... well...

replies(1): >>42142676 #
12. dmd ◴[] No.42142676[source]
Ok, yes. There are other pieces of code on the internet that use a for loop or an if statement.

By that logic, what you wrote was also composed that way. After all, you've only used words that have been used before! I bet even phrases like "that is extremely similar" and "generated from a corpus" and "unambiguously false" appear somewhere already.

Again, I really find it hard to believe that anyone could make an argument like the one you’re making who has actually used these tools in their work for hundreds of hours, vs. for a couple minutes here or there with made up problems.

replies(1): >>42143823 #
13. throw310822 ◴[] No.42143325{5}[source]
Should copilot be renamed to "designated driver"?
14. bobsmooth ◴[] No.42143451[source]
I've had generated code include comments so specific I was able to find the exact GitHub repo they came from.
15. wokwokwok ◴[] No.42143823{3}[source]
> I really find it hard to believe

What's true and what's not true is not related to what you personally believe.

It is factually and unambiguously false to state that generated code is, in general, not similar to other code from the corpus the model was trained on.

> And none of it appears anywhere else; I've checked.

^ Even if this statement is not false (I'm skeptical, but whatever), it would, in general, be false for most users of Copilot.

None of it appears anywhere else? None of it? Really?

That's not true of the no-AI code base I'm working on.

It's very difficult to believe it would be true of a code base heavily written by Copilot and the like.

It's probably not true, in general, for AI generated code bases.

We can have a different conversation about verbatim copied code, where an AI model generates a large body of verbatim copy from a training source. That's very unusual.

...but to say the generated code wouldn't even be similar? Come on.

That's literally what LLMs do.

replies(1): >>42146049 #
16. gitaarik ◴[] No.42144250[source]
If we weren't, we (as in developers) wouldn't be needed, right?
17. cageface ◴[] No.42145565[source]
I find specific blog posts ChatGPT is cribbing from all the time when I use it. I think it depends a lot on exactly what you're asking it for.
18. dmd ◴[] No.42146049{4}[source]
This is like having an argument about whether airplanes can fly with someone who has never been in, piloted, or even really seen an airplane but is very, very sure of their understanding of how they can’t possibly work.

Among other things: it writes new, useful code daily in our local DSL, which appears nowhere on the internet and in fact didn't exist a few months ago.