OpenAI, Google and Anthropic are struggling to build more advanced AI

(www.bloomberg.com)

625 points lukebennett | 3 comments | 13 Nov 24 13:28 UTC

Animats ◴[14 Nov 24 19:07 UTC] No.42139919[source]▶

"While the model was initially expected to significantly surpass previous versions of the technology behind ChatGPT, it fell short in key areas, particularly in answering coding questions outside its training data."

Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will. Search for unusual phrases in comments and for variable names. Often, something from Stack Overflow will match.

LLMs do search and copy/paste with idiom translation and some transliteration. That's good enough for a lot of common problems. Especially in the HTML/JavaScript space, where people solve the same problems over and over. Or problems covered in textbooks and classes.

But it does not look like artificial general intelligence emerges from LLMs alone.

There's also the elephant in the room - the hallucination/lack of confidence metric problem. The curse of LLMs is that they return answers which are confident but wrong. "I don't know" is rarely seen. Until that's fixed, you can't trust LLMs to actually do much on their own. LLMs with a confidence metric would be much more useful than what we have now.

replies(4): >>42139986 #>>42140895 #>>42141067 #>>42143954 #

dmd ◴[14 Nov 24 19:13 UTC] No.42139986[source]▶

>>42139919 #

> Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will.

People who "follow" AI, as the latest fad they want to comment on and appear intelligent about, repeat things like this constantly, even though they're not actually true for anything but the most trivial hello-world types of problems.

I write code all day every day. I use Copilot and the like all day every day (for me, in the medical imaging software field), and all day every day it is incredibly useful and writes nearly exactly the code I would have written, but faster. And none of it appears anywhere else; I've checked.

replies(5): >>42140406 #>>42142508 #>>42142654 #>>42143451 #>>42145565 #

ngai_aku ◴[14 Nov 24 19:50 UTC] No.42140406[source]▶

>>42139986 #

You’re solving novel problems all day every day?

replies(2): >>42140436 #>>42144250 #

dmd ◴[14 Nov 24 19:53 UTC] No.42140436[source]▶

>>42140406 #

Pretty much, yes. My job is pretty fun; it mostly entails things like "take this horrible file workflow some research assistant came up with while high 15 years ago and turn it into a newer horrible file format a NEW research assistant came up with (also while high) 3 years ago" - and automate this in our data processing pipeline.

replies(3): >>42140978 #>>42141764 #>>42141794 #

1. Der_Einzige ◴[14 Nov 24 20:41 UTC] No.42140978[source]▶

>>42140436 #

Due to WFH, the weed laws where tech workers live, and the fast tolerance building of cannabis in the body - I estimate that 10% of all code written by west coast tech workers is done “while high” and that estimate is likely low.

replies(1): >>42141577 #

2. portaouflop ◴[14 Nov 24 21:46 UTC] No.42141577[source]▶

>>42140978 #

Do tech workers write better or worse code while high?

replies(1): >>42143325 #

3. throw310822 ◴[15 Nov 24 02:06 UTC] No.42143325[source]▶

>>42141577 #

Should copilot be renamed to "designated driver"?
