
625 points by lukebennett | 1 comment
Animats ◴[] No.42139919[source]
"While the model was initially expected to significantly surpass previous versions of the technology behind ChatGPT, it fell short in key areas, particularly in answering coding questions outside its training data."

Right. If you generate some code with ChatGPT, and then try to find similar code on the web, you usually will. Search for unusual phrases in comments and for variable names. Often, something from Stack Overflow will match.
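
A rough sketch of that kind of check in Python; the "generated" snippet here is made up, and the output is just quoted strings to paste into a web, GitHub, or Stack Overflow search:

    import re

    # Hypothetical ChatGPT output we want to trace back to the web.
    generated = '''
    # normalize the widget payload before dispatch
    def coalesce_widget_payload(raw_items):
        staging_bucket = [x for x in raw_items if x]
        return staging_bucket
    '''

    # Comment phrases and long identifiers are the "unusual" strings most
    # likely to match a verbatim source if one exists.
    comments = re.findall(r'#\s*(.+)', generated)
    identifiers = re.findall(r'[A-Za-z_]\w{11,}', generated)

    for phrase in comments + identifiers:
        print(f'"{phrase.strip()}"')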

LLMs do search and copy/paste with idiom translation and some transliteration. That's good enough for a lot of common problems. Especially in the HTML/Javascript space, where people solve the same problems over and over. Or problems covered in textbooks and classes.

But it does not look like artificial general intelligence emerges from LLMs alone.

There's also the elephant in the room - the hallucination/lack of confidence metric problem. The curse of LLMs is that they return answers which are confident but wrong. "I don't know" is rarely seen. Until that's fixed, you can't trust LLMs to actually do much on their own. LLMs with a confidence metric would be much more useful than what we have now.
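
One crude stand-in for such a metric is the model's own token log-probabilities: average them and abstain below a threshold. That's a heuristic, not a fix, and everything below is made up; the logprobs are assumed to come from whatever API or local model you run.

    def answer_confidence(token_logprobs, abstain_threshold=-1.5):
        """Mean per-token log-probability as a rough confidence score.
        Returns (score, answerable); the threshold is arbitrary."""
        if not token_logprobs:
            return float('-inf'), False
        mean_lp = sum(token_logprobs) / len(token_logprobs)
        return mean_lp, mean_lp > abstain_threshold

    # A confidently generated span vs. a shaky one (made-up numbers).
    print(answer_confidence([-0.1, -0.3, -0.2]))   # answer
    print(answer_confidence([-2.5, -3.1, -1.9]))   # better to say "I don't know"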

replies(4): >>42139986 #>>42140895 #>>42141067 #>>42143954 #
nickpsecurity ◴[] No.42143954[source]
The brain solves that problem. It seems to involve memory and specialized regions. I found a few groups building hippocampus-like research models. One had content-addressable memory.
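
For anyone unfamiliar with the term, "content-addressable" just means lookup by similarity to stored content rather than by address. A toy sketch with made-up vectors:

    import numpy as np

    class ContentAddressableMemory:
        """Toy associative memory: store (key, value) pairs, read back the
        value whose key is most similar to the query (cosine similarity)."""

        def __init__(self):
            self.keys, self.values = [], []

        def write(self, key_vec, value):
            self.keys.append(np.asarray(key_vec, dtype=float))
            self.values.append(value)

        def read(self, query_vec):
            q = np.asarray(query_vec, dtype=float)
            K = np.stack(self.keys)
            sims = K @ q / (np.linalg.norm(K, axis=1) * np.linalg.norm(q) + 1e-9)
            return self.values[int(np.argmax(sims))]

    mem = ContentAddressableMemory()
    mem.write([1.0, 0.0, 0.2], "fact A")
    mem.write([0.0, 1.0, 0.1], "fact B")
    print(mem.read([0.9, 0.1, 0.0]))   # -> fact A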

There was another one that claimed to get rid of hallucinations. They also said it takes 50-100 epochs for regular architectures to actually memorize something. Their paper is linked below, in case anyone qualified to review it wants to.

https://arxiv.org/abs/2406.17642

Like the brain, I believe the problem will be solved by a mix of specialized components working together. One of those components will be a memory (or series of them) that the others reference to keep processing grounded in reality.

replies(1): >>42144641 #
Animats ◴[] No.42144641[source]
Comments on that paper? PDF: [1]

What they are measuring, it seems, is whether LLMs can be built which will retrieve a reliable known correct answer on request. That's an information retrieval problem, and, in fact, they solve it by adding "Memory Experts", which are basically data storage.

It's not clear that this helps with replies that require synthesizing disparate information, or with detecting that the training data does not contain the info needed to construct a reply.
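
To make that distinction concrete, here's a toy version of "memory as data storage": pure retrieval gives you recall of stored answers, and a similarity threshold gives a cheap "not found" signal, but nothing here synthesizes across entries. An illustration of the framing only, not the paper's actual architecture.

    from difflib import SequenceMatcher

    MEMORY = {
        "capital of france": "Paris",
        "boiling point of water at sea level": "100 C",
    }

    def recall(question, min_similarity=0.6):
        best_key, best_score = None, 0.0
        for key in MEMORY:
            score = SequenceMatcher(None, question.lower(), key).ratio()
            if score > best_score:
                best_key, best_score = key, score
        if best_score < min_similarity:
            return "not in memory"   # the missing-info case
        return MEMORY[best_key]

    print(recall("What is the capital of France?"))           # Paris
    print(recall("Compare the climates of Paris and Oslo"))   # not in memory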

[1] https://arxiv.org/pdf/2406.17642

replies(1): >>42147890 #
nickpsecurity ◴[] No.42147890{3}[source]
On the second paragraph: there's been work showing whether a model has memorized certain prompts or is just responding strongly to them. Something like that, combined with a memory-equipped model, would tell you whether it might contain the info.
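
One common probe along those lines is perplexity-based: if a model assigns unusually low per-token loss to a string it wasn't prompted with, it has likely memorized (or at least seen) it. A rough sketch, not necessarily the method from that work, with gpt2 standing in for a real model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def per_token_loss(text):
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)   # HF shifts labels internally
        return out.loss.item()             # mean cross-entropy per token

    # Lower loss suggests memorization/familiarity; any threshold is up to you.
    print(per_token_loss("To be, or not to be, that is the question"))
    print(per_token_loss("Purple otters negotiate quarterly spreadsheet tariffs"))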

From there, you need multiple layers building on the info it contains to synthesize a reply that might be good. Alternatively, an iterative process could run a few rounds through the model, re-presenting the combined results each time so it fuses them, all based on known data or what's in the prompt and nothing else.
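
A hand-wavy sketch of that loop; generate() is a stub standing in for whatever model call you'd use, and the only constraint enforced is that each round re-presents just the retrieved facts plus the current draft:

    def generate(prompt: str) -> str:
        # Stub so the control flow runs without a model: echoes the last line.
        # Swap in a real model call here.
        return prompt.strip().splitlines()[-1]

    def iterative_answer(question, retrieved_facts, rounds=3):
        context = "\n".join(retrieved_facts)
        draft = generate(f"Using ONLY these facts:\n{context}\n\nQuestion: {question}")
        for _ in range(rounds - 1):
            draft = generate(
                "Revise the draft so every claim is supported by the facts; "
                "say 'unknown' for anything they don't cover.\n"
                f"Facts:\n{context}\n\nDraft:\n{draft}"
            )
        return draft

    print(iterative_answer("Who wrote the paper?", ["The paper lists A. Author."]))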

This is speculative, based on a few things our own minds do.