
385 points by vessenes | 1 comment

So, LeCun has been quite public saying that he believes LLMs will never fix hallucinations because, essentially, choosing one token at a time leads to errors that compound at each step and can't be damped mathematically.
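If I follow the argument, a back-of-the-envelope version (my framing, assuming a constant, independent per-token error rate, which is a big simplification) looks like this:

```python
# Toy illustration of compounding error in token-by-token generation.
# Assumes each sampled token independently has a fixed chance of being "wrong".
def p_flawless(per_token_error: float, n_tokens: int) -> float:
    """Probability that none of n autoregressive steps goes off the rails."""
    return (1.0 - per_token_error) ** n_tokens

for n in (100, 1_000, 10_000):
    print(n, round(p_flawless(0.001, n), 3))
# 100 0.905
# 1000 0.368
# 10000 0.0
```

So even a tiny per-step error rate drives the chance of a fully clean long response toward zero, and nothing in plain autoregressive decoding pulls it back.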

Instead, he offers the idea that we should have an 'energy minimization' architecture; as I understand it, this would assign an 'energy' to an entire response, and training would try to minimize that.
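My toy reading of "energy of an entire response" is sketched below; this is a generic energy-based model, not LeCun's actual formulation (which, as I understand it, works in a learned latent space a la JEPA), and ToyEnergyModel plus the stand-in embeddings are made up for illustration:

```python
import torch
import torch.nn as nn

class ToyEnergyModel(nn.Module):
    """Scores a whole (prompt, response) pair with a single scalar 'energy'."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, prompt_vec, response_vec):
        return self.scorer(torch.cat([prompt_vec, response_vec], dim=-1)).squeeze(-1)

# One contrastive training step: push the energy of a good response below a bad one.
model = ToyEnergyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

prompt, good, bad = (torch.randn(8, 128) for _ in range(3))  # stand-in embeddings
margin = 1.0
loss = torch.clamp(model(prompt, good) - model(prompt, bad) + margin, min=0).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The point being that the whole response gets judged at once, rather than one token at a time.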

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about LeCun's take, and whether any engineering has been done around it. I can't find much after the release of I-JEPA from his group.

1. jiggawatts | No.43367692
My observation from the outside watching this all unfold is that not enough effort seems to be going into the training schedule.

I say schedule because the “static data, once through” regime is, to my mind, one of the root problems.

Think about what happens when you read something like a book. You’re not “just” reading it, you’re also comparing it to other books, other books by the same author, while critically considering the book recommendations made by your friend. Any events in the book get compared to your life experience, etc…

LLM training does none of this! It’s a once-through text prediction training regime.
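Concretely, pretraining is (roughly) just next-token cross-entropy over a single pass of the corpus; nothing in the objective rewards relating a document to anything outside its own context window. A minimal sketch with a toy stand-in model, not any lab's actual pipeline:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def next_token_loss(model: nn.Module, tokens: torch.Tensor) -> torch.Tensor:
    """Standard once-through objective: predict token t+1 from tokens <= t.
    tokens: (batch, seq_len) integer ids; model returns (batch, seq, vocab) logits."""
    logits = model(tokens[:, :-1])
    targets = tokens[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

# Toy "LM": an embedding plus a linear head -- no attention, just enough to make
# the shape of the objective concrete.
vocab = 1000
toy_lm = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))
batch = torch.randint(0, vocab, (4, 32))
print(next_token_loss(toy_lm, batch))  # roughly ln(1000) ≈ 6.9 at initialization
```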

What this means in practice is that an LLM can’t write a review of a book unless it has read many reviews already. They have, of course, so that case works, but the problem doesn’t go away: ask an AI to critique book reviews and it runs out of steam, because it hasn’t seen many of those. Critiques of critiques are where they start falling flat on their faces.

This kind of meta-knowledge is precisely what experts accumulate.

As a programmer I don’t just regurgitate code I’ve seen before with slight variations; instead, I know that mainstream criticisms of microservices miss their key benefit of extreme team scalability!

This is the crux of it: when humans read their training material, they generate an “n+1” level in their mind that they also learn. The current AI training setup trains the AI only on the “n”th level.

This can be solved by running the training in a loop for several iterations after base training. The challenge of course is to develop a meaningful loss function.
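For concreteness, here's the kind of loop I mean. It is entirely hypothetical, every helper is a placeholder rather than a real API, and `score` in particular is a stub for exactly that missing loss function:

```python
# Hypothetical post-base-training loop; none of these helpers exist as real APIs.

def generate_critique(model, doc: str) -> str:
    return f"critique of: {doc}"   # placeholder for model-generated meta-level text

def score(critique: str) -> float:
    return 0.5                     # the open problem: how to grade a critique

def fine_tune(model, data):
    return model                   # placeholder for an actual SFT/RL update

def iterative_meta_training(model, corpus, n_rounds: int = 3, threshold: float = 0.4):
    for _ in range(n_rounds):
        # level n -> n+1: generate commentary on the material the model was trained on
        critiques = [generate_critique(model, doc) for doc in corpus]
        kept = [c for c in critiques if score(c) > threshold]
        model = fine_tune(model, kept)
        corpus = corpus + kept     # next round critiques the critiques
    return model
```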

IMHO the “thinking” model training is a step in the right direction but nowhere near enough to produce AGI all by itself.