385 points vessenes | 1 comments | 10 Mar 25 19:41 UTC | HN request time: 0.296s | source

So, Lecun has been quite public saying that he believes LLMs will never fix hallucinations because, essentially, the token choice method at each step leads to runaway errors -- these can't be damped mathematically.

In exchange, he offers the idea that we should have something that is an 'energy minimization' architecture; as I understand it, this would have a concept of the 'energy' of an entire response, and training would try and minimize that.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think about Lecun's take, and if there's any engineering done around it. I can't find much after the release of ijepa from his group.

Show context

inimino ◴[14 Mar 25 20:47 UTC] No.43367126[source]▶

>>43325049 (OP) #

I have a paper coming up that I modestly hope will clarify some of this.

The short answer should be that it's obvious LLM training and inference are both ridiculously inefficient and biologically implausible, and therefore there has to be some big optimization wins still on the table.

replies(5): >>43367169 #>>43367233 #>>43367463 #>>43367776 #>>43367860 #

jedberg ◴[14 Mar 25 20:50 UTC] No.43367169[source]▶

>>43367126 #

> and biologically implausible

I really like this approach. Showing that we must be doing it wrong because our brains are more efficient and we aren't doing it like our brains.

Is this a common thing in ML papers or something you came up with?

replies(3): >>43367186 #>>43367478 #>>43368146 #

fluidcruft ◴[14 Mar 25 22:48 UTC] No.43368146[source]▶

>>43367169 #

How are you separating the efficiency of the architecture from the efficiency of the substrate? Unless you have a brain made of transistors or an LLM made of neurons how can you identify the source of the inefficiency?

replies(1): >>43376532 #

1. inimino ◴[16 Mar 25 02:22 UTC] No.43376532[source]▶

>>43368146 #

You can't but the transistor-based approach is the inefficient one, and transistors are pretty good at efficiently doing logic, so either there's no possible efficient solution based on deterministic computation, or there's tremendous headroom.

I believe human and machine learning unify into a pretty straightforward model and this shows that what we're doing that ML doesn't can be copied across, and I don't think the substrate is that significant.

↑

Ask HN: Any insider takes on Yann LeCun's push against current architectures?