
385 points vessenes | 14 comments

So, LeCun has been quite public in saying that he believes LLMs will never fix hallucinations because, essentially, the token-by-token sampling at each step leads to runaway errors -- these can't be damped mathematically.
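The compounding-error argument can be illustrated with a toy model (my own illustration, not LeCun's actual math): if each sampled token is independently correct with probability 1 - eps, the chance of a flawless n-token answer decays exponentially in n.

```python
# Toy model of compounding token errors. The independence assumption
# is a simplification; real token errors are correlated.
def p_flawless(n_tokens: int, eps: float) -> float:
    """Probability that all n tokens are sampled without error."""
    return (1 - eps) ** n_tokens

for n in (10, 100, 1000):
    print(n, p_flawless(n, eps=0.01))
```

Even a 1% per-token error rate leaves only about a 37% chance of a clean 100-token answer, and essentially zero for 1000 tokens, which is the flavor of the "runaway errors" claim.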

In its place, he offers the idea of an 'energy minimization' architecture; as I understand it, this would assign an 'energy' to an entire response, and training would try to minimize that energy.

Which is to say, I don't fully understand this. That said, I'm curious to hear what ML researchers think of LeCun's take, and whether there's been any engineering done around it. I can't find much after the release of I-JEPA from his group.
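For concreteness, here is a minimal sketch of the contrast being described: instead of committing to one token at a time, an energy-based decoder scores whole candidate responses and keeps the minimum-energy one. The energy function below is entirely made up for illustration; in a real energy-based model the scoring function is learned, and none of this is taken from LeCun's papers.

```python
# Hedged sketch: global scoring of whole responses vs. per-token choice.
from typing import Callable, List

def argmin_energy(candidates: List[str],
                  energy: Callable[[str], float]) -> str:
    """Return the candidate whose (whole-response) energy is lowest."""
    return min(candidates, key=energy)

# Toy, hand-written energy: prefer short answers that mention "paris".
def toy_energy(response: str) -> float:
    return len(response) - (10.0 if "paris" in response.lower() else 0.0)

best = argmin_energy(
    ["Paris.", "The capital of France is Paris.", "I am not sure."],
    toy_energy,
)
print(best)  # "Paris." has the lowest toy energy
```

The point of the architecture is that the objective is defined over the entire output, so an early bad choice can still be rejected at the level of the whole response.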

inimino ◴[] No.43367126[source]
I have a paper coming up that I modestly hope will clarify some of this.

The short answer should be that it's obvious LLM training and inference are both ridiculously inefficient and biologically implausible, and therefore there have to be some big optimization wins still on the table.

replies(5): >>43367169 #>>43367233 #>>43367463 #>>43367776 #>>43367860 #
1. jedberg ◴[] No.43367169[source]
> and biologically implausible

I really like this approach: showing that we must be doing it wrong because our brains are more efficient, and we aren't doing it the way our brains do.

Is this a common thing in ML papers or something you came up with?

replies(3): >>43367186 #>>43367478 #>>43368146 #
2. esafak ◴[] No.43367186[source]
Evolution does not need to converge on the optimum solution.

Have you heard of https://en.wikipedia.org/wiki/Bio-inspired_computing ?

replies(2): >>43367202 #>>43367214 #
3. jedberg ◴[] No.43367202[source]
It does not, you're right. But it's an interesting way to approach the problem nevertheless. And given that we definitely aren't as efficient as a human brain right now, it makes sense to look at the brain for inspiration.
4. parsimo2010 ◴[] No.43367214[source]
I don't think GP was implying that brains are the optimum solution. I think you can interpret GP's comment like this: if our brains are more efficient than LLMs, then clearly LLMs aren't optimally efficient. We have at least one data point showing that better efficiency is possible, even if we don't know what the optimal approach is.
replies(1): >>43367256 #
5. esafak ◴[] No.43367256{3}[source]
I agree. Spiking neural networks are usually mentioned in this context, but there is no hardware ecosystem behind them that can compete with Nvidia and CUDA.
replies(1): >>43367517 #
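The spiking neurons mentioned above are usually modeled as leaky integrate-and-fire units: membrane voltage leaks over time, integrates input current, and emits a discrete spike on crossing a threshold. A minimal sketch (the constants are illustrative, not from any particular paper):

```python
# Minimal leaky integrate-and-fire (LIF) neuron, the textbook building
# block behind "spiking neural networks".
def lif_spikes(inputs, tau=0.9, threshold=1.0):
    """Integrate an input-current trace; emit 1 on each spike, then reset."""
    v, spikes = 0.0, []
    for i in inputs:
        v = tau * v + i          # leaky integration of input current
        if v >= threshold:       # fire when threshold is crossed...
            spikes.append(1)
            v = 0.0              # ...then reset the membrane voltage
        else:
            spikes.append(0)
    return spikes

print(lif_spikes([0.3, 0.3, 0.3, 0.3, 0.9, 0.1]))  # [0, 0, 0, 1, 0, 0]
```

The appeal for efficiency is that communication is sparse, event-driven binary spikes rather than dense floating-point activations; the catch, as noted above, is the lack of a mature hardware/software ecosystem.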
6. _3u10 ◴[] No.43367478[source]
Nah it’s just physics, it’s like wheels being more efficient than legs.

We know there is a more efficient solution (human brain) but we don’t know how to make it.

So it stands to reason that we can make more efficient LLMs, just like a CPU can add numbers more efficiently than humans.

replies(1): >>43368078 #
7. leereeves ◴[] No.43367517{4}[source]
Investments in AI are now counted in billions of dollars. Would that be enough to create an initial ecosystem for a new architecture?
replies(2): >>43367704 #>>43367775 #
8. esafak ◴[] No.43367704{5}[source]
Nvidia has a big lead, and hardware is capital intensive. I guess an alternative would make sense in the battery-powered regime, like robotics, where Nvidia's power hungry machines are at a disadvantage. This is how ARM took on Intel.
9. vlovich123 ◴[] No.43367775{5}[source]
A new HW architecture for an unproven SW architecture is never going to happen. The SW needs to start working first and demonstrate better performance. Of course, as with the original deep neural net work, it took computers getting sufficiently advanced to demonstrate this was possible. A different SW architecture would have to be vastly more efficient to justify new HW. Moreover, HW and SW evolve in tandem: HW takes existing SW and tries to optimize it (e.g. by adding an abstraction layer), or SW tries to leverage existing HW to run a new architecture faster. Coming up with a new HW/SW combo from scratch seems unlikely given the cost of bringing HW to market. If AI-assisted design ever speeds up HW development the way Jeff Dean expects, the cost of prototyping might come down enough to make these kinds of bets.
10. jonplackett ◴[] No.43368078[source]
Wheels are an interesting analogy. Wheels are more efficient now that we have roads, but there could never have been evolutionary pressure to develop them before roads existed. Wheels are also a lot easier to get working than robotic legs and, so long as there's a road, do a lot more than robotic legs.
replies(2): >>43378048 #>>43378070 #
11. fluidcruft ◴[] No.43368146[source]
How are you separating the efficiency of the architecture from the efficiency of the substrate? Unless you have a brain made of transistors or an LLM made of neurons how can you identify the source of the inefficiency?
replies(1): >>43376532 #
12. inimino ◴[] No.43376532[source]
You can't, but the transistor-based approach is the inefficient one, and transistors are pretty good at doing logic efficiently, so either there's no possible efficient solution based on deterministic computation, or there's tremendous headroom.

I believe human and machine learning unify into a pretty straightforward model, and this shows that what we're doing that ML doesn't can be copied across; I don't think the substrate is that significant.

13. ◴[] No.43378048{3}[source]
14. _3u10 ◴[] No.43378070{3}[source]
People think the first wheel was invented for making pottery. Biological machinery for the most part has to be self-reproducing, so there are a lot of limitations on design; it also has to be able to evolve, so you get inefficient solutions like the recurrent laryngeal nerve (a branch of the vagus): a nerve that routes down under the chest and back up to the larynx, which in giraffes is something like 15 feet long to cover a few inches by the shortest path.

True wheels would likely never evolve naturally because there's no real incremental path from legs to wheels, whereas flippers can evolve from webbed fingers incrementally getting better at moving in water.

I dunno, maybe there's an evolutionary path for wheels, but I don't think so.