
129 points jxmorris12 | 1 comment | source
trhway ◴[] No.43131403[source]
According to LeCun's model, a human walking step by step would have errors compounding with each step and thus would never reach the intended target. Yet as toddlers we somehow manage to learn to walk to our targets. (And I hold an MS in Math, Control Systems :)
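The compounding-error argument can be sketched numerically (a toy illustration with made-up numbers, not figures from LeCun): if each step is independently correct with probability 1 - eps, the chance the whole trajectory stays on track decays geometrically.

```python
# Toy sketch of the compounding-error argument: a hypothetical
# per-step error rate eps makes an n-step trajectory succeed with
# probability (1 - eps)**n, which shrinks geometrically in n.
eps = 0.01  # illustrative per-step error rate, not a measured value
for n in (10, 100, 1000):
    p_ok = (1 - eps) ** n
    print(f"n = {n:4d}: P(no error yet) = {p_ok:.4f}")
```

At eps = 0.01 the 1000-step walk almost never succeeds, which is the pessimistic conclusion the toddler example pushes back on.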
replies(5): >>43131500 #>>43132476 #>>43132766 #>>43132907 #>>43133699 #
Kye ◴[] No.43131500[source]
A toddler can learn by trial and error mid-process. An LLM using autoregressive inference can only compound errors. The LLDM model paper was posted elsewhere, but: https://arxiv.org/pdf/2502.09992

It basically uses the image generation approach of progressively refining the entire thing at once, but applied to text. It can self-correct mid-process.
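A minimal caricature of that refine-everything-at-once idea (my own toy sketch, not the LLaDA algorithm from the paper): every pass revisits every position, so a position that is still wrong after an early pass can be corrected in a later one, unlike left-to-right decoding, which freezes each token as it is emitted.

```python
import random

random.seed(0)
target = "hello world"  # hypothetical "ground truth" the model is pulled toward
# Start from pure noise over the whole sequence at once.
state = [random.choice("abcdefghijklmnopqrstuvwxyz ") for _ in target]

# Each refinement pass touches *every* position, so earlier mistakes
# remain editable; 0.3 is an arbitrary per-pass fix probability.
for step in range(50):
    for i, ch in enumerate(target):
        if state[i] != ch and random.random() < 0.3:
            state[i] = ch
print("".join(state))
```

After 50 passes the chance a position is still wrong is about 0.7**50, so the sequence converges to the target even though no single pass is reliable.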

The blog post where I originally found it, which goes into more detail and raises some issues with it: https://timkellogg.me/blog/2025/02/17/diffusion

replies(2): >>43131730 #>>43133261 #
trhway ◴[] No.43131730[source]
>A toddler can learn by trial and error mid-process.

As a result of the whole learning process, the toddler in particular learns how to self-correct, i.e. as a grown-up s/he knows, without much trial and error anymore, how to continue in a straight line even if the previous step went sideways for whatever reason.

>An LLM using autoregressive inference can only compound errors.

That is a pretty strong statement, completely dismissing the possibility that some self-correction may be emerging there.

replies(1): >>43131968 #
Kye ◴[] No.43131968[source]
Can you expand on that? I don't see where it could emerge from.
replies(1): >>43132122 #
trhway ◴[] No.43132122[source]
The LLM handles/steers the representation (a trajectory consisting of successive representations) in a very high-dimensional space. For example, it is quite possible that those trajectories can, as a result of learning, be driven toward minimizing the distance (or some other metric) from the representation of some fact(s).

The metric may include, say, the weight/density of the attracting facts cluster, somewhat like gravitation drives matter in the Universe; the LLM's learning can then be thought of as pre-distributing matter in its own very high-dimensional Universe according to a semantic "gravitational" field.

The resulting, emergent metric and associated geometry are currently mind-bogglingly incomprehensible, and even in much simpler, single-digit-dimensional spaces, systems of the kind LeCun describes can still be [quasi]stable and/or [quasi]periodic around, say, some attractor(s).
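Even a one-dimensional toy system shows the point (my own sketch, with arbitrary parameters): when the dynamics contract toward an attractor, per-step noise is continually damped instead of compounding, whereas pure accumulation drifts without bound.

```python
import random

random.seed(1)

def run(a, steps=10_000, sigma=0.1):
    """Iterate x <- a*x + noise and return the largest |x| seen."""
    x, worst = 0.0, 0.0
    for _ in range(steps):
        x = a * x + random.gauss(0.0, sigma)
        worst = max(worst, abs(x))
    return worst

# Contractive dynamics (|a| < 1): the state hovers near the attractor
# at 0, with stationary spread about sigma / sqrt(1 - a**2).
print("a = 0.9:", run(0.9))
# Pure accumulation (a = 1): every error persists, so the state is a
# random walk that wanders off on the order of sigma * sqrt(steps).
print("a = 1.0:", run(1.0))
```

The contrast is the whole argument in miniature: whether errors compound depends on the geometry of the dynamics, not merely on the fact that each step is noisy.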

replies(1): >>43132225 #