
129 points | jxmorris12 | 1 comment
trhway No.43131403
According to LeCun's model, a human walking step by step would have errors compounding with each step and thus would never make it to the intended target. Yet, as toddlers, we somehow manage to learn to walk to our targets. (And I'm an MS in Math, Control Systems :)
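To make the control-systems intuition concrete, here is a minimal sketch (mine, not from the comment) comparing an open-loop walker, whose per-step noise compounds, with a closed-loop walker that re-plans from its current position; the step rule, gain, and noise level are arbitrary assumptions:

    import random

    def walk(target=10.0, steps=100, noise=0.3, feedback=True):
        """One noisy walk toward `target`; returns the final distance from it."""
        pos = 0.0
        for _ in range(steps):
            if feedback:
                step = (target - pos) / 5.0   # closed loop: re-plan from where we are
            else:
                step = target / steps         # open loop: fixed plan, errors never corrected
            pos += step + random.gauss(0.0, noise)
        return abs(target - pos)

    random.seed(0)
    print("open loop  :", walk(feedback=False))  # error grows roughly with sqrt(steps)
    print("closed loop:", walk(feedback=True))   # error stays on the order of the noise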
Kye No.43131500
A toddler can learn by trial and error mid-process. An LLM using autoregressive inference can only compound errors. The LLDM paper was posted elsewhere, but here it is: https://arxiv.org/pdf/2502.09992

It basically uses the image generation approach of progressively refining the entire thing at once, but applied to text. It can self-correct mid-process.

The blog post where I originally found it, which goes into more detail and raises some issues with it: https://timkellogg.me/blog/2025/02/17/diffusion
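For intuition, here is a toy sketch of that decoding loop: propose tokens for every masked position in parallel, commit only the most confident ones, re-mask the rest, and repeat. It is my own illustration, not the paper's algorithm; `propose`, `TARGET`, and the confidence rule are made-up stand-ins for a real parallel predictor:

    import random

    MASK = "_"
    TARGET = list("the cat sat on the mat")  # stand-in for what a real model would predict

    def propose(seq):
        """Fake parallel predictor: a (token, confidence) guess for every position."""
        out = []
        for i, tok in enumerate(seq):
            if tok != MASK:
                out.append((tok, 1.0))        # already committed in an earlier round
            else:
                conf = random.random()
                guess = TARGET[i] if conf > 0.3 else random.choice("abcdefgh ")
                out.append((guess, conf))
        return out

    def diffusion_decode(length, rounds=6):
        seq = [MASK] * length
        for r in range(1, rounds + 1):
            scored = propose(seq)
            keep = int(length * r / rounds)   # commit more positions each round
            threshold = sorted((c for _, c in scored), reverse=True)[keep - 1]
            # keep the high-confidence guesses, re-mask (and later revise) the rest
            seq = [tok if conf >= threshold else MASK for tok, conf in scored]
        return "".join(seq)

    random.seed(1)
    print(diffusion_decode(len(TARGET)))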

psb217 No.43133261
Autoregressive vs. non-autoregressive is a red herring. The non-autoregressive model is still susceptible to exponential blow-up of the failure rate as the output dimension increases (sequence length, number of pixels, etc.). The final generation step in, e.g., diffusion models is independent Gaussian sampling per pixel. These models can be interpreted, like autoregressive models, as assigning log-likelihoods to the data. The average log-likelihood per token/pixel/etc. can still be computed, and the same compounding argument — raise the per-unit success probability to the power of the number of units — for exponential failure rates still holds.
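As a back-of-the-envelope illustration of that argument (the per-unit success probability below is an assumed number, nothing more): if each token/pixel is independently correct with probability p, a sample of n units is entirely error-free with probability p^n, regardless of whether the units were produced one at a time or all at once.

    p = 0.999                      # assumed per-token/pixel success probability
    for n in (10, 100, 1_000, 10_000):
        print(f"n={n:>6}  P(no error) = {p**n:.4f}")
    # n=    10  P(no error) = 0.9900
    # n=   100  P(no error) = 0.9048
    # n=  1000  P(no error) = 0.3677
    # n= 10000  P(no error) = 0.0000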

One potential difference between autoregressive and non-autoregressive models is the type of failures that occur. E.g., typical failures in autoregressive models might look like spiralling off into nonsense once the first "error" is made, while non-autoregressive models might produce failures that tend to remain relatively "close" to the true data.