
129 points | jxmorris12 | 1 comment
trhway No.43131403
According to LeCun's model, a human walking step by step would have errors compounding with each step and thus would never make it to the intended target. Yet, as toddlers, we somehow manage to learn to walk to our targets. (And I'm an MS in Math, Control Systems :)
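To make the control-systems intuition concrete, here is a minimal sketch (mine, not from the comment) comparing an open-loop walker, whose per-step noise compounds, with a closed-loop walker that re-plans from its current position; the step rule, gain, and noise level are arbitrary assumptions:

    import random

    def walk(target=10.0, steps=100, noise=0.3, feedback=True):
        """One noisy walk toward `target`; returns the final distance from it."""
        pos = 0.0
        for _ in range(steps):
            if feedback:
                step = (target - pos) / 5.0   # closed loop: re-plan from where we are
            else:
                step = target / steps         # open loop: fixed plan, errors never corrected
            pos += step + random.gauss(0.0, noise)
        return abs(target - pos)

    random.seed(0)
    print("open loop  :", walk(feedback=False))  # error grows roughly with sqrt(steps)
    print("closed loop:", walk(feedback=True))   # error stays on the order of the noise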
Kye No.43131500
A toddler can learn by trial and error mid-process. An LLM using autoregressive inference can only compound errors. The LLDM paper was posted elsewhere, but here it is: https://arxiv.org/pdf/2502.09992

It basically uses the image generation approach of progressively refining the entire thing at once, but applied to text. It can self-correct mid-process.

The blog post where I originally found it, which goes into more detail and raises some issues with it: https://timkellogg.me/blog/2025/02/17/diffusion
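For intuition, here is a toy sketch of that decoding loop: propose tokens for every masked position in parallel, commit only the most confident ones, re-mask the rest, and repeat. It is my own illustration, not the paper's algorithm; `propose`, `TARGET`, and the confidence rule are made-up stand-ins for a real parallel predictor:

    import random

    MASK = "_"
    TARGET = list("the cat sat on the mat")  # stand-in for what a real model would predict

    def propose(seq):
        """Fake parallel predictor: a (token, confidence) guess for every position."""
        out = []
        for i, tok in enumerate(seq):
            if tok != MASK:
                out.append((tok, 1.0))        # already committed in an earlier round
            else:
                conf = random.random()
                guess = TARGET[i] if conf > 0.3 else random.choice("abcdefgh ")
                out.append((guess, conf))
        return out

    def diffusion_decode(length, rounds=6):
        seq = [MASK] * length
        for r in range(1, rounds + 1):
            scored = propose(seq)
            keep = int(length * r / rounds)   # commit more positions each round
            threshold = sorted((c for _, c in scored), reverse=True)[keep - 1]
            # keep the high-confidence guesses, re-mask (and later revise) the rest
            seq = [tok if conf >= threshold else MASK for tok, conf in scored]
        return "".join(seq)

    random.seed(1)
    print(diffusion_decode(len(TARGET)))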

psb217 No.43133261
Autoregressive vs. non-autoregressive is a red herring. The non-autoregressive model is still susceptible to exponential blow-up of the failure rate as the output dimension increases (sequence length, number of pixels, etc.). The final generation step in, e.g., diffusion models is independent Gaussian sampling per pixel. These models can be interpreted, like autoregressive models, as assigning log-likelihoods to the data. The average log-likelihood per token/pixel/etc. can still be computed, and the same compounding argument — raise the per-unit success probability to the power of the number of units — for exponential failure rates still holds.
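As a back-of-the-envelope illustration of that argument (the per-unit success probability below is an assumed number, nothing more): if each token/pixel is independently correct with probability p, a sample of n units is entirely error-free with probability p^n, regardless of whether the units were produced one at a time or all at once.

    p = 0.999                      # assumed per-token/pixel success probability
    for n in (10, 100, 1_000, 10_000):
        print(f"n={n:>6}  P(no error) = {p**n:.4f}")
    # n=    10  P(no error) = 0.9900
    # n=   100  P(no error) = 0.9048
    # n=  1000  P(no error) = 0.3677
    # n= 10000  P(no error) = 0.0000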

One potential difference between autoregressive and non-autoregressive models is the type of failures that occur. E.g., typical failures in autoregressive models might look like spiralling off into nonsense once the first "error" is made, while non-autoregressive models might produce failures that tend to remain relatively "close" to the true data.