
129 points jxmorris12 | 1 comment | source
trhway ◴[] No.43131403[source]
According to LeCun's model, a human walking step by step would have errors compounding with each step and thus would never reach the intended target. Yet as toddlers we somehow manage to learn to walk to our targets. (And I hold an MS in Math, Control Systems :)
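The compounding-error argument can be sketched numerically (a toy illustration with made-up numbers, not figures from LeCun): if each step is independently correct with probability 1 - eps, the chance the whole trajectory stays on track decays geometrically.

```python
# Toy sketch of the compounding-error argument: a hypothetical
# per-step error rate eps makes an n-step trajectory succeed with
# probability (1 - eps)**n, which shrinks geometrically in n.
eps = 0.01  # illustrative per-step error rate, not a measured value
for n in (10, 100, 1000):
    p_ok = (1 - eps) ** n
    print(f"n = {n:4d}: P(no error yet) = {p_ok:.4f}")
```

At eps = 0.01 the 1000-step walk almost never succeeds, which is the pessimistic conclusion the toddler example pushes back on.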
replies(5): >>43131500 #>>43132476 #>>43132766 #>>43132907 #>>43133699 #
Kye ◴[] No.43131500[source]
A toddler can learn by trial and error mid-process. An LLM using autoregressive inference can only compound errors. The LLDM model paper was posted elsewhere, but: https://arxiv.org/pdf/2502.09992

It basically uses the image generation approach of progressively refining the entire thing at once, but applied to text. It can self-correct mid-process.
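A minimal caricature of that refine-everything-at-once idea (my own toy sketch, not the LLaDA algorithm from the paper): every pass revisits every position, so a position that is still wrong after an early pass can be corrected in a later one, unlike left-to-right decoding, which freezes each token as it is emitted.

```python
import random

random.seed(0)
target = "hello world"  # hypothetical "ground truth" the model is pulled toward
# Start from pure noise over the whole sequence at once.
state = [random.choice("abcdefghijklmnopqrstuvwxyz ") for _ in target]

# Each refinement pass touches *every* position, so earlier mistakes
# remain editable; 0.3 is an arbitrary per-pass fix probability.
for step in range(50):
    for i, ch in enumerate(target):
        if state[i] != ch and random.random() < 0.3:
            state[i] = ch
print("".join(state))
```

After 50 passes the chance a position is still wrong is about 0.7**50, so the sequence converges to the target even though no single pass is reliable.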

The blog post where I originally found it, which goes into more detail and raises some issues with it: https://timkellogg.me/blog/2025/02/17/diffusion

replies(2): >>43131730 #>>43133261 #
trhway ◴[] No.43131730[source]
>A toddler can learn by trial and error mid-process.

As a result of the whole learning process, the toddler in particular learns how to self-correct, i.e. as a grown-up s/he knows, without much trial and error anymore, how to continue in a straight line even if the previous step went sideways for whatever reason.

>An LLM using autoregressive inference can only compound errors.

That is a pretty strong statement, completely dismissing the possibility that some self-correction may be emerging there.

replies(1): >>43131968 #
Kye ◴[] No.43131968[source]
Can you expand on that? I don't see where it could emerge from.
replies(1): >>43132122 #
trhway ◴[] No.43132122[source]
The LLM handles/steers the representation (a trajectory consisting of successive representations) in a very high-dimensional space. For example, it is quite possible that those trajectories can, as a result of learning, be driven toward minimizing the distance (or some other metric) from the representation of some fact(s).

The metric may include, say, the weight/density of the attracting facts cluster, somewhat like gravitation drives matter in the Universe; the LLM's learning can then be thought of as pre-distributing matter in its own very high-dimensional Universe according to a semantic "gravitational" field.

The resulting, emergent metric and associated geometry are currently mind-bogglingly incomprehensible, and even in much simpler, single-digit-dimensional spaces, systems of the kind LeCun describes can still be [quasi]stable and/or [quasi]periodic around, say, some attractor(s).
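Even a one-dimensional toy system shows the point (my own sketch, with arbitrary parameters): when the dynamics contract toward an attractor, per-step noise is continually damped instead of compounding, whereas pure accumulation drifts without bound.

```python
import random

random.seed(1)

def run(a, steps=10_000, sigma=0.1):
    """Iterate x <- a*x + noise and return the largest |x| seen."""
    x, worst = 0.0, 0.0
    for _ in range(steps):
        x = a * x + random.gauss(0.0, sigma)
        worst = max(worst, abs(x))
    return worst

# Contractive dynamics (|a| < 1): the state hovers near the attractor
# at 0, with stationary spread about sigma / sqrt(1 - a**2).
print("a = 0.9:", run(0.9))
# Pure accumulation (a = 1): every error persists, so the state is a
# random walk that wanders off on the order of sigma * sqrt(steps).
print("a = 1.0:", run(1.0))
```

The contrast is the whole argument in miniature: whether errors compound depends on the geometry of the dynamics, not merely on the fact that each step is noisy.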

replies(1): >>43132225 #