←back to thread

454 points nathan-barry | 1 comments | | HN request time: 0.001s | source
Show context
kibwen ◴[] No.45645307[source]
To me, the diffusion-based approach "feels" more akin to whats going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one a time based on previously spoken words; I start by having some fuzzy idea in my head and the challenge is in serializing it into language coherently.
replies(14): >>45645350 #>>45645383 #>>45645401 #>>45645402 #>>45645509 #>>45645523 #>>45645607 #>>45645665 #>>45645670 #>>45645891 #>>45645973 #>>45647491 #>>45648578 #>>45652892 #
sailingparrot ◴[] No.45645973[source]
> the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one a time based on previously spoken words

Autoregressive LLMs don't do that either actually. Sure with one forward pass you only get one token at a time, but looking at what is happening in the latent space there are clear signs of long term planning and reasoning that go beyond just the next token.

So I don't think it's necessarily more or less similar to us than diffusion, we do say one word at a time sequentially, even if we have the bigger picture in mind.

replies(5): >>45646422 #>>45650316 #>>45654585 #>>45656793 #>>45663541 #
bjourne ◴[] No.45650316[source]
That is precisely what autoregressive means. Perhaps you meant to write that modern LLMs are not strictly autoregressive?
replies(1): >>45650411 #
janalsncm ◴[] No.45650411[source]
I think they are distinguishing the mechanical process of generation from the way the idea exists. It’s the same as how a person can literally only speak one word at a time but the ideas might be nonlinear.
replies(2): >>45650595 #>>45659970 #
1. sailingparrot ◴[] No.45650595{4}[source]
Indeed what I meant. The LLM isn’t a blank slate at the beginning of each new token during autoregression as the kv cache is there.