
454 points | nathan-barry | 1 comment
kibwen (No.45645307):
To me, the diffusion-based approach "feels" more akin to what's going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words; I start by having some fuzzy idea in my head, and the challenge is in serializing it into language coherently.
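To make the contrast concrete, here is a minimal sketch of the two decoding loops. It uses a toy scoring function in place of a real network, and the vocabulary, step counts, and confidence-based unmasking schedule are illustrative assumptions, not any particular model's algorithm.

    import random

    VOCAB = ["the", "cat", "sat", "on", "mat", "."]
    MASK = "<mask>"

    def toy_scores(_context):
        # Stand-in for a trained network: arbitrary score per vocabulary word.
        return {w: random.random() for w in VOCAB}

    def autoregressive_decode(length=6):
        # Left to right: each token is committed before the next is considered.
        tokens = []
        for _ in range(length):
            scores = toy_scores(tokens)
            tokens.append(max(scores, key=scores.get))
        return tokens

    def diffusion_style_decode(length=6, steps=3):
        # Start fully masked and refine all positions over a few passes,
        # committing the most confident positions at each step.
        tokens = [MASK] * length
        per_step = max(1, length // steps)
        for _ in range(steps * 2):  # a few extra passes to finish unmasking
            masked = [i for i, t in enumerate(tokens) if t == MASK]
            if not masked:
                break
            # Propose (word, score) for every masked position given the whole
            # partially filled sequence, not just the prefix.
            proposals = {
                i: max(toy_scores(tokens).items(), key=lambda kv: kv[1])
                for i in masked
            }
            best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
            for i in best[:per_step]:
                tokens[i] = proposals[i][0]
        return tokens

    print("autoregressive: ", autoregressive_decode())
    print("diffusion-style:", diffusion_style_decode())

The difference the comment is pointing at: in the second loop every position exists (as a mask) from the start and the whole sequence is refined in parallel, while the first loop commits to each token before the next one is even considered.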
ma2rten (No.45645523):
Interpretability research has found that autoregressive LLMs also plan ahead for what they are going to say.
thamer (No.45646027):
The March 2025 blog post by Anthropic titled "Tracing the thoughts of a large language model"[1] is a great introduction to this research, showing how their language model activates features representing concepts that only get connected later, as the output tokens are produced.

The associated paper[2] goes into a lot more detail, and includes interactive features that help illustrate how the model "thinks" ahead of time.
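
For intuition, a much cruder way to look for this kind of "planning ahead" (not Anthropic's attribution-graph method, just a simple linear-probing sketch) is to check whether the hidden state at position t is already predictive of the token that will only be emitted k steps later. The model choice (gpt2), layer, offset k, and tiny training set below are arbitrary illustrative assumptions; it assumes the transformers, torch, and scikit-learn packages are installed.

    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
    model.eval()

    texts = [
        "The cat sat on the mat.",
        "She poured the coffee into a cup.",
        "He parked the car in the garage.",
    ]
    k = 2        # how many tokens ahead the probe tries to predict (assumption)
    layer = 6    # which hidden layer to read activations from (assumption)

    X, y = [], []
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt").input_ids
            hidden = model(ids).hidden_states[layer][0]   # (seq_len, d_model)
            for t in range(hidden.shape[0] - k):
                X.append(hidden[t].numpy())       # state at position t
                y.append(ids[0, t + k].item())    # token emitted k steps later

    # If a linear probe on the position-t state beats chance at predicting the
    # token k steps ahead, that state already carries information about it.
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    print("probe accuracy (training data only, illustrative):", probe.score(X, y))

In practice you would evaluate such a probe on held-out text; above-chance accuracy there is the kind of evidence that the circuit-level analyses in the paper make far more precise.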

[1] https://www.anthropic.com/research/tracing-thoughts-language...

[2] https://transformer-circuits.pub/2025/attribution-graphs/bio...