
454 points | nathan-barry | 2 comments
kibwen ◴[] No.45645307[source]
To me, the diffusion-based approach "feels" more akin to what's going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one at a time based on previously spoken words; I start by having some fuzzy idea in my head, and the challenge is in serializing it into language coherently.
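
The distinction drawn here can be made concrete with a toy sketch. The two "models" below are random stand-ins rather than real networks (the function names and tiny vocabulary are purely illustrative), but the control flow shows the difference: autoregressive decoding commits to one token at a time, left to right, while a discrete-diffusion-style decoder starts from a fully masked sequence and refines every position over several passes.

    import random

    VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]
    MASK = "<mask>"

    def toy_next_token(prefix):
        """Stand-in for an autoregressive model: sample p(next token | prefix)."""
        return random.choice(VOCAB)

    def toy_denoise(sequence):
        """Stand-in for a diffusion/unmasking model: propose tokens for masked slots."""
        return [tok if tok != MASK else random.choice(VOCAB) for tok in sequence]

    def autoregressive_decode(length=7):
        tokens = []
        for _ in range(length):            # one token per step, strictly left to right
            tokens.append(toy_next_token(tokens))
        return tokens

    def diffusion_decode(length=7, steps=4):
        tokens = [MASK] * length           # start from an all-masked, "fuzzy" sequence
        for step in range(steps):
            proposal = toy_denoise(tokens)
            # Unmask a growing fraction of positions each pass; every position is
            # proposed in parallel rather than generated in a fixed order.
            n_keep = int(length * (step + 1) / steps)
            keep = set(random.sample(range(length), n_keep))
            tokens = [proposal[i] if (i in keep or tokens[i] != MASK) else MASK
                      for i in range(length)]
        return tokens

    if __name__ == "__main__":
        print("autoregressive: ", " ".join(autoregressive_decode()))
        print("diffusion-style:", " ".join(diffusion_decode()))

The contrast is only about the order of commitment; real diffusion language models score and re-mask positions with a learned model rather than at random.
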
replies(14): >>45645350 #>>45645383 #>>45645401 #>>45645402 #>>45645509 #>>45645523 #>>45645607 #>>45645665 #>>45645670 #>>45645891 #>>45645973 #>>45647491 #>>45648578 #>>45652892 #
ma2rten ◴[] No.45645523[source]
Interpretability research has found that autoregressive LLMs also plan ahead for what they are going to say.
replies(2): >>45645712 #>>45646027 #
aidenn0 ◴[] No.45645712[source]
This seems likely just from the simple fact that they can reliably generate contextually correct sentences in e.g. German Imperfekt.
replies(3): >>45651812 #>>45651822 #>>45653730 #
1. ma2rten ◴[] No.45651812{3}[source]
It's actually true on many levels, if you think about what is needed for generating syntactically and grammatically correct sentences, coherent text, and working code.
replies(1): >>45658031 #
2. aidenn0 ◴[] No.45658031[source]
Just generating syntactically and grammatically correct sentences doesn't need much lookahead; prefixes to sentences that cannot be properly completed are going to be extremely unlikely to be generated.
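
One hedged way to probe this claim empirically is to score how probable a causal LM finds a prefix that can still be completed grammatically versus one that has painted itself into a corner. The model name ("gpt2") and the example prefixes below are illustrative choices, not anything from the thread.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # illustrative choice; any causal LM would do
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    def prefix_logprob(text):
        """Total log-probability the model assigns to the given prefix."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # log p(token_i | tokens_<i) for every token after the first
        log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
        token_lp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        return token_lp.sum().item()

    # A prefix that can be completed normally vs. one that is hard to complete.
    print(prefix_logprob("Yesterday I went to the"))
    print(prefix_logprob("Yesterday I went to the the of"))

If hard-to-complete prefixes already receive much lower total log-probability, the model needs no explicit lookahead to avoid generating them.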