BERT is just a single text diffusion step

(nathan.rs)

454 points nathan-barry | 1 comments | 20 Oct 25 14:31 UTC | HN request time: 0s | source

Show context

kibwen ◴[20 Oct 25 15:52 UTC] No.45645307[source]▶

To me, the diffusion-based approach "feels" more akin to whats going on in an animal brain than the token-at-a-time approach of the in-vogue LLMs. Speaking for myself, I don't generate words one a time based on previously spoken words; I start by having some fuzzy idea in my head and the challenge is in serializing it into language coherently.

replies(14): >>45645350 #>>45645383 #>>45645401 #>>45645402 #>>45645509 #>>45645523 #>>45645607 #>>45645665 #>>45645670 #>>45645891 #>>45645973 #>>45647491 #>>45648578 #>>45652892 #

crubier ◴[20 Oct 25 15:58 UTC] No.45645401[source]▶

>>45645307 #

You 100% do pronounce or write words one at a time sequentially.

But before starting your sentence, you internally formulate the gist of the sentence you're going to say.

Which is exactly what happens in LLMs latent space too before they start outputting the first token.

replies(5): >>45645466 #>>45645546 #>>45645695 #>>45645968 #>>45646205 #

taeric ◴[20 Oct 25 16:10 UTC] No.45645546[source]▶

>>45645401 #

I'm curious what makes you so confident on this? I confess I expect that people are often far more cognizant of the last thing that the they want to say when they start?

I don't think you do a random walk through the words of a sentence as you conceive it. But it is hard not to think people don't center themes and moods in their mind as they compose their thoughts into sentences.

Similarly, have you ever looked into how actors learn their lines? It is often in a way that is a lot closer to a diffusion than token at a time.

replies(7): >>45645580 #>>45645621 #>>45646119 #>>45646153 #>>45646165 #>>45647044 #>>45647828 #

1. CaptainOfCoit ◴[20 Oct 25 16:13 UTC] No.45645580{3}[source]▶

>>45645546 #

I think there is a wide range of ways to "turn something in the head into words", and sometimes you use the "this is the final point, work towards it" approach and sometimes you use the "not sure what will happen, lets just start talking and go wherever". Different approaches have different tradeoffs, and of course different people have different defaults.

I can confess to not always knowing where I'll end up when I start talking. Similarly, not every time I open my mouth it's just to start but sometimes I do have a goal and conclusion.

↑