(nathan.rs)

454 points nathan-barry | 1 comments | 20 Oct 25 14:31 UTC


BoiledCabbage ◴[20 Oct 25 16:06 UTC] No.45645494[source]▶

To me, part of the appeal of image diffusion models was starting with random noise to produce an image. Why do text diffusion models start with a blank slate (i.e. all "masked" tokens) instead of with random tokens?

replies(2): >>45645710 #>>45646617 #

1. didibus ◴[20 Oct 25 16:24 UTC] No.45645710[source]▶

>>45645494 #

They don't all do that. There are many approaches being experimented with.

Some start with random tokens or with masks; others even start with random vector embeddings.
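
To make the three starting points concrete, here is a rough sketch of what each initial state could look like before any denoising step runs. It is not taken from any particular model: `VOCAB_SIZE`, `MASK_ID`, `EMBED_DIM` and the three function names are made up for illustration, and the choice of uniform token sampling and unit Gaussian embeddings is an assumption.

```python
import random

# All three constants are hypothetical values chosen for illustration.
VOCAB_SIZE = 1000   # size of the token vocabulary
MASK_ID = 0         # id reserved for the [MASK] token
EMBED_DIM = 8       # width of each embedding vector


def init_masked(seq_len):
    """Blank slate: every position starts as the mask token."""
    return [MASK_ID] * seq_len


def init_random_tokens(seq_len, rng):
    """Discrete noise: every position gets a uniformly random non-mask token id."""
    return [rng.randrange(1, VOCAB_SIZE) for _ in range(seq_len)]


def init_random_embeddings(seq_len, rng):
    """Continuous noise: one Gaussian vector per position, closer to image diffusion."""
    return [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(seq_len)]


rng = random.Random(0)  # seeded so repeated runs give the same noise
print(init_masked(6))
print(init_random_tokens(6, rng))
print(len(init_random_embeddings(6, rng)), "vectors of width", EMBED_DIM)
```

The first two live in discrete token space, so the model only ever sees valid token ids; the third starts in continuous space and needs some way to map vectors back to tokens at the end.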


BERT is just a single text diffusion step