454 points nathan-barry | 1 comments
BoiledCabbage ◴[] No.45645494[source]
To me, part of the appeal of image diffusion models was starting with random noise to produce an image. Why do text diffusion models start with a blank slate (i.e. all "masked" tokens) instead of with random tokens?
replies(2): >>45645710 #>>45646617 #
1. didibus ◴[] No.45645710[source]
They don't all do that. There are many approaches being experimented with.

Some start with random tokens, some with masks, and others even start with random vector embeddings.
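
To make the three starting points concrete, here's a rough toy sketch of what each initial state could look like before any denoising step runs. Everything in it (the `VOCAB` list, the `[MASK]` string, the function names, the Gaussian embeddings) is made up for illustration and isn't taken from any particular model or paper.

```python
import random

# Toy vocabulary and mask token; both are hypothetical placeholders.
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
MASK = "[MASK]"


def init_masked(length):
    """Blank slate: every position starts as the mask token."""
    return [MASK] * length


def init_random_tokens(length, rng):
    """Discrete noise: every position starts as a uniformly random token."""
    return [rng.choice(VOCAB) for _ in range(length)]


def init_random_embeddings(length, dim, rng):
    """Continuous noise: every position starts as a Gaussian vector."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]


rng = random.Random(0)  # seeded so repeated runs give the same draws
masked = init_masked(4)
tokens = init_random_tokens(4, rng)
embeds = init_random_embeddings(4, 3, rng)

print(masked)
print(tokens)
print(embeds)
```

Only the starting state differs here; the denoising loop that would refine each one is left out.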