
566 points PaulHoule | 1 comment
chc4 No.44490102
Using the free playground link, and it is in fact extremely fast. The "diffusion mode" toggle is also pretty neat as a visualization, although I'm not sure how accurate it is - it renders as line noise and then refines, while in reality presumably those are tokens from an imprecise vector in some state space that then become more precise until it's only a definite word, right?
replies(3): >>44490131 #>>44490209 #>>44492011 #
icyfox No.44492011
Some text diffusion models use a continuous latent space, but historically those haven't done well. Most of the ones we're seeing now are trained to predict actual token output that's fed forward into the next timestep. The diffusion property comes from their ability to modify previous timesteps to converge on the final output.
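To make the iterative-unmasking idea concrete, here's a minimal toy sketch (not Mercury's actual architecture, which isn't described in the OP; the predictor and all names are hypothetical). The sequence starts as all-mask "noise", and each denoising step commits the model's single most confident token prediction, so the output refines over timesteps rather than being generated left to right:

```python
MASK = "<mask>"

def toy_denoise_step(seq, predict):
    """Score every masked position, then commit only the most
    confident (token, confidence) prediction this step."""
    candidates = [
        (i, *predict(seq, i)) for i, t in enumerate(seq) if t == MASK
    ]
    if not candidates:
        return seq, True  # fully denoised
    i, token, _conf = max(candidates, key=lambda c: c[2])
    out = list(seq)
    out[i] = token
    return out, False

def diffusion_decode(length, predict, max_steps=100):
    """Start from an all-mask sequence and iteratively refine it."""
    seq = [MASK] * length
    for _ in range(max_steps):
        seq, done = toy_denoise_step(seq, predict)
        if done:
            break
    return seq

# Stand-in "model": a fixed table of guesses, most confident at the left.
def dummy_predict(seq, i):
    words = ["hello", "diffusion", "world"]
    return words[i], 1.0 - 0.1 * i

print(diffusion_decode(3, dummy_predict))
# -> ['hello', 'diffusion', 'world']
```

Real discrete-diffusion decoders go further than this sketch: they can also re-mask low-confidence tokens that were already committed, which is the "modify previous timesteps" property mentioned above, and is why the playground visualization shows earlier text changing as it converges.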

I have an explanation about one of these recent architectures that seems similar to what Mercury is doing under the hood here: https://pierce.dev/notes/how-text-diffusion-works/

replies(1): >>44495285 #
chc4 No.44495285
Oh neat, thanks! The OP is surprisingly light on details on how it actually works and is mostly benchmarks, so this is very appreciated :)