(arxiv.org)

566 points PaulHoule | 3 comments | 07 Jul 25 12:31 UTC

1. thelastbender12 [07 Jul 25 13:53 UTC] No.44490388

The speed here is super impressive! I am curious: are there any qualitative ways in which modeling text with diffusion differs from modeling it with autoregressive models? For example, the kinds of problems it works better on, its creativity, and so on.

replies(1): >>44491026 #

2. orbital-decay [07 Jul 25 14:54 UTC] No.44491026

>>44490388 (TP) #

Diffusion works in the coarse-to-fine direction; autoregression works start-to-end. That means different directionality biases, at least. Differences in speed, generalization, etc. are less clear and need to be proven in practice, since fundamentally the two are closer than they seem. Diffusion models have some well-studied shortcuts for trading quality against speed, but nothing stops you from implementing the same for autoregressive models.
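To make the two directions concrete, here is a toy sketch in Python. It assumes a masked-token flavor of discrete diffusion, which is my assumption and not something stated in the thread, and it reveals a known target string instead of calling a model. All function names are made up for illustration. The only point is the order in which positions get committed and how many passes that takes.

```python
import random

MASK = "_"

def autoregressive_order(n):
    """Start-to-end: positions are committed strictly left to right."""
    return list(range(n))

def coarse_to_fine_schedule(n, steps, seed=0):
    """Toy masked-diffusion-style schedule (an assumption for illustration).

    Positions are shuffled and split into at most `steps` batches, so each
    pass commits several positions at once, in no particular order.
    """
    rng = random.Random(seed)
    positions = list(range(n))
    rng.shuffle(positions)
    per_step = -(-n // steps)  # ceiling division
    return [sorted(positions[i:i + per_step]) for i in range(0, n, per_step)]

def decode(target, schedule):
    """Reveal a known target batch by batch and record each intermediate state.

    A real model would predict the tokens; here we only copy them from
    `target` to show the order of commitment.
    """
    state = [MASK] * len(target)
    snapshots = []
    for batch in schedule:
        for p in batch:
            state[p] = target[p]
        snapshots.append("".join(state))
    return snapshots

if __name__ == "__main__":
    target = "diffusion"
    # One position per pass: as many passes as there are tokens.
    for s in decode(target, [[i] for i in autoregressive_order(len(target))]):
        print("AR  ", s)
    # Several positions per pass: the number of passes is a free parameter.
    for s in decode(target, coarse_to_fine_schedule(len(target), steps=3)):
        print("DIFF", s)
```

The autoregressive run needs one pass per token, while the batched run needs only as many passes as the schedule has steps. Nothing here says anything about quality; a sequential model could also commit several tokens per pass, which is the "nothing stops you" point.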

replies(1): >>44494575 #

3. ekunazanu [07 Jul 25 20:54 UTC] No.44494575

>>44491026 #

I once read that diffusion is essentially just autoregression in the frequency domain. Honestly, that comparison didn’t seem too far off.
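A rough way to see where that comparison comes from, as a sketch under stated assumptions: suppose the signal's power falls off as 1/f^2 (a common model for natural images; whether anything like it holds for text is unclear) and the added Gaussian noise is white, so its power is flat across frequencies. Then high frequencies drown in noise first, and as the noise level drops during sampling, progressively higher frequencies become recoverable. The function names and the 1/f^2 exponent below are illustrative choices, not taken from the paper.

```python
import numpy as np

def snr_per_frequency(freqs, sigma, exponent=2.0):
    """Signal-to-noise ratio at each frequency.

    Assumes signal power ~ 1 / f**exponent and white Gaussian noise with
    variance sigma**2 (flat power spectrum). Both are modeling assumptions.
    """
    signal_power = 1.0 / freqs ** exponent
    noise_power = sigma ** 2
    return signal_power / noise_power

def highest_recoverable_frequency(freqs, sigma, exponent=2.0):
    """Largest frequency whose signal power is at least the noise power."""
    snr = snr_per_frequency(freqs, sigma, exponent)
    visible = freqs[snr >= 1.0]
    return float(visible.max()) if visible.size else 0.0

if __name__ == "__main__":
    freqs = np.arange(1, 257, dtype=float)
    # Noise levels from high to low, as in the reverse (sampling) direction.
    for sigma in [0.5, 0.125, 0.03125, 0.0078125]:
        cutoff = highest_recoverable_frequency(freqs, sigma)
        print(f"sigma={sigma:<10} highest recoverable frequency={cutoff}")
```

Under these assumptions the cutoff is simply 1/sigma, so each denoising step effectively "predicts the next band of frequencies" given the lower ones already resolved. That is the sense in which the process looks autoregressive, just along the frequency axis instead of along the sequence.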


Mercury: Ultra-fast language models based on diffusion