
566 points PaulHoule | 2 comments
gdiamos No.44491980
I think the LLM dev community is underestimating these models. For example, no LLM inference framework supports them today.

Yes, the diffusion foundation models have higher cross-entropy. But diffusion LLMs can also be post-trained and aligned, which narrows the gap.

IMO, investing in post-training and data is easier than forcing GPU vendors to invest in DRAM to handle large batch sizes and forcing users to figure out how to batch their requests by 100-1000x. It is also purely in the hands of LLM providers.
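
A rough back-of-envelope sketch of that batching argument (all hardware numbers below are illustrative assumptions, not figures from the thread): an autoregressive decode step re-reads the weights to emit one token per request, so it only saturates the GPU at large batch sizes, whereas a diffusion-style decoder that refines a whole block of tokens per forward pass gets far more tokens out of a single request.

    # Back-of-envelope sketch: why autoregressive decoding needs large batches
    # to saturate a GPU, while a block-diffusion decoder does not. Every number
    # here is an illustrative assumption, not a measurement.

    WEIGHT_BYTES = 14e9          # ~7B params at fp16
    HBM_BW = 2.0e12              # assumed GPU memory bandwidth, bytes/s
    PEAK_FLOPS = 1.0e15          # assumed GPU fp16 throughput, FLOPs/s
    FLOPS_PER_TOKEN = 2 * 7e9    # ~2 * params FLOPs per token per forward pass

    def autoregressive_tok_per_s(batch_size: int) -> float:
        """One new token per request per pass; each pass re-reads the weights."""
        step = max(WEIGHT_BYTES / HBM_BW,                      # bandwidth-bound
                   batch_size * FLOPS_PER_TOKEN / PEAK_FLOPS)  # compute-bound
        return batch_size / step

    def diffusion_tok_per_s(block: int, steps: int) -> float:
        """A block of tokens refined jointly over a fixed number of passes, batch = 1."""
        step = max(WEIGHT_BYTES / HBM_BW, block * FLOPS_PER_TOKEN / PEAK_FLOPS)
        return block / (steps * step)

    for b in (1, 64, 512):
        print(f"AR   batch={b:<3}           -> {autoregressive_tok_per_s(b):>9,.0f} tok/s")
    print(f"Diff block=256, steps=16 -> {diffusion_tok_per_s(256, 16):>9,.0f} tok/s")

Under these assumptions a single autoregressive request gets ~140 tok/s while a single diffusion request gets ~2,300 tok/s; the autoregressive path only catches up once requests are batched by the 100-1000x described above.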

replies(1): >>44492599 #
1. mathiaspoint No.44492599
You can absolutely tune causal LLMs. In fact, the original idea with GPTs was that you had to fine-tune them before they'd be useful for anything.
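
For what it's worth, a minimal sketch of what tuning a causal LLM looks like with the Hugging Face stack — the model name, data file, and hyperparameters are placeholders, not anything from this thread:

    # Minimal causal-LM fine-tuning sketch with Hugging Face transformers.
    # "gpt2" and train.txt are placeholders; swap in your own checkpoint/data.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "gpt2"
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token                      # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    ds = load_dataset("text", data_files={"train": "train.txt"})["train"]
    ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # next-token objective
    )
    trainer.train()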
replies(1): >>44493766 #
2. gdiamos No.44493766
Yes, I agree you can tune autoregressive LLMs.

You can also tune diffusion LLMs.

After doing so, a diffusion LLM can still generate more tokens/sec during inference than an autoregressive one.
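
A toy illustration of where the tokens/sec claim comes from (my own sketch, nothing from the thread): autoregressive decoding spends one forward pass per new token, while a block-diffusion decoder spends a fixed number of denoising passes on the whole block, however long it is.

    # Toy sketch: count model calls needed to produce 256 new tokens.
    # The "models" are dummies; only the call counts matter here.

    def autoregressive_decode(model_call, prompt, n_new):
        seq, calls = list(prompt), 0
        for _ in range(n_new):
            seq.append(model_call(seq))            # one pass -> one token
            calls += 1
        return seq, calls

    def diffusion_decode(model_call, prompt, n_new, steps=16):
        block, calls = ["<mask>"] * n_new, 0
        for _ in range(steps):
            # one pass re-predicts every position in the block at once
            block = model_call(list(prompt) + block)[-n_new:]
            calls += 1
        return list(prompt) + block, calls

    prompt = ["<bos>", "hello"]
    _, ar_calls = autoregressive_decode(lambda seq: "tok", prompt, 256)
    _, df_calls = diffusion_decode(lambda seq: ["tok" for _ in seq], prompt, 256)
    print(ar_calls, df_calls)                      # 256 vs 16 forward passes

Whether quality holds up at a small number of denoising steps is exactly the post-training question raised above, but the pass-count asymmetry is why the throughput headroom exists.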