
566 points PaulHoule | 2 comments
gdiamos No.44491980
I think the LLM dev community is underestimating these models. For example, no LLM inference framework supports them today.

Yes, the diffusion foundation models have higher cross-entropy. But diffusion LLMs can also be post-trained and aligned, which narrows the gap.

IMO, investing in post-training and data is easier than forcing GPU vendors to invest in DRAM to handle large batch sizes and forcing users to figure out how to batch their requests by 100-1000x. It is also purely in the hands of LLM providers.
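
A rough back-of-envelope sketch of that batching argument (all hardware numbers below are illustrative assumptions, not figures from the thread): an autoregressive decode step re-reads the weights to emit one token per request, so it only saturates the GPU at large batch sizes, whereas a diffusion-style decoder that refines a whole block of tokens per forward pass gets far more tokens out of a single request.

    # Back-of-envelope sketch: why autoregressive decoding needs large batches
    # to saturate a GPU, while a block-diffusion decoder does not. Every number
    # here is an illustrative assumption, not a measurement.

    WEIGHT_BYTES = 14e9          # ~7B params at fp16
    HBM_BW = 2.0e12              # assumed GPU memory bandwidth, bytes/s
    PEAK_FLOPS = 1.0e15          # assumed GPU fp16 throughput, FLOPs/s
    FLOPS_PER_TOKEN = 2 * 7e9    # ~2 * params FLOPs per token per forward pass

    def autoregressive_tok_per_s(batch_size: int) -> float:
        """One new token per request per pass; each pass re-reads the weights."""
        step = max(WEIGHT_BYTES / HBM_BW,                      # bandwidth-bound
                   batch_size * FLOPS_PER_TOKEN / PEAK_FLOPS)  # compute-bound
        return batch_size / step

    def diffusion_tok_per_s(block: int, steps: int) -> float:
        """A block of tokens refined jointly over a fixed number of passes, batch = 1."""
        step = max(WEIGHT_BYTES / HBM_BW, block * FLOPS_PER_TOKEN / PEAK_FLOPS)
        return block / (steps * step)

    for b in (1, 64, 512):
        print(f"AR   batch={b:<3}           -> {autoregressive_tok_per_s(b):>9,.0f} tok/s")
    print(f"Diff block=256, steps=16 -> {diffusion_tok_per_s(256, 16):>9,.0f} tok/s")

Under these assumptions a single autoregressive request gets ~140 tok/s while a single diffusion request gets ~2,300 tok/s; the autoregressive path only catches up once requests are batched by the 100-1000x described above.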

replies(1): >>44492599 #
1. mathiaspoint No.44492599
You can absolutely tune causal LLMs. In fact, the original idea with GPTs was that you had to fine-tune them before they'd be useful for anything.
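
For what it's worth, a minimal sketch of what tuning a causal LLM looks like with the Hugging Face stack — the model name, data file, and hyperparameters are placeholders, not anything from this thread:

    # Minimal causal-LM fine-tuning sketch with Hugging Face transformers.
    # "gpt2" and train.txt are placeholders; swap in your own checkpoint/data.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "gpt2"
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token                      # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    ds = load_dataset("text", data_files={"train": "train.txt"})["train"]
    ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # next-token objective
    )
    trainer.train()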
replies(1): >>44493766 #
2. gdiamos No.44493766
Yes, I agree you can tune autoregressive LLMs.

You can also tune diffusion LLMs.

After doing so, a diffusion LLM can still generate more tokens/sec during inference than an autoregressive one.
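
A toy illustration of where the tokens/sec claim comes from (my own sketch, nothing from the thread): autoregressive decoding spends one forward pass per new token, while a block-diffusion decoder spends a fixed number of denoising passes on the whole block, however long it is.

    # Toy sketch: count model calls needed to produce 256 new tokens.
    # The "models" are dummies; only the call counts matter here.

    def autoregressive_decode(model_call, prompt, n_new):
        seq, calls = list(prompt), 0
        for _ in range(n_new):
            seq.append(model_call(seq))            # one pass -> one token
            calls += 1
        return seq, calls

    def diffusion_decode(model_call, prompt, n_new, steps=16):
        block, calls = ["<mask>"] * n_new, 0
        for _ in range(steps):
            # one pass re-predicts every position in the block at once
            block = model_call(list(prompt) + block)[-n_new:]
            calls += 1
        return list(prompt) + block, calls

    prompt = ["<bos>", "hello"]
    _, ar_calls = autoregressive_decode(lambda seq: "tok", prompt, 256)
    _, df_calls = diffusion_decode(lambda seq: ["tok" for _ in seq], prompt, 256)
    print(ar_calls, df_calls)                      # 256 vs 16 forward passes

Whether quality holds up at a small number of denoising steps is exactly the post-training question raised above, but the pass-count asymmetry is why the throughput headroom exists.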