
566 points by PaulHoule | 7 comments
1. amelius No.44490824
Damn, that is fast. But it's faster than I can read, so hopefully they can use that speed to improve the quality of the output instead. Otherwise, I honestly don't see the practical advantage over existing LLMs. It's like having a TV with a 200Hz refresh rate when 100Hz is already fine.
replies(2): >>44491035, >>44491573
2. pmxi No.44491035
There are plenty of LLM use cases where the output isn’t meant to be read by a human at all, e.g.:

parsing unstructured text into structured formats like JSON (see the sketch after this list)

translating between natural or programming languages

serving as a reasoning step in agentic systems

So even if it’s “too fast to read,” that speed can still be useful.
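For example, a minimal sketch of the structured-output case (the endpoint, model name, and output schema here are placeholder assumptions, not any particular provider's API):

    # Sketch: turn free text into JSON via an OpenAI-compatible chat endpoint.
    # API_URL, the model name, and the expected schema are hypothetical.
    import json
    import requests

    API_URL = "https://api.example.com/v1/chat/completions"
    prompt = (
        "Extract name, date, and amount as a JSON object from: "
        "'Paid Alice $42.50 for lunch on July 3rd.' Respond with JSON only."
    )
    resp = requests.post(
        API_URL,
        headers={"Authorization": "Bearer <API_KEY>"},
        json={
            "model": "some-fast-model",
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        },
        timeout=30,
    )
    record = json.loads(resp.json()["choices"][0]["message"]["content"])
    print(record)  # e.g. {"name": "Alice", "date": "July 3rd", "amount": 42.5}

The faster the model, the more of these calls fit inside a latency-sensitive pipeline instead of an offline batch job.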

replies(2): >>44491329, >>44495081
3. amelius No.44491329
Sure, but I was talking about the chat interface; sorry if that was not clear.
4. Legend2440 No.44491573
This lets you do more (potentially a lot more) reasoning steps and tool calls before answering.
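As a rough sketch of that point (call_model and call_tool are hypothetical stand-ins for a real LLM client and tool dispatcher), a faster model simply fits more iterations of a loop like this into the same wall-clock budget:

    import time

    def run_agent(question, call_model, call_tool, budget_s=5.0, max_steps=50):
        # Each iteration is one reasoning step, possibly followed by a tool call.
        # Faster generation -> more iterations before the deadline.
        history = [{"role": "user", "content": question}]
        deadline = time.monotonic() + budget_s
        answer = None
        for _ in range(max_steps):
            if time.monotonic() >= deadline:
                break
            step = call_model(history)             # returns {"text": ..., "tool_call": ...}
            history.append({"role": "assistant", "content": step["text"]})
            answer = step["text"]
            if step.get("tool_call") is None:      # model decided it is done
                break
            result = call_tool(step["tool_call"])  # execute the requested tool
            history.append({"role": "tool", "content": result})
        return answer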
5. martinald No.44495081
You're missing another big advantage: cost. If you can do 1,000 tok/s on a $2/hr H100 vs 60 tok/s on the same hardware, you can price it at roughly 1/17th of the price for the same margin.
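Back-of-the-envelope version of that arithmetic (assuming the GPU is fully utilized and ignoring batching overheads):

    # Serving cost per million output tokens on a $2/hr GPU.
    GPU_COST_PER_HOUR = 2.00
    SECONDS_PER_HOUR = 3600

    def cost_per_million_tokens(tokens_per_second):
        tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
        return GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000

    print(cost_per_million_tokens(60))    # ~$9.26 per 1M tokens
    print(cost_per_million_tokens(1000))  # ~$0.56 per 1M tokens, ~16.7x cheaper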
replies(1): >>44496963
6. sweetjuly No.44496963
You can also slow down the hardware (say, dropping the clock and then voltages) to save huge amounts of power, which should be interesting for embedded applications.
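A rough intuition for why that works: CMOS dynamic power scales roughly with frequency times voltage squared (P ≈ C·V²·f), so modest clock and voltage drops buy outsized power savings. The numbers below are illustrative only, and leakage power is ignored.

    # Relative dynamic power under frequency/voltage scaling (P ~ C * V^2 * f).
    def relative_dynamic_power(freq_scale, voltage_scale):
        return freq_scale * voltage_scale ** 2

    # Halve the clock and drop the voltage by 20%:
    print(relative_dynamic_power(0.5, 0.8))  # 0.32 -> ~3x less dynamic power for 2x less throughput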
replies(1): >>44499094
7. kldg No.44499094
Out of curiosity, is anyone here using AI in embedded systems with experiences to share? I see NPUs and the like popping up more on the credit-card-sized and buildroot SBCs I get, but with zero documentation or sample scripts for them.