
566 points | PaulHoule | 1 comment
amelius ◴[] No.44490824[source]
Damn, that is fast. But it is faster than I can read, so hopefully they can use that speed and turn it into better output quality. Otherwise I honestly don't see the advantage, in practical terms, over existing LLMs. It's like having a TV with a 200Hz refresh rate when 100Hz is just fine.
replies(2): >>44491035 #>>44491573 #
pmxi ◴[] No.44491035[source]
There are plenty of LLM use cases where the output isn’t meant to be read by a human at all. e.g:

parsing unstructured text into structured formats like JSON

translating between natural or programming languages

serving as a reasoning step in agentic systems

So even if it’s “too fast to read,” that speed can still be useful (a rough sketch of the JSON-extraction case is below).
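
As a concrete illustration of the JSON-parsing case, here is a minimal sketch assuming an OpenAI-compatible endpoint; the base_url, model name, and schema are placeholders, not any particular provider's API:

    import json
    from openai import OpenAI

    # Placeholder endpoint and model; point this at whatever fast backend you use.
    client = OpenAI(base_url="https://example.invalid/v1", api_key="sk-placeholder")

    def extract_contact(text: str) -> dict:
        """Ask the model for strict JSON and parse it; no human reads the raw output."""
        resp = client.chat.completions.create(
            model="fast-model-placeholder",
            temperature=0,
            messages=[
                {"role": "system",
                 "content": 'Extract {"name": str, "email": str} from the user text. '
                            'Reply with JSON only.'},
                {"role": "user", "content": text},
            ],
        )
        return json.loads(resp.choices[0].message.content)

    print(extract_contact("Reach Jane Doe at jane@example.com about the invoice."))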

replies(2): >>44491329 #>>44495081 #
martinald ◴[] No.44495081[source]
You're missing another big advantage: cost. If you can do 1000 tok/s on a $2/hr H100 vs 60 tok/s on the same hardware, you can price it at roughly 1/17th of the price for the same margin.
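
A quick back-of-the-envelope check of that cost argument, taking the quoted $2/hr H100 rate and sustained throughput at face value (real serving costs will vary):

    # Cost per million tokens at a fixed hourly GPU price, for two throughputs.
    GPU_COST_PER_HOUR = 2.00   # USD, the assumed H100 rental price from above

    def usd_per_million_tokens(tok_per_s: float) -> float:
        tokens_per_hour = tok_per_s * 3600
        return GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000

    print(usd_per_million_tokens(1000))  # ~$0.56 per 1M tokens
    print(usd_per_million_tokens(60))    # ~$9.26 per 1M tokens
    # -> roughly a 17x price gap at the same margin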
replies(1): >>44496963 #
sweetjuly ◴[] No.44496963[source]
You can also slow down the hardware (say, dropping the clock and then the voltage) to save huge amounts of power, which should be interesting for embedded applications.
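
For a rough sense of why lowering both clock and voltage pays off, here is a sketch using the standard dynamic-power approximation P ≈ C·V²·f; the scaling factors are illustrative assumptions, not measurements of any particular chip:

    # Dynamic power scales roughly with V^2 * f (switched capacitance held constant).
    def relative_dynamic_power(v_scale: float, f_scale: float) -> float:
        """Power relative to baseline when voltage and frequency are scaled."""
        return (v_scale ** 2) * f_scale

    # Halving the clock alone roughly halves dynamic power...
    print(relative_dynamic_power(v_scale=1.0, f_scale=0.5))  # 0.50
    # ...but if the lower clock also permits ~20% lower voltage,
    # dynamic power falls to about a third of the baseline.
    print(relative_dynamic_power(v_scale=0.8, f_scale=0.5))  # 0.32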
replies(1): >>44499094 #
kldg ◴[] No.44499094[source]
Out of curiosity, is anyone here using AI in embedded with experiences to share? I see NPUs and the like popping up more on the credit-card-sized and buildroot SBCs I get, but with zero documentation or sample scripts for them.