
566 points | PaulHoule | 1 comment
amelius ◴[] No.44490824[source]
Damn, that is fast. But it is faster than I can read, so hopefully they can use that speed and turn it into better output quality. Otherwise I honestly don't see the advantage, in practical terms, over existing LLMs. It's like having a TV with a 200Hz refresh rate when 100Hz is just fine.
replies(2): >>44491035 #>>44491573 #
pmxi ◴[] No.44491035[source]
There are plenty of LLM use cases where the output isn’t meant to be read by a human at all. e.g:

parsing unstructured text into structured formats like JSON

translating between natural or programming languages

serving as a reasoning step in agentic systems

So even if it’s “too fast to read,” that speed can still be useful (a rough sketch of the JSON-extraction case is below).
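
As a concrete illustration of the JSON-parsing case, here is a minimal sketch assuming an OpenAI-compatible endpoint; the base_url, model name, and schema are placeholders, not any particular provider's API:

    import json
    from openai import OpenAI

    # Placeholder endpoint and model; point this at whatever fast backend you use.
    client = OpenAI(base_url="https://example.invalid/v1", api_key="sk-placeholder")

    def extract_contact(text: str) -> dict:
        """Ask the model for strict JSON and parse it; no human reads the raw output."""
        resp = client.chat.completions.create(
            model="fast-model-placeholder",
            temperature=0,
            messages=[
                {"role": "system",
                 "content": 'Extract {"name": str, "email": str} from the user text. '
                            'Reply with JSON only.'},
                {"role": "user", "content": text},
            ],
        )
        return json.loads(resp.choices[0].message.content)

    print(extract_contact("Reach Jane Doe at jane@example.com about the invoice."))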

replies(2): >>44491329 #>>44495081 #
martinald ◴[] No.44495081[source]
You're missing another big advantage: cost. If you can do 1000 tok/s on a $2/hr H100 vs 60 tok/s on the same hardware, you can price it at roughly 1/17th of the price for the same margin.
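
A quick back-of-the-envelope check of that cost argument, taking the quoted $2/hr H100 rate and sustained throughput at face value (real serving costs will vary):

    # Cost per million tokens at a fixed hourly GPU price, for two throughputs.
    GPU_COST_PER_HOUR = 2.00   # USD, the assumed H100 rental price from above

    def usd_per_million_tokens(tok_per_s: float) -> float:
        tokens_per_hour = tok_per_s * 3600
        return GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000

    print(usd_per_million_tokens(1000))  # ~$0.56 per 1M tokens
    print(usd_per_million_tokens(60))    # ~$9.26 per 1M tokens
    # -> roughly a 17x price gap at the same margin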
replies(1): >>44496963 #
sweetjuly ◴[] No.44496963[source]
You can also slow down the hardware (say, dropping the clock and then the voltage) to save huge amounts of power, which should be interesting for embedded applications.
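
For a rough sense of why lowering both clock and voltage pays off, here is a sketch using the standard dynamic-power approximation P ≈ C·V²·f; the scaling factors are illustrative assumptions, not measurements of any particular chip:

    # Dynamic power scales roughly with V^2 * f (switched capacitance held constant).
    def relative_dynamic_power(v_scale: float, f_scale: float) -> float:
        """Power relative to baseline when voltage and frequency are scaled."""
        return (v_scale ** 2) * f_scale

    # Halving the clock alone roughly halves dynamic power...
    print(relative_dynamic_power(v_scale=1.0, f_scale=0.5))  # 0.50
    # ...but if the lower clock also permits ~20% lower voltage,
    # dynamic power falls to about a third of the baseline.
    print(relative_dynamic_power(v_scale=0.8, f_scale=0.5))  # 0.32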
replies(1): >>44499094 #
kldg ◴[] No.44499094[source]
Out of curiosity, is anyone here using AI in embedded with experiences to share? I see NPUs and the like popping up more on the credit-card-sized and buildroot SBCs I get, but with zero documentation or sample scripts for them.