
566 points by PaulHoule | 7 comments
1. amelius No.44490824
Damn, that is fast. But it's faster than I can read, so hopefully they can use that speed to improve the quality of the output instead. Otherwise, I honestly don't see the practical advantage over existing LLMs. It's like having a TV with a 200Hz refresh rate when 100Hz is already fine.
replies(2): >>44491035, >>44491573
2. pmxi No.44491035
There are plenty of LLM use cases where the output isn’t meant to be read by a human at all, e.g.:

parsing unstructured text into structured formats like JSON (see the sketch after this list)

translating between natural or programming languages

serving as a reasoning step in agentic systems

So even if it’s “too fast to read,” that speed can still be useful.
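For example, a minimal sketch of the structured-output case (the endpoint, model name, and output schema here are placeholder assumptions, not any particular provider's API):

    # Sketch: turn free text into JSON via an OpenAI-compatible chat endpoint.
    # API_URL, the model name, and the expected schema are hypothetical.
    import json
    import requests

    API_URL = "https://api.example.com/v1/chat/completions"
    prompt = (
        "Extract name, date, and amount as a JSON object from: "
        "'Paid Alice $42.50 for lunch on July 3rd.' Respond with JSON only."
    )
    resp = requests.post(
        API_URL,
        headers={"Authorization": "Bearer <API_KEY>"},
        json={
            "model": "some-fast-model",
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        },
        timeout=30,
    )
    record = json.loads(resp.json()["choices"][0]["message"]["content"])
    print(record)  # e.g. {"name": "Alice", "date": "July 3rd", "amount": 42.5}

The faster the model, the more of these calls fit inside a latency-sensitive pipeline instead of an offline batch job.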

replies(2): >>44491329, >>44495081
3. amelius No.44491329
Sure, but I was talking about the chat interface; sorry if that was not clear.
4. Legend2440 No.44491573
This lets you do more (potentially a lot more) reasoning steps and tool calls before answering.
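As a rough sketch of that point (call_model and call_tool are hypothetical stand-ins for a real LLM client and tool dispatcher), a faster model simply fits more iterations of a loop like this into the same wall-clock budget:

    import time

    def run_agent(question, call_model, call_tool, budget_s=5.0, max_steps=50):
        # Each iteration is one reasoning step, possibly followed by a tool call.
        # Faster generation -> more iterations before the deadline.
        history = [{"role": "user", "content": question}]
        deadline = time.monotonic() + budget_s
        answer = None
        for _ in range(max_steps):
            if time.monotonic() >= deadline:
                break
            step = call_model(history)             # returns {"text": ..., "tool_call": ...}
            history.append({"role": "assistant", "content": step["text"]})
            answer = step["text"]
            if step.get("tool_call") is None:      # model decided it is done
                break
            result = call_tool(step["tool_call"])  # execute the requested tool
            history.append({"role": "tool", "content": result})
        return answer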
5. martinald No.44495081
You're missing another big advantage: cost. If you can do 1,000 tok/s on a $2/hr H100 vs 60 tok/s on the same hardware, you can price it at roughly 1/17th of the price for the same margin.
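Back-of-the-envelope version of that arithmetic (assuming the GPU is fully utilized and ignoring batching overheads):

    # Serving cost per million output tokens on a $2/hr GPU.
    GPU_COST_PER_HOUR = 2.00
    SECONDS_PER_HOUR = 3600

    def cost_per_million_tokens(tokens_per_second):
        tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
        return GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000

    print(cost_per_million_tokens(60))    # ~$9.26 per 1M tokens
    print(cost_per_million_tokens(1000))  # ~$0.56 per 1M tokens, ~16.7x cheaper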
replies(1): >>44496963
6. sweetjuly No.44496963
You can also slow down the hardware (say, dropping the clock and then voltages) to save huge amounts of power, which should be interesting for embedded applications.
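A rough intuition for why that works: CMOS dynamic power scales roughly with frequency times voltage squared (P ≈ C·V²·f), so modest clock and voltage drops buy outsized power savings. The numbers below are illustrative only, and leakage power is ignored.

    # Relative dynamic power under frequency/voltage scaling (P ~ C * V^2 * f).
    def relative_dynamic_power(freq_scale, voltage_scale):
        return freq_scale * voltage_scale ** 2

    # Halve the clock and drop the voltage by 20%:
    print(relative_dynamic_power(0.5, 0.8))  # 0.32 -> ~3x less dynamic power for 2x less throughput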
replies(1): >>44499094
7. kldg No.44499094
Out of curiosity, is anyone here using AI in embedded systems with experiences to share? I see NPUs and the like popping up more on the credit-card-sized and buildroot SBCs I get, but with zero documentation or sample scripts for them.