
425 points karimf | 1 comment
amelius ◴[] No.45655505[source]
> Many LLMs have voice interfaces, but they usually work by transcribing your speech, generating the answer in text, and using text-to-speech to read the response out loud. That’s perfectly fine in many cases (...), but it’s a wrapper, not real speech understanding.
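
For concreteness, the wrapper described there amounts to three chained steps: speech-to-text, an ordinary text-in/text-out LLM call, then text-to-speech. A minimal sketch with hypothetical placeholder functions (toy stand-ins, not any real library's API):

    # Hedged sketch of the "wrapper" pipeline the quote describes:
    # speech -> text -> LLM -> text -> speech.
    # All three helpers below are hypothetical placeholders.

    def transcribe(audio_in: bytes) -> str:
        # Placeholder speech-to-text: pretend the audio decoded to this prompt.
        return "what's the weather like?"

    def generate_reply(prompt: str) -> str:
        # Placeholder text-in/text-out LLM call.
        return f"You asked {prompt!r}; here is a text answer."

    def synthesize(text: str) -> bytes:
        # Placeholder text-to-speech: just encode the text as bytes.
        return text.encode("utf-8")

    def voice_chat(audio_in: bytes) -> bytes:
        # The model itself only ever sees text; the audio ends are bolted on,
        # which is why the quote calls this a wrapper rather than real
        # speech understanding.
        return synthesize(generate_reply(transcribe(audio_in)))

    if __name__ == "__main__":
        print(voice_chat(b"\x00\x01"))  # fake audio bytes in, "audio" bytes out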

But I can say the same about tokenization. LLMs first convert groups of characters into tokens, generate new tokens from those, and then convert the tokens back into characters. That's not real understanding! If LLMs are so smart, we should be able to skip the tokenization step.
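
That round trip is easy to see in code. A minimal sketch using OpenAI's tiktoken library as an example tokenizer (assuming it is installed; the encoding name is just an example):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    text = "If LLMs are so smart, why tokenize at all?"
    tokens = enc.encode(text)     # characters -> integer token ids the model sees
    restored = enc.decode(tokens) # token ids -> characters again

    print(tokens)                 # a list of integer ids
    print(restored == text)       # True: the round trip is lossless

The decode step is lossless here, which is the sense in which tokenization is "just" a reversible wrapper around the text, the same way the speech pipeline wraps audio around a text model.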

replies(2): >>45655705 >>45665961
1. Workaccount2 ◴[] No.45655705[source]
Nothing is real understanding, because we have no benchmark for understanding: we don't mechanistically know what understanding is. The best we have is people "vibe knowing" a benchmark they made up on the spot.