
296 points todsacerdoti | 5 comments
smeeth ◴[] No.44368465[source]
The main limitation of tokenization is actually logical operations, including arithmetic. IIRC most of the poor performance of LLMs for math problems can be attributed to some very strange things that happen when you do math with tokens.

I'd like to see a math/logic bench appear for tokenization schemes that captures this. BPB/perplexity is fine, but it's not everything.

replies(6): >>44368862 #>>44369438 #>>44371781 #>>44373480 #>>44374125 #>>44375446 #
cschmidt ◴[] No.44369438[source]
This paper has a good solution:

https://arxiv.org/abs/2402.14903

You tokenize digits right to left in groups of 3, so 1234567 becomes 1 234 567 rather than the default 123 456 7. And if you ensure all 1-3 digit groups are in the vocab, it does much better.

Both https://arxiv.org/abs/2503.13423 and https://arxiv.org/abs/2504.00178 (co-author) independently noted that you can do this just by modifying the pre-tokenization regex, without having to explicitly add commas.
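
For anyone who wants to see the difference concretely, here is a rough sketch of the regex idea in Python. The pattern and the names (`LEFT_TO_RIGHT`, `RIGHT_TO_LEFT`, `split_digits`) are my own illustration and not necessarily the exact regex used in either paper. It only handles the digit-chunking part of pre-tokenization, and assumes the usual `\d{1,3}` rule is what produces the left-to-right default.

```python
import re

# Assumed default: a greedy `\d{1,3}` rule chunks digits left to right.
LEFT_TO_RIGHT = re.compile(r"\d{1,3}")

# Illustrative right-aligned variant: accept a chunk of 1-3 digits only if
# the digits that follow it form complete groups of three.
RIGHT_TO_LEFT = re.compile(r"\d{1,3}(?=(?:\d{3})*(?!\d))")


def split_digits(pattern: re.Pattern, text: str) -> list[str]:
    """Return the digit chunks that `pattern` finds in `text`."""
    return pattern.findall(text)


print(split_digits(LEFT_TO_RIGHT, "1234567"))  # ['123', '456', '7']
print(split_digits(RIGHT_TO_LEFT, "1234567"))  # ['1', '234', '567']
```

The lookahead does the same job as inserting commas: the short group ends up at the front of the number, so the chunks line up with place value.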

replies(3): >>44372335 #>>44374721 #>>44374882 #
jvanderbot ◴[] No.44372335[source]
Ok great! This is precisely how I chunk numbers for comparison. And not to diminish a solid result or the usefulness of it or the baseline tech: it's clear that if we keep having to create situation-specific inputs or processes, we're not at AGI with this baseline tech.
replies(1): >>44373437 #
chmod775 ◴[] No.44373437[source]
> [..] we're not at AGI with this baseline tech

DAG architectures fundamentally cannot be AGI and you cannot even use them as a building block for a hypothetical AGI if they're immutable at runtime.

Any time I hear the goal being "AGI" in the context of these LLMs, I feel like I'm listening to a bunch of 18th-century aristocrats trying to get to the moon by growing trees.

Try to create useful approximations using what you have or look for new approaches, but don't waste time on the impossible. There are no iterative improvements here that will get you to AGI.

replies(4): >>44373686 #>>44375069 #>>44376414 #>>44385536 #
mgraczyk ◴[] No.44375069[source]
This is meant to be some kind of Chinese room argument? Surely a 1e18 context window model running at 1e6 tokens per second could be AGI.
replies(3): >>44375232 #>>44375489 #>>44376558 #
1. lukan ◴[] No.44375232[source]
"Surely a 1e18 context window model running at 1e6 tokens per second could be AGI."

And why?

replies(1): >>44379407 #
2. mgraczyk ◴[] No.44379407[source]
Because that's quite a bit more information processing than any human brain
replies(1): >>44379674 #
3. lukan ◴[] No.44379674[source]
I don't think it is quantity that matters. Otherwise supercomputers are smart by definition.
replies(1): >>44379719 #
4. mgraczyk ◴[] No.44379719{3}[source]
Well no, that's not what anyone is saying.

The claim was that it isn't possible in principle for "DAGs" or "immutable architectures" to be intelligent. That statement is confusing some theoretical results that aren't applicable to how LLMs work (output context is mutation).

I'm not claiming that compute makes them intelligent. I'm pointing out that it is certainly possible, and at that level of compute it should be plausible. Feel free to share any theoretical results you think demonstrate the impossibility of "DAG" intelligence and are applicable.

replies(1): >>44385233 #
5. lukan ◴[] No.44385233{4}[source]
I am not saying it is impossible. I am saying it might be possible, but in my experience with LLMs it is far from plausible with the current approach.