The bitter lesson is coming for tokenization

(lucalp.dev)

296 points todsacerdoti | 2 comments | 24 Jun 25 14:14 UTC

smeeth ◴[24 Jun 25 17:15 UTC] No.44368465[source]▶

The main limitation of tokenization is actually logical operations, including arithmetic. IIRC most of the poor performance of LLMs for math problems can be attributed to some very strange things that happen when you do math with tokens.

I'd like to see a math/logic bench appear for tokenization schemes that captures this. BPB/perplexity is fine, but it's not everything.

replies(6): >>44368862 #>>44369438 #>>44371781 #>>44373480 #>>44374125 #>>44375446 #

cschmidt ◴[24 Jun 25 18:45 UTC] No.44369438[source]▶

>>44368465 #

This paper has a good solution:

https://arxiv.org/abs/2402.14903

You tokenize right to left in groups of 3, so 1234567 becomes 1 234 567 rather than the default 123 456 7. And if you ensure all 1-3 digit groups are in the vocab, it does much better.

Both https://arxiv.org/abs/2503.13423 and https://arxiv.org/abs/2504.00178 (co-author) independently noted that you can do this just by modifying the pre-tokenization regex, without having to explicitly add commas.
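A minimal sketch of the grouping described above, in Python. The regex here is an illustrative assumption, not the exact pattern from either paper, and the names R2L_DIGITS, L2R_DIGITS and group_digits_r2l are hypothetical. It only shows how a lookahead can make digit chunks align from the right instead of the left:

```python
import re

# Assumed illustration, not the papers' pattern: take 1-3 digits only when
# the digits remaining in the run form whole groups of three.
R2L_DIGITS = re.compile(r"\d{1,3}(?=(?:\d{3})*(?!\d))")

# Default left-to-right chunking, for comparison.
L2R_DIGITS = re.compile(r"\d{1,3}")


def group_digits_r2l(digits: str) -> list[str]:
    """Split a digit string into groups of three, aligned from the right."""
    head = len(digits) % 3  # size of the short leading group, if any
    groups = [digits[:head]] if head else []
    groups += [digits[i:i + 3] for i in range(head, len(digits), 3)]
    return groups


if __name__ == "__main__":
    print(L2R_DIGITS.findall("1234567"))   # left-to-right chunks
    print(R2L_DIGITS.findall("1234567"))   # right-to-left chunks
    print(group_digits_r2l("1234567"))     # same grouping without a regex
```

In a real tokenizer the pattern would be one alternative inside a larger pre-tokenization regex, and the vocab would still need to contain every 1-3 digit string for the grouping to pay off.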

replies(3): >>44372335 #>>44374721 #>>44374882 #

jvanderbot ◴[24 Jun 25 23:58 UTC] No.44372335[source]▶

>>44369438 #

Ok great! This is precisely how I chunk numbers for comparison. And not to diminish a solid result or the usefulness of it or the baseline tech: it's clear that if we keep having to create situation-specific inputs or processes, we're not at AGI with this baseline tech.

replies(1): >>44373437 #

chmod775 ◴[25 Jun 25 03:37 UTC] No.44373437[source]▶

>>44372335 #

> [..] we're not at AGI with this baseline tech

DAG architectures fundamentally cannot be AGI and you cannot even use them as a building block for a hypothetical AGI if they're immutable at runtime.

Any time I hear the goal being "AGI" in the context of these LLMs, I feel like I'm listening to a bunch of 18th-century aristocrats trying to get to the moon by growing trees.

Try to create useful approximations using what you have or look for new approaches, but don't waste time on the impossible. There are no iterative improvements here that will get you to AGI.

replies(4): >>44373686 #>>44375069 #>>44376414 #>>44385536 #

1. munksbeer ◴[26 Jun 25 09:01 UTC] No.44385536[source]▶

>>44373437 #

It doesn't feel particularly interesting to keep dismissing "these LLMs" as incapable of reaching AGI.

It feels more interesting to note that this time, it is different. I've been watching the field since the 90s when I first dabbled in crude neural nets. I am informed there was hype before, but in my time I've never seen progress like we've made in the last five years. If you showed it to people from the 90s, it would be mind blowing. And it keeps improving incrementally, and I do not think that is going to stop. The state of AI today is the worst it will ever be (trivially obvious but still capable of shocking me).

What I'm trying to say is that the shocking success of LLMs has become a powerful engine of progress, creating a positive feedback loop that is dramatically increasing investment, attracting top talent, and sharpening the focus of research into the next frontiers of artificial intelligence.

replies(1): >>44387575 #

2. dTal ◴[26 Jun 25 14:07 UTC] No.44387575[source]▶

>>44385536 (TP) #

>If you showed it to people from the 90s, it would be mind blowing

90's? It's mind blowing to me now.

My daily driver laptop is (internally) a Thinkpad T480, a very middle of the road business class laptop from 2018.

It now talks to me. Usually knowledgeably, in a variety of common languages, using software I can download and run for free. It understands human relationships and motivations. It can offer reasonable advice and write simple programs from a description. It notices my tone and tries to adapt its manner.

All of this was inconceivable when I bought the laptop - I would have called it very unrealistic sci-fi. I am trying not to forget that.
