
296 points todsacerdoti | 3 comments
blixt ◴[] No.44370298[source]
I’m starting to think “The Bitter Lesson” is a clever-sounding way to throw shade at people who failed to nail it on their first attempt. Usually engineers build much more technology than they actually end up needing; the extras shed off with time and experience (and often you end up building it again from scratch). It’s not clear to me that starting with “just build something that scales with compute” would get you closer to the perfect solution, even if, as you get closer to it, you do indeed make it possible to throw more compute at it.

That said, the hand-coded nature of tokenization certainly seems in dire need of a better solution, something that can be learned end to end. And it looks like we are getting closer with every iteration.
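As a rough sketch of what that hand-coded step looks like today (a toy character-level BPE merge loop; the corpus and names here are purely illustrative): the vocabulary comes from a fixed greedy heuristic, frozen before the model ever trains, which is exactly the part you’d want learned end to end.

    from collections import Counter

    def train_bpe(text, num_merges=5):
        """Greedily merge the most frequent adjacent pair, num_merges times."""
        tokens = list(text)  # start from characters (real systems start from bytes)
        merges = []
        for _ in range(num_merges):
            pairs = Counter(zip(tokens, tokens[1:]))
            if not pairs:
                break
            (a, b), _count = pairs.most_common(1)[0]
            merges.append((a, b))
            merged, i = [], 0
            while i < len(tokens):
                if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                    merged.append(a + b)
                    i += 2
                else:
                    merged.append(tokens[i])
                    i += 1
            tokens = merged
        return tokens, merges

    # The merge rules are frozen heuristics, not parameters the model can adjust.
    tokens, merges = train_bpe("low lower lowest low low")
    print(merges)
    print(tokens)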

replies(3): >>44370492 #>>44370692 #>>44370704 #
1. jetrink ◴[] No.44370704[source]
The Bitter Lesson is specifically about AI. The lesson restated is that over the long run, methods that leverage general computation (brute-force search and learning) consistently outperform systems built with extensive human-crafted knowledge. Examples: Chess, Go, speech recognition, computer vision, machine translation, and on and on.
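To make the “general computation” point concrete (a toy sketch of my own, not taken from any of those examples): a plain negamax search over a Nim position encodes no game knowledge beyond the rules, yet its move choice improves purely by giving it a deeper search budget, i.e. more compute.

    MOVES = (1, 2, 3)  # each turn remove 1-3 stones; taking the last stone wins

    def negamax(stones, depth):
        """+1 if the side to move can force a win within depth plies,
        -1 if it provably loses, 0 if unresolved at this budget."""
        if stones == 0:
            return -1          # the previous player took the last stone
        if depth == 0:
            return 0           # out of compute budget
        best = -1
        for take in MOVES:
            if take <= stones:
                best = max(best, -negamax(stones - take, depth - 1))
        return best

    def best_move(stones, depth):
        return max((m for m in MOVES if m <= stones),
                   key=lambda m: -negamax(stones - m, depth - 1))

    # Same generic search, more compute: at depth 8 it finds the winning move
    # from 10 stones (take 2), which the shallower budgets miss.
    for depth in (1, 3, 8):
        print(depth, best_move(10, depth))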
replies(2): >>44371060 #>>44371279 #
2. AndrewKemendo ◴[] No.44371060[source]
This is correct, but I’d add that it’s not just about “AI” colloquially - it’s a statement about any two optimization systems that are trying to scale.

So any system that tackles the optimization with a general solver can scale better than heuristic or constrained-space solvers.

Until recently, there have been no general solvers at that scale.

3. fiddlerwoaroof ◴[] No.44371279[source]
I think it oversimplifies, though, and I think it’s shortsighted to underfund the (harder) hand-crafted systems on the basis of this observation, because when you’re limited by scaling, the other research will save you.