
296 points by todsacerdoti | 1 comment
cheesecompiler No.44367317
The reverse is possible too: throwing massive compute at a problem can mask the existence of a simpler, more general solution. General-purpose methods tend to win out over time—but how can we be sure they’re truly the most general if we commit so hard to one paradigm (e.g. LLMs) that we stop exploring the underlying structure?
logicchains No.44367776
We can be sure via analysis based on computational theory, e.g. https://arxiv.org/abs/2503.03961 and https://arxiv.org/abs/2310.07923. This lets us know what classes of problems a model is able to solve, and sufficiently deep transformers with chain of thought have been shown to be theoretically capable of solving a very large class of problems.
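A toy illustration (mine, not from the thread) of the kind of result those papers formalize: a function like PARITY is believed to be hard for a constant-depth transformer in a single forward pass, but becomes a sequence of trivial steps once the model may write intermediate results back into its own context. The sketch below stands in for the "model" with a plain loop; the function name is invented for illustration.

```python
def parity_chain_of_thought(bits):
    # Each "decoding step" appends one intermediate result (the running
    # parity) to the context, so a fixed-depth computation only ever has
    # to do a single XOR per step. This mirrors how chain of thought
    # trades circuit depth for output length.
    trace = [0]
    for b in bits:
        trace.append(trace[-1] ^ b)  # one easy step per generated token
    return trace[-1], trace
```

Running `parity_chain_of_thought([1, 0, 1, 1])` yields the final parity `1` together with the full scratchpad `[0, 1, 1, 0, 1]`.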
yorwba No.44369625
Keep in mind that proofs of transformers being able to solve all problems in some complexity class work by taking a known universal algorithm for that complexity class and encoding it as a transformer. In every such case, you'd be better off using the universal algorithm you started with in the first place.
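A minimal sketch of that point (my own hedged example, not from the comment): MAJORITY can be decided by a single hand-constructed attention head, because uniform attention weights average the token values, and thresholding that average at 1/2 is exactly a majority vote. But the direct count is obviously the better program. All names below are invented for illustration.

```python
import numpy as np

def majority_direct(bits):
    # The "universal algorithm" we started with: just count.
    return sum(bits) > len(bits) / 2

def majority_attention(bits):
    # Hand-constructed single attention head: identical queries and keys
    # give all-equal scores, so softmax yields uniform weights and the
    # head outputs the mean of the token values at every position.
    x = np.array(bits, dtype=float)
    scores = np.zeros((len(x), len(x)))  # q.k identical => all-equal scores
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    pooled = weights @ x                 # every position sees the mean
    return bool(pooled[0] > 0.5)         # threshold the mean: majority vote

for bits in ([1, 1, 0], [0, 1, 0, 0], [1, 0, 1, 1, 0]):
    assert majority_direct(bits) == majority_attention(bits)
```

The construction works, but it is just the counting algorithm wearing an attention costume, which is the commenter's point.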

Maybe the hope is that you won't have to manually map the universal algorithm to your specific problem and can just train the transformer to figure it out instead, but there are few proofs that transformers can solve all problems in some complexity class through training instead of manual construction.