The real bitter lesson in AI is that we don't really know what we're doing. We're hacking on models, looking for architectures that train well, but we don't fully understand why they work. And because we don't understand them, we can't design anything optimal or know how good a solution could possibly get.
Well, technically, that's not true: the whole point of complexity theory is that there are some problems you can't just throw more hardware at - at least not for interesting problem sizes or any remotely feasible amount of hardware.
I wonder if we'll reach a similar situation in AI where "throw more context/layers/training data at the problem" won't help anymore and people will be forced to care more about understanding again.
More precisely, I think producing a good, fast merge of about 5 sorted lists was a problem I didn't have good answers for, but maybe I was too fixated on a streaming solution and didn't apply enough tricks.
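For what it's worth, here's a minimal sketch of the streaming approach in Python, using the standard heapq module (the function name and structure are just illustrative, not whatever I was actually working with):

```python
import heapq

def merge_k_sorted(*lists):
    """Lazily merge several sorted lists into one sorted stream.

    Keeps a min-heap of (value, list index, position) entries, so each
    emitted element costs O(log k), where k is the number of lists.
    """
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    while heap:
        value, i, pos = heapq.heappop(heap)
        yield value
        if pos + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][pos + 1], i, pos + 1))

# The standard library already provides this as a lazy iterator:
# merged = list(heapq.merge(list_a, list_b, list_c))
```

For only ~5 lists the heap overhead is small, and in practice a pairwise merge or even sorting the concatenation can be competitive, which is probably the kind of trick I didn't try.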