
296 points | todsacerdoti
marcosdumay:
Yeah, make the network deeper.

When all you have is a hammer... It makes a lot of sense that a transformation layer that makes the tokens more semantically relevant would help optimize the entire network after it and increase the effective size of your context window. And one of the main immediate obstacles stopping those models from being intelligent is context window size.

On the other hand, the current models already cost something on the order of the median country's GDP to train, and they are nowhere near that in value. The saying "if brute force didn't solve your problem, you didn't apply enough force" is meant to be taken as a joke.

jagraff:
I think the median country's GDP is something like $100 billion:

https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)

Models are expensive, but they're not that expensive.

telotortium:
LLM training costs arise primarily from commodity costs (GPUs and other compute, as well as electricity), not locally provided services, so PPP is not the right statistic to use here. You should use nominal GDP instead. According to Wikipedia[0], the median country's nominal GDP (Cyprus) is more like $39B. Still much larger than training costs, but much lower than your PPP figure.

[0] https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nomi...
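For a sense of scale, here's a minimal back-of-the-envelope sketch in Python. The $0.5B training-run cost is an assumed, illustrative figure (actual frontier-run costs aren't public); only the ~$39B median nominal GDP comes from the list above:

    # Rough scale comparison with assumed, illustrative figures.
    training_cost_usd = 0.5e9        # hypothetical frontier training-run cost
    median_nominal_gdp_usd = 39e9    # ~median nominal GDP (Cyprus), per the list above

    ratio = median_nominal_gdp_usd / training_cost_usd
    print(f"Median nominal GDP is roughly {ratio:.0f}x an assumed $0.5B training run")

Even with a generous cost assumption, a single training run comes out one to two orders of magnitude below the median country's nominal GDP.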