The bitter lesson is coming for tokenization
(lucalp.dev)
296 points
todsacerdoti
| 1 comments |
24 Jun 25 14:14 UTC
pona-a
◴[
24 Jun 25 17:48 UTC
]
No.
44368824
>>44366494 (OP)
Didn't tokenization already have one bitter lesson: that it's better to let simple statistics guide the splitting, rather than expert morphology models? Would this technically be a more bitter lesson?
replies(2): >>44369893 >>44370752
1.
kingstnap
◴[
24 Jun 25 20:37 UTC
]
No.
44370752
>>44368824
Simple statistics aren't the be-all and end-all. There was a huge improvement in Python coding from fixing how indentation in Python code gets tokenized.
Specifically, they added dedicated tokens for runs of 4, 8, 12, 16 (or something like that) spaces.
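To make the idea concrete, here is a minimal sketch of what dedicated indent tokens could look like. It is not any real tokenizer's implementation: the widths (16, 12, 8, 4), the `<indent_N>` token names, and the `tokenize_indent` helper are all assumptions made up for illustration, and the rest of the line is left as a single untokenized string.

```python
# Hypothetical indent widths; the comment above only says "4, 8, 12, 16 or something".
INDENT_WIDTHS = (16, 12, 8, 4)


def tokenize_indent(line: str) -> list[str]:
    """Split a line's leading spaces into the widest available indent tokens.

    The remainder of the line is returned as one string; a real tokenizer
    would go on to split it further.
    """
    stripped = line.lstrip(" ")
    remaining = len(line) - len(stripped)  # number of leading spaces
    tokens = []
    for width in INDENT_WIDTHS:  # greedy: try the widest token first
        while remaining >= width:
            tokens.append(f"<indent_{width}>")
            remaining -= width
    # Leftover spaces (fewer than 4) fall back to single-space tokens.
    tokens.extend([" "] * remaining)
    if stripped:
        tokens.append(stripped)
    return tokens


print(tokenize_indent("        return x"))
```

With this scheme an 8-space indent costs one token instead of eight single-space tokens, which is roughly the saving being described.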