←back to thread

296 points todsacerdoti | 1 comments | | HN request time: 0.306s | source
Show context
pona-a ◴[] No.44368824[source]
Didn't tokenization already have one bitter lesson: that it's better to let simple statistics guide the splitting, rather than expert morphology models? Would this technically be a more bitter lesson?
replies(2): >>44369893 #>>44370752 #
1. kingstnap ◴[] No.44370752[source]
Simple statistics aren't some be all. There was a huge improvement in Python coding by fixing the tokenization of indents in Python code.

Specifically they made tokens for 4,8,12,16 or something spaces.