That said, the hand-coded nature of tokenization certainly seems in dire need of a better solution, something that can be learned end to end. And it looks like we are getting closer with every iteration.
> As it's been pointed out countless times - if the trend of ML research could be summarised, it'd be the adherence to The Bitter Lesson - opt for general-purpose methods that leverage large amounts of compute and data over crafted methods by domain experts
But we're only one sentence in, and this is already a failure of science communication on several levels.
1. The sentence structure and grammar are simply horrible
2. This is condescending: "pointed out countless times" - has it?
3. The reference to Sutton's essay is oblique and easy to miss
4. Outside of AI circles, "Bitter Lesson" is not very well known. If you didn't already know about it, this doesn't help.