That said, the hand-coded nature of tokenization certainly seems in dire need of a better solution, something that can be learned end to end. And it looks like we are getting closer with every iteration.
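To make "hand coded" concrete: BPE-style tokenizers such as GPT-2's start from a human-written regex that fixes where token boundaries may fall before any learning happens. Here is a minimal sketch; the pattern below is a simplification written for illustration, not the actual GPT-2 pattern:

```python
import re

# Hand-written pre-tokenization rule (simplified, illustrative only):
# split into optionally space-prefixed letter runs, digit runs,
# punctuation runs, or whitespace. Every boundary here is a human choice.
PRETOKENIZE = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^A-Za-z0-9\s]+|\s+")

def pretokenize(text: str) -> list[str]:
    """Apply the hand-coded splitting rule before any learned merges run."""
    return PRETOKENIZE.findall(text)

print(pretokenize("Tokenization isn't learned end-to-end"))
# ['Tokenization', ' isn', "'", 't', ' learned', ' end', '-', 'to', '-', 'end']

# The end-to-end alternative: hand the model raw bytes and let it learn
# its own segmentation, as byte-level models try to do.
print(list("Tokenization".encode("utf-8")))
# [84, 111, 107, 101, 110, 105, 122, 97, 116, 105, 111, 110]
```

The learned part of BPE (the merge table) only ever operates inside the chunks such a regex produces, which is why the pipeline as a whole cannot be trained end to end.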
> As it's been pointed out countless times - if the trend of ML research could be summarised, it'd be the adherence to The Bitter Lesson - opt for general-purpose methods that leverage large amounts of compute and data over crafted methods by domain experts
But we're only one sentence in, and this is already a failure of science communication on several levels.
1. The sentence structure and grammar are simply horrible.
2. This is condescending: "pointed out countless times." Has it been?
3. The reference to Sutton's essay is oblique and easy to miss.
4. Outside of AI circles, the "Bitter Lesson" is not well known. If you didn't already know about it, this doesn't help.
So any system that performs the optimization with a general-purpose solver can scale better than heuristic or constrained-space solvers.
Until recently, there were no general solvers at that scale.