
296 points, submitted by todsacerdoti
blixt | No.44370298
I’m starting to think “The Bitter Lesson” is a clever-sounding way to throw shade at people who failed to nail it on their first attempt. Engineers usually build much more technology than they end up needing; the extras shed off with time and experience (and often you end up rebuilding it from scratch). It’s not clear to me that starting with “just build something that scales with compute” would get you closer to the perfect solution, even if, as you get closer to it, you do make it possible to throw more compute at it.

That said, the hand-coded nature of tokenization certainly seems in dire need of a better solution, something that can be learned end to end. And it looks like we are getting closer with every iteration.

QuesnayJr | No.44370692
I'm starting to think that half the commenters here don't actually know what "The Bitter Lesson" is. It's purely a statement about the history of AI research, made in a very short essay by Rich Sutton: http://www.incompleteideas.net/IncIdeas/BitterLesson.html. It's not a general statement about software engineering in all domains, but a very specific statement about AI applications: the previous generation's careful algorithmic work on an AI problem ends up obsoleted by this generation's brute-force approach using more computing power. That has happened over and over again in AI, including several times since Sutton wrote the essay in 2019.
blixt | No.44371113
I think most people have read it and agree that it makes an astute observation about which methods survive. My point is that we now use it to criticize new methods, as if they should just skip all the in-between stuff so that The Bitter Lesson doesn't come for them. At best, you can use it as inspiration. Anyway, this was mostly a complaint about the use of "The Bitter Lesson" in the context of this article; the article still deserves credit for all its great information about tokenization methods and about how one evolutionary branch of them is the Byte Latent Transformer.