56 points by trott | 2 comments
1. resiros No.40720599
Lots of assumptions here. First, that we will only be training on text data: if we take into consideration all the video and audio shared online, I'm quite sure we would have one or two orders of magnitude more data. Second, that more data even matters: there has been some early research showing that training on the right data improves prediction more than training on more data (which makes intuitive sense; training on papers and books is much more useful than training on YouTube comments). Additionally, a lot of the improvement in quality comes from RLHF, which is basically manual human labeling. And last, my guess is that improvements in architecture, not just scaling, are what will unlock the next level of performance.
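
(A minimal sketch for plugging in your own estimates; every number below is an illustrative assumption, not a measured figure:)

    # Hypothetical estimator: text-equivalent tokens recoverable by
    # transcribing the speech in video/audio. All inputs are assumptions.
    def speech_tokens(hours, words_per_minute=150, tokens_per_word=1.3):
        # hours of speech -> words -> BPE-ish tokens
        return hours * 60 * words_per_minute * tokens_per_word

    # e.g. an assumed 1e9 hours of speech-heavy video:
    print(f"{speech_tokens(1e9):.1e} tokens")  # ~1.2e13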
replies(1): >>40721542
2. trott No.40721542
> Lots of assumptions here. First, that we will only be training on text data: if we take into consideration all the video and audio shared online, I'm quite sure we would have one or two orders of magnitude more data.

1GB of text is way more useful for generating text than 1GB of video is.
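
Back of the envelope, with every constant a rough assumption: a gigabyte of text is hundreds of millions of tokens, while a gigabyte of compressed video is under half an hour of footage whose speech transcribes to only a few thousand tokens.

    # Illustrative tokens-per-GB comparison; all constants are rough assumptions.
    GB = 1e9  # bytes

    # Text: ~4 bytes per token is a common rule of thumb for English BPE.
    text_tokens = GB / 4                              # ~2.5e8 tokens

    # Video: assume 5 Mbit/s compression, continuous speech at 150 words/min,
    # and ~1.3 tokens per word after transcription.
    video_seconds = GB * 8 / 5e6                      # ~1600 s (~27 min)
    video_tokens = (video_seconds / 60) * 150 * 1.3   # ~5.2e3 tokens

    print(f"text : {text_tokens:.1e} tokens/GB")
    print(f"video: {video_tokens:.1e} tokens/GB")
    print(f"ratio: {text_tokens / video_tokens:,.0f}x")  # ~48,000x

Even with generous assumptions on the video side, the per-byte gap is several orders of magnitude.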

> training on the right data improves prediction more than training on more data

Books are more useful than Facebook rants, sure. But that is an argument for data scarcity rather than for data abundance: if only the high-quality slice of the web counts, the pool of useful training data is even smaller than the raw totals suggest.