
511 points andy99 | 1 comments
WeirderScience No.44536327
The open training data is a huge differentiator. Is this the first truly open dataset of this scale? Prior efforts like The Pile were valuable, but had limitations. Curious to see how reproducible the training is.
1. evolvedlight No.44537249
Yup, it’s not a dataset packaged like you hope for here, as it still contains traditionally copyrighted material.