361 points | mseri | 1 comment
Oras No.46006894
I got excited reading the article about releasing the training data, so I went to their HF account to look at the data (dolma3). The first rows? Text scraped from porn websites!

https://huggingface.co/datasets/allenai/dolma3

1. logicchains No.46007075
Erotic fiction is one of the main use cases of such models.