
321 points denysvitali | 1 comment
cmdrk ◴[] No.45112073[source]
Does their training corpus respect copyrights, or do you have to follow their opt-out procedure to keep them from consuming your data? Assuming it’s the latter, it’s open-er but still not quite there.
replies(2): >>45142143 #>>45142449 #
1. traspler ◴[] No.45142143[source]
AFAIK they respect robots.txt at crawl time, and later, when using the data, they re-check robots.txt and exclude the data if the file has since been updated to deny access. They do some further data filtering, but for that you'd better check the technical report.
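
To make the re-check idea concrete, here is a rough sketch of what filtering at use time could look like. This is not their pipeline, only an illustration using Python's standard `urllib.robotparser`. The names `is_allowed`, `filter_corpus`, and the `ExampleBot` user agent are made up, and treating a host with no robots.txt on record as unrestricted is my assumption.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check one URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def filter_corpus(docs, current_robots, user_agent="ExampleBot"):
    """Keep only documents that the *current* robots.txt still permits.

    docs: list of (url, text) pairs collected at crawl time.
    current_robots: dict mapping host -> robots.txt text fetched again
        at use time (hypothetical input; fetching is left out here).
    """
    kept = []
    for url, text in docs:
        host = urlsplit(url).netloc
        # Assumption: a host with no robots.txt on record has no restrictions.
        robots_txt = current_robots.get(host, "")
        if is_allowed(robots_txt, user_agent, url):
            kept.append((url, text))
    return kept


if __name__ == "__main__":
    docs = [
        ("https://a.example/post", "crawled while allowed"),
        ("https://b.example/post", "still allowed"),
    ]
    # a.example has since changed its robots.txt to deny everything.
    current_robots = {
        "a.example": "User-agent: *\nDisallow: /",
        "b.example": "User-agent: *\nAllow: /",
    }
    for url, _ in filter_corpus(docs, current_robots):
        print(url)
```

The point is just that the allow/deny decision is made twice: once when crawling, and once more against a fresh copy of robots.txt before the data is used.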