137 points bradt | 2 comments
neehao ◴[] No.45087333[source]
As Tyler Cowen says, solve for the equilibrium.

"Many widely used machine-learning models rely on copyrighted data. For instance, Google finds the most relevant web pages for a search term by relying on a machine learning model trained on copyrighted web data. But the use of copyrighted data by machine learning models that generate content (or give answers to search queries rather than link to sites with the answers) poses new (reasonable) questions about fair use. By not sharing the proceeds, such systems also kill the incentives to produce original content on which they rely. For instance, if we don’t incentivize content producers, e.g., people who respond to Stack Overflow questions, the ability of these models to answer questions in new areas is likely to be lower. The concern about fair use can be addressed by training on data from content producers who have opted to share their data. The second problem is more challenging. How do you build a system that shares proceeds with content producers?"

https://www.gojiberries.io/generative-ai-and-the-market-for-...

replies(2): >>45087359 #>>45087576 #
1. skybrian ◴[] No.45087576[source]
AI companies are already paying for content to train on.
replies(1): >>45090446 #
2. whimsicalism ◴[] No.45090446[source]
not in any serious manner. they are paying for content that has been successfully walled off, à la reddit, not for everything that is copyrighted. it's the law of the jungle