290 points nobody9999 | 9 comments
jawns No.45187038
I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

Thus, I stand to receive about $9,000 as a result of this settlement.

I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.

tartoran No.45187839
> I think that's fair, considering that two of those books received advances under $20K and never earned out.

It may be fair to you, but what about other authors? Maybe it's not fair at all to them.

terminalshort No.45189724
Do they sell their books for more than $3,000 per copy? If so, it isn't fair. Otherwise they are getting a windfall because of Anthropic's stupidity in not buying the books.
1. giveita No.45190448
If I copy your book and sell a million bootleg copies that compete directly with yours, is that worth only the $30 cover price?

This is essentially what generative AI is.

Maybe the payment should be $500/hr (say, $5k a page) to cover the cost of preparing a human-verified dataset for Anthropic.

2. aeon_ai No.45190525
The same judge determined that training is fair use; Anthropic did in fact buy copies of books and train on those as well.

Thus the $3k per violation is still punitive, at (conservatively) 100x the cost of the book.

Given that it is fair use, authors do not have the right to restrict training on their works under copyright law alone.

3. terminalshort No.45190663
In that case the damages would be $3,000 per copy you sold. Distributing copyrighted work is an entirely different category of offense than simply downloading and consuming it. Anthropic didn't distribute any copies, so the damages are limited to the one copy it pirated. That is not remotely what generative AI is, and it's why the judge ruled that it was perfectly legal to feed the books to the model.
4. megaman821 No.45191221
I am not sure what types of books you read, but AI has replaced absolutely no books for me.
5. II2II No.45191261
The thing is: you aren't distributing copies with generative AI, in any sensible meaning of the word.

Don't get me wrong: I think this is an incredibly bad deal for authors. That said, I would be horrified if it weren't treated as fair use. It would be incredibly destructive to society, since people would use such rulings to chisel away at fair use.

Imagine schools that had to pay yearly fees to use books. We know publishers would do that; they already try to (single-use workbooks, online value-added services). Or look at software. It is already going to be problematic for people who use LLMs, and it is already problematic due to patents. Now imagine what would happen if reformulating algorithms that you read in a book were not considered fair use. Or look at books themselves: a huge chunk of non-fiction consists of doing research and re-expressing ideas in non-original terms. Is that fair use? The main difference between that and generative AI is that we can say a machine did it in the case of generative AI, but is that enough to protect fair use in the conventional sense?

6. rkagerer No.45192178
> Imagine schools who had to pay yearly fees to use books

I feel like we aren't far from that. I wouldn't be surprised if new books get published (in whatever medium) that are licensed out instead of sold.

7. giveita No.45192814
This is parallel to mass surveillance. Surveillance is OK (a private eye), so dragnetting is also OK, since it is just scaled-up private detectives. If 1 is OK then 1+1 is OK, and so, by Peano induction, is a googolplex of the OK thing.
8. II2II No.45197323
You just reminded me that it is a thing, or at least was a thing. Around the time I was leaving the university world, publishers were starting to introduce time-limited ebooks. This affected not only the second-hand book market, but also students, who in the past would have kept their books.
9. rangestransform No.45201095
This is unironically what happened with Katz v. United States with respect to the expectation of privacy in public.