Anthropic agrees to pay $1.5B to settle lawsuit with book authors

1. pbd ◴[06 Sep 25 04:02 UTC] No.45146495[source]▶

From a systems design perspective, $3,000 per book makes this approach completely unscalable compared to web scraping. It's like choosing between an O(n) and an O(n²) algorithm - legally compliant data acquisition has fundamentally different scaling characteristics than the 'move fast and break things' approach most labs took initially.

replies(3): >>45146583 #>>45146703 #>>45150558 #

2. whimsicalism ◴[06 Sep 25 04:20 UTC] No.45146583[source]▶

>>45146495 (TP) #

more of a large difference in constant factor, like a galactic algorithm for data trawling

3. BoorishBears ◴[06 Sep 25 04:50 UTC] No.45146703[source]▶

>>45146495 (TP) #

I don't know if anyone has actually read the article or the ruling, but this is about pirating books.

Anthropic went back and bought->scanned->destroyed physical copies of them afterward... but they pirated them first, and that's what this settlement is about.

The judge also said:

> “The training use was a fair use,” he wrote. “The technology at issue was among the most transformative many of us will see in our lifetimes.”

So you don't need to pay $3,000 per book you train on unless you pirate them.

replies(1): >>45147414 #

4. pbd ◴[06 Sep 25 07:49 UTC] No.45147414[source]▶

>>45146703 #

I agree. This is very gray IMO. E.g., books in India have cheap EEE editions compared to the ones in the US/Europe, so they could pre-process the data in India and then compile it in the US. Would that save them from piracy rules and reduce costs as well?

replies(1): >>45147623 #

5. BoorishBears ◴[06 Sep 25 08:36 UTC] No.45147623{3}[source]▶

>>45147414 #

I mean, relative to the cost of pre-training, books are going to be cheap even if you buy them in the US (as demonstrated by the fact that Anthropic bought them afterward).

For post-training, other data sources (like human feedback and/or examples) are way more expensive than books.

6. thfuran ◴[06 Sep 25 16:16 UTC] No.45150558[source]▶

>>45146495 (TP) #

Isn't a flat price per book quite plainly O(n)? If not, what's n?
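
To make the point concrete, here is a minimal sketch that takes n to be the number of books and assumes a flat rate. The $3,000 and $1.5B figures come from the thread and the headline; the function name `acquisition_cost` is hypothetical and only there for illustration.

```python
# Assumed figure from the thread: a flat rate of roughly $3,000 per book.
PER_BOOK_USD = 3_000


def acquisition_cost(n_books: int, per_book_usd: int = PER_BOOK_USD) -> int:
    """Total cost at a flat per-book price; grows linearly with n_books."""
    return n_books * per_book_usd


# Doubling the number of books doubles the cost, which is what O(n) means here.
small = acquisition_cost(100_000)
large = acquisition_cost(200_000)
ratio = large / small

# Number of books the headline $1.5B would cover at the assumed flat rate.
books_covered = 1_500_000_000 // PER_BOOK_USD
```

Under these assumptions the per-book price only changes the constant factor, not the growth rate, which is the distinction whimsicalism draws above.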