
989 points acomjean | 4 comments
aeon_ai No.45143392
To be very clear on this point - this is not related to model training.

It’s important in the fair use assessment to understand that the training itself is fair use, but the pirating of the books is the issue at hand here, and is what Anthropic “whoopsied” into in acquiring the training data.

Buying used copies of books, scanning them, and training on them is fine.

Rainbows End was prescient in many ways.

therobots927 No.45143460
It is related to scalable model training, however. Chopping the spine off books and putting the pages in an automated scanner is not scalable. And don't forget about the cost of 1) finding, 2) purchasing, 3) processing, and 4) recycling that volume of books.
Onavo No.45143471
> Chopping the spine off books and putting the pages in an automated scanner is not scalable.

That's how Google Books, the Internet Archive, and Amazon (their book preview feature) operated before ebooks were common. It's not scalable-in-a-garage but perfectly scalable for a commercial operation.

hamdingers No.45143766
We hem and haw about metaphorical "book burning" so much we forget that books themselves are not actually precious.

The books that are destroyed in scanning are a small minority compared to the millions discarded by libraries every year for simply being too old or unpopular.

johnnyanmac No.45144381
>we forget that books themselves are not actually precious.

Book burnings are symbolic (unless you're in the world of Fahrenheit 451). The real power comes from the political threat, not the fact that paper with words on it is now unreadable.

wizzwizz4 No.45144709
Well, the famous 1933-05-10 book burning did destroy the only copies of a lot of LGBT medical research, and destroying the last copy of various works was a stated intent of Nazi book burnings.
heavyset_go No.45146869
The real power comes from the purging of knowledge from institutions that can keep that knowledge alive. Facts, ideas and histories can all be incinerated.