If you have a Claude account, they're going to train on your data moving forward

(old.reddit.com)

439 points diggan | 2 comments | 29 Aug 25 11:39 UTC

TheRoque ◴[29 Aug 25 15:33 UTC] No.45065446[source]▶

To be honest, these companies already stole terabytes of data and don't even disclose their datasets, so you have to assume they'll steal and train on anything you throw at them.

replies(4): >>45066376 #>>45066970 #>>45068970 #>>45077378 #

marssaxman ◴[29 Aug 25 16:44 UTC] No.45066376[source]▶

>>45065446 #

"Reading stuff freely posted on the internet" constitutes stealing now?

Seems like an excessively draconian interpretation of property rights.

replies(10): >>45066424 #>>45066467 #>>45066537 #>>45068095 #>>45068974 #>>45069163 #>>45069363 #>>45069550 #>>45074841 #>>45076689 #

michaelmior ◴[29 Aug 25 16:47 UTC] No.45066424[source]▶

>>45066376 #

"Reading stuff freely posted on the internet" is also very different from a business having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators. I'm not making a value judgement one way or the other, but "reading stuff freely posted on the Internet" is an oversimplification.

replies(5): >>45066511 #>>45066562 #>>45068503 #>>45070930 #>>45071058 #

marssaxman ◴[29 Aug 25 16:53 UTC] No.45066511[source]▶

>>45066424 #

Okay, but "stealing" is also an oversimplification, to the point of absurdity.

It makes no sense to put stuff up on the internet where it can freely be downloaded by anyone at any time, by people who are then free to do whatever they like with it on their own hardware, then complain that people have downloaded that stuff and done what they liked with it on their own hardware.

"Having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators" is equally a description of Google.

replies(9): >>45066575 #>>45067827 #>>45068034 #>>45068085 #>>45068365 #>>45069767 #>>45070721 #>>45072004 #>>45073608 #

1. pigeons ◴[29 Aug 25 18:39 UTC] No.45067827[source]▶

>>45066511 #

But they didn't only train on information the creators made freely available. They trained on copyrighted materials obtained illicitly.

replies(1): >>45071073 #

2. pigeons ◴[30 Aug 25 01:11 UTC] No.45071073[source]▶

>>45067827 (TP) #

I know we're not supposed to comment about downvotes, but the original comment was talking about "these companies", and none of the information indicating that they, or at the very least Meta, trained on terabytes of books downloaded from zlib, libgen, and other torrent sites is in dispute. So even if you believe that copyright should not exist, I don't see why this is not a valid rebuttal of the parent's argument that they only trained on information creators made freely available.
