(old.reddit.com)

439 points diggan | 1 comments | 29 Aug 25 11:39 UTC

TheRoque ◴[29 Aug 25 15:33 UTC] No.45065446[source]▶

To be honest, these companies already stole terabytes of data and don't even disclose their datasets, so you have to assume they'll steal and train on anything you throw at them.

replies(4): >>45066376 #>>45066970 #>>45068970 #>>45077378 #

marssaxman ◴[29 Aug 25 16:44 UTC] No.45066376[source]▶

>>45065446 #

"Reading stuff freely posted on the internet" constitutes stealing now?

Seems like an excessively draconian interpretation of property rights.

replies(10): >>45066424 #>>45066467 #>>45066537 #>>45068095 #>>45068974 #>>45069163 #>>45069363 #>>45069550 #>>45074841 #>>45076689 #

michaelmior ◴[29 Aug 25 16:47 UTC] No.45066424[source]▶

>>45066376 #

"Reading stuff freely posted on the internet" is also very different from a business having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators. I'm not making a value judgement one way or the other, but "reading stuff freely posted on the Internet" is an oversimplification.

replies(5): >>45066511 #>>45066562 #>>45068503 #>>45070930 #>>45071058 #

bdamm ◴[29 Aug 25 16:57 UTC] No.45066562[source]▶

>>45066424 #

We didn't seem to mind when Google was doing it back in 1999, or Lycos, AltaVista, etc. before them... why do we care about the LLM companies doing it now?

replies(2): >>45066668 #>>45066980 #

codazoda ◴[29 Aug 25 17:05 UTC] No.45066668[source]▶

>>45066562 #

I find LLMs extremely useful, but I think the difference is that they regurgitate the content (not verbatim) instead of linking to it. This is not unlike how a human might tell their friend about it.

replies(2): >>45067050 #>>45068274 #

1. bdamm ◴[29 Aug 25 19:21 UTC] No.45068274[source]▶

>>45066668 #

Google has been regurgitating content right into search results since the very beginning, and they've been providing "synopsis" type of results for over a decade.

If you have a Claude account, they're going to train on your data moving forward