439 points diggan | 5 comments
TheRoque ◴[] No.45065446[source]
To be honest, these companies already stole terabytes of data and don't even disclose their datasets, so you have to assume they'll steal and train on anything you throw at them.
replies(4): >>45066376 #>>45066970 #>>45068970 #>>45077378 #
marssaxman ◴[] No.45066376[source]
"Reading stuff freely posted on the internet" constitutes stealing now?

Seems like an excessively draconian interpretation of property rights.

replies(10): >>45066424 #>>45066467 #>>45066537 #>>45068095 #>>45068974 #>>45069163 #>>45069363 #>>45069550 #>>45074841 #>>45076689 #
michaelmior ◴[] No.45066424[source]
"Reading stuff freely posted on the internet" is also very different from a business having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators. I'm not making a value judgement one way or the other, but "reading stuff freely posted on the Internet" is an oversimplification.
replies(5): >>45066511 #>>45066562 #>>45068503 #>>45070930 #>>45071058 #
1. bdamm ◴[] No.45066562[source]
We didn't seem to mind when Google was doing it back in 1999, or Lycos, AltaVista, etc. before them... why do we care about the LLM companies doing it now?
replies(2): >>45066668 #>>45066980 #
2. codazoda ◴[] No.45066668[source]
I find LLMs extremely useful but I think the difference is that they regurgitate the content (not verbatim) instead of a link to it. This is not unlike how a human might tell their friend about it.
replies(2): >>45067050 #>>45068274 #
3. nbulka ◴[] No.45066980[source]
Because they have terms of service they have to adhere to. We need laws to be lawful.
4. Nevermark ◴[] No.45067050[source]
> This is not unlike how a human might tell their friend about it.

Is there someone who has read the whole internet? Can we all be their friend?

The entire basis of fair use is that scale matters.

5. bdamm ◴[] No.45068274[source]
Google has been regurgitating content right into search results since the very beginning, and they've been providing "synopsis" type of results for over a decade.