439 points diggan | 1 comments
TheRoque ◴[] No.45065446[source]
To be honest, these companies already stole terabytes of data and don't even disclose their datasets, so you have to assume they'll steal and train on anything you throw at them.
replies(4): >>45066376 #>>45066970 #>>45068970 #>>45077378 #
marssaxman ◴[] No.45066376[source]
"Reading stuff freely posted on the internet" constitutes stealing now?

Seems like an excessively draconian interpretation of property rights.

replies(10): >>45066424 #>>45066467 #>>45066537 #>>45068095 #>>45068974 #>>45069163 #>>45069363 #>>45069550 #>>45074841 #>>45076689 #
michaelmior ◴[] No.45066424[source]
"Reading stuff freely posted on the internet" is also very different from a business having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators. I'm not making a value judgement one way or the other, but "reading stuff freely posted on the Internet" is an oversimplification.
replies(5): >>45066511 #>>45066562 #>>45068503 #>>45070930 #>>45071058 #
bdamm ◴[] No.45066562[source]
We didn't seem to mind when Google was doing it back in 1999, or Lycos, AltaVista, etc. before them... why do we care about the LLM companies doing it now?
replies(2): >>45066668 #>>45066980 #
codazoda ◴[] No.45066668[source]
I find LLMs extremely useful, but I think the difference is that they regurgitate the content (not verbatim) instead of providing a link to it. This is not unlike how a human might tell their friend about it.
replies(2): >>45067050 #>>45068274 #
1. bdamm ◴[] No.45068274[source]
Google has been regurgitating content right into search results since the very beginning, and they've been providing "synopsis"-type results for over a decade.