439 points diggan | 3 comments
TheRoque ◴[] No.45065446[source]
To be honest, these companies already stole terabytes of data and don't even disclose their dataset, so you have to assume they'll steal and train at anything you throw at them
replies(4): >>45066376 #>>45066970 #>>45068970 #>>45077378 #
marssaxman ◴[] No.45066376[source]
"Reading stuff freely posted on the internet" constitutes stealing now?

Seems like an excessively draconian interpretation of property rights.

replies(10): >>45066424 #>>45066467 #>>45066537 #>>45068095 #>>45068974 #>>45069163 #>>45069363 #>>45069550 #>>45074841 #>>45076689 #
michaelmior ◴[] No.45066424[source]
"Reading stuff freely posted on the internet" is also very different from a business having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators. I'm not making a value judgement one way or the other, but "reading stuff freely posted on the Internet" is an oversimplification.
replies(5): >>45066511 #>>45066562 #>>45068503 #>>45070930 #>>45071058 #
marssaxman ◴[] No.45066511[source]
Okay, but "stealing" is also an oversimplification, to the point of absurdity.

It makes no sense to put stuff up on the internet where it can freely be downloaded by anyone at any time, by people who are then free to do whatever they like with it on their own hardware, then complain that people have downloaded that stuff and done what they liked with it on their own hardware.

"Having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators" is equally a description of Google.

replies(9): >>45066575 #>>45067827 #>>45068034 #>>45068085 #>>45068365 #>>45069767 #>>45070721 #>>45072004 #>>45073608 #
ehnto ◴[] No.45066575[source]
They are not free to do whatever they like; there are tomes of laws across all countries governing what someone can and cannot do with your intellectual property. It's a shame we didn't have the foresight to add an "if by chance in the future someone invents artificial intelligence, that's not fair use" clause, but that doesn't make what these companies are doing ethical or moral.

I don't disagree regarding Google; I also think they exploited others' IP for their own gain. It was once symbiotic with webmasters, but when that stopped they broke that implied good-faith contract. In a sense, their snippets and widgets using others' IP while no longer providing traffic to the site were the warning shot for where we are now. We should have been modernising IP laws back then.

replies(1): >>45067125 #
marssaxman ◴[] No.45067125[source]
I did say "free to do whatever they like on their own hardware", because intellectual property laws generally govern the transfer of such property rather than the use.

After seeing the harm done by the expansion of patent law to cover software algorithms, and the relentless abuse done under the DMCA, I am reflexively skeptical of any effort to expand intellectual property concepts.

replies(1): >>45068489 #
godelski ◴[] No.45068489[source]

  > on their own hardware
That doesn't make it technically legal. That only makes it not worth pursuing. You can sue Joe Schmoe for a million dollars but if he doesn't have that then you're not getting a dime. But if Joe Schmoe is using that thing to make money, well then... yeah you bet your ass that's a different situation and the "worth" of pursuing is directly proportional to how much he is making. Doesn't matter if it is his own hardware or not.

Like why do you think who owns the hardware even matters? Do you really think the legality changes if I rent a GPU vs use my own? That doesn't make any sense.

replies(1): >>45070581 #
marssaxman ◴[] No.45070581[source]
In terms of copyright law, it matters very much whether Joe Schmoe is using his own copy of the data for his own purposes, or whether he is making more copies and distributing them to other people.

If the AI companies were letting people download copies of their training data, copyright law would certainly have something to say about that. But no: once they download the training data, they keep it, and they don't share it.

replies(1): >>45070606 #
godelski ◴[] No.45070606[source]

  > using his own copy of the data
Yes? That is a different thing? I guess we can keep moving the topic until we're talking about the same topic if you want. But honestly, I don't want to have that kind of conversation.
replies(2): >>45070830 #>>45074056 #
marssaxman ◴[] No.45070830[source]
How is it a different thing? Are we talking about copyright law, or not?
replies(1): >>45076444 #
1. godelski ◴[] No.45076444[source]
Before you were talking about data you don't own on hardware you do. Now you're talking about data you do own.

The whole thing is about who owns the data!

replies(1): >>45080495 #
2. marssaxman ◴[] No.45080495[source]
I can imagine that it would indeed be confusing if you failed to distinguish between ownership of the data and ownership of the copyright.
replies(1): >>45080869 #
3. godelski ◴[] No.45080869[source]
Sure... now go back to your edgy comment and keep this in mind to see why everyone is arguing with you

  >>...> To be honest, these companies already stole terabytes of data and don't even disclose their dataset, so you have to assume they'll steal and train at anything you throw at them

   >...> "Reading stuff freely posted on the internet" constitutes stealing now?
Literally everyone was talking about data ownership and you just said "I can download it, so it is fair game on my hardware." Let's say you didn't intend to say that. Well that doesn't matter, that's what a lot of people heard and you failed to clarify when pressed on this.

So yeah, I think you're doing gymnastics

https://news.ycombinator.com/item?id=45066376