Seems like an excessively draconian interpretation of property rights.
It makes no sense to put stuff up on the internet where it can freely be downloaded by anyone at any time, by people who are then free to do whatever they like with it on their own hardware, then complain that people have downloaded that stuff and done what they liked with it on their own hardware.
"Having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators" is equally a description of Google.
I don't disagree regarding Google, I also think they exploited others' IP for their own gain. It was once symbiotic with webmasters, but when that stopped they broke that implied good faith contract. In a sense, their snippets and widgets using others' IP and no longer providing traffic to the site was the warning shot for where we are now. We should have been modernising IP laws back then.
Which is just to point out that the world wide web is not its own jurisdiction, and I believe AI companies are going to be finding that an ongoing problem. Unlike search, there is no symbiosis here, so there is an incentive to sue. The original IP holders do not benefit in any way. Search was different in that way.
After seeing the harm done by the expansion of patent law to cover software algorithms, and the relentless abuse done under the DMCA, I am reflexively skeptical of any effort to expand intellectual property concepts.
Quid pro quo. Those sites also received traffic from the audiences searching using Google. "Without compensation" really only became a thing when Google started adding the inlined cards which distilled the site's content thus obviating the need for a user to visit the aforementioned site.
> where it can freely be downloaded by anyone at any time, by people who are then free to do whatever they like with it on their own hardware
I think you have a strong misunderstanding of the law and the general expectation of others.
I'd like to remind you that a lot of celebrities face legal issues for posting photos of themselves. Here's a recent example with Jennifer Lopez[0]. The reason these types of lawsuits are successful is because it is theft of labor. If you hire a professional photographer to take photos of your wedding then the contract is that the photographer is handing over ownership of the photos in exchange for payment. The only difference here is that the photo was taken before a contract was made. The celebrity owns the right to their body and image, but not to the photograph.
Or think about Open Source Software. Just because it is posted on GitHub does not mean you are legally allowed to use it indiscriminately. GitHub has licenses and not all of them are unrestricted. In fact, a repo without a license does not mean unfettered usage. The default is that the repo owner has the copyright[1].
> You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work.
A big part of what will make a lawsuit successful or not is if the owner has been deprived of compensation. As in, if you make money off of someone else's work. That's why this has been the key issue in all these AI lawsuits, where the question is whether the work is transformative or not. All of this is in new legal territory because the laws were not written with this usage in mind. The transformative stuff is because you need to allow for parody or referencing. You don't want a situation where, say, someone can't include a video of what the president has said in order to discuss what was said[2]. But this situation is much closer to "Joe stole a book, learned from that book, and made a lot of money through the knowledge that they obtained from this book AND would not have been able to do without the book's help." Just, it's usually easier to go after the theft part of that situation. It's definitely a messy space.
But basically, just because a piece of art exists on public property does not mean you have the right to do whatever you want with it.
> is equally a description of Google.
Yes and no. The AI summaries? Yeah. The search engine and linking? No. The latter is a mutually beneficial service. It's one thing to own a taxi service and it is another to offer a taxi service that will walk into a Starbucks, take a random drink off the counter and deliver it to you. I'm not sure why this is difficult to understand.
[0] https://www.bbc.com/news/articles/cx2qqew643go
[1] https://docs.github.com/en/repositories/managing-your-reposi...
Now the AI summaries are a different story. One where there is no quid pro quo either. It's different when that taxi service will also offer the same service as that business. It's VERY different when that taxi service will walk into that business, take their services free of charge[0], and then transfer that to the taxi customer.
[0] Scraping isn't going to offer ad revenues
[Side note] In our analogy the little text below the link is more like the taxi service offering some advertising or some description of the business. Bit more gray here but I think the quid pro quo phrase applies here. Taxi does this to help the customer find the right place to go, providing the business more customers. But the taxi isn't (usually) replacing the service itself.
> on their own hardware
That doesn't make it technically legal. That only makes it not worth pursuing. You can sue Joe Schmoe for a million dollars but if he doesn't have that then you're not getting a dime. But if Joe Schmoe is using that thing to make money, well then... yeah you bet your ass that's a different situation and the "worth" of pursuing is directly proportional to how much he is making. Doesn't matter if it is his own hardware or not.
Like why do you think who owns the hardware even matters? Do you really think the legality changes if I rent a GPU vs use my own? That doesn't make any sense.
The reference to terabytes of stolen data refers to copyrighted material. I think you know this but chose to frame it as "stuff freely posted on the internet" in order to mislead and strawman the other comment.
What’s the angle that describes this as fair use?
[0] https://www.businessinsider.com/anthropic-cut-pirated-millio...
If the AI companies were letting people download copies of their training data, copyright law would certainly have something to say about that. But no: once they download the training data, they keep it, and they don't share it.
> using his own copy of the data
Yes? That is a different thing? I guess we can keep moving the topic until we're talking about the same topic if you want. But honestly, I don't want to have that kind of conversation.
The fact that value is being created is irrelevant. The fact that they are making profit is irrelevant. As is non-compensation to creators. There isn't any law being broken. Is there?
Bottom line: in real-world terms, there is no expectation of privacy with a freely open and unrestricted web site. Even if that website said 'you can use this for single use but not mass use', that in itself is not legally or practically enforceable.
Let's take the example of a Christmas light show. The idea might be (in the homeowner's mind) that people, families, will drive by in their cars to enjoy the light show (either a single home or the entire street or most of it). They might think 'we don't want buses full of people who paid to ride the bus' coming down the street. Unfortunately there is no way to prevent that (without the city and laws getting involved) and there is nothing wrong with the fact that the people who provide the bus are making money bringing people to see the light show.
...not if you believe in the right of general-purpose computing. If they have the right to read the data, why don't they have a right to program a computer to do it for them?
I think we all agree that they're not the good guys here, but this reasoning in particular is troubling.
Sam Altman and his ilk are exploiting the incredibly slow moving legal system to enrich themselves.
My entire comment was that the entire issue is about data ownership. Doesn't even matter if you have a copy of the data.
It matters how that copy was obtained.
There's no reason to then discuss if your usage violates the terms of a license if you obtained the data illegally. You're already in the illegal territory lol.
Having data != legally having obtained data
>>...> To be honest, these companies already stole terabytes of data and don't even disclose their dataset, so you have to assume they'll steal and train at anything you throw at them
>...> "Reading stuff freely posted on the internet" constitutes stealing now?
Literally everyone was talking about data ownership and you just said "I can download it, so it is fair game on my hardware." Let's say you didn't intend to say that. Well that doesn't matter, that's what a lot of people heard and you failed to clarify when pressed on this.
So yeah, I think you're doing gymnastics.
It's even worse than that: they don't even legally have to respect it if courts find it to be fair use, and so far they have. If it's fair use to train models on it, your license means nothing.
The only way to "win" is to not publish your code at all, anywhere.