
451 points croes | 1 comments

achrono No.43962386
If anyone was skeptical that the US government is deeply entrenched with these companies, letting this blatant violation of the spirit of the law [1] continue, this should settle it.

And for the future, here's one heuristic: if there is a profound violation of the law anywhere that (relatively speaking) is ignored or severely downplayed, it is likely that interested parties have arrived at an understanding. In other words, a conspiracy.

[1] There are tons of legal arguments on both sides, but for me it is enough to ask: if this is not illegal and is totally fair use (maybe even because, oh no, look at what China's doing, etc.), why did they have to resort to and foster piracy in order to obtain the training data?

NitpickLawyer No.43962587
> If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

European here, but why do you think this is so clear-cut? There are other jurisdictions where training on copyrighted data has already been allowed by law or case law (Germany and Japan). Why do you need a conspiracy in the US?

AFAICT, US copyright law deals with direct reproductions of a copyrighted piece of content (and also carves out some leeway for direct reproduction, like fair use). I think we can all agree by now that LLMs don't fully reproduce "letter perfect" content, right? What, then, is the "spirit" of the law that you think was broken here? Isn't this the definition of "transformative work"?

Also of note is the other big case involving books: the one where Google processed mountains of books, was sued, and was allowed to continue. How is scanning and indexing tons of books different from scanning and "training" an LLM?

AlotOfReading No.43962962
Google asserted fair use in that case, which is an admission of (allowed) copyright infringement. They didn't turn books into a "new form"; they provided limited excerpts that couldn't substitute for the original work, and they directly incentivized purchases through normal sales channels while also providing new functionality.

Contrast that with AI companies:

They don't necessarily want to assert fair use, the results aren't necessarily publicly accessible, the work used isn't cited, users aren't directed to typical sales channels, and many common uses do meaningfully reduce the market for the original content (e.g. AI summaries of paywalled pages).

It's not obvious to me as a non-lawyer that these situations are analogous, even if there's some superficial similarity.