Most active commenters
  • gruez(19)
  • johnnyanmac(12)
  • Zambyte(11)
  • Suppafly(9)
  • 93po(9)
  • SketchySeaBeast(8)
  • Workaccount2(7)
  • wnevets(7)
  • mr_toad(6)

451 points by croes | 400 comments
1. andy99 ◴[] No.43962064[source]
These are two different issues that, while apparently related, need separate consideration. Re the copyright finding, does the US Copyright Office have standing to make such a determination? Presumably not, since various claims about AI and copyright are before the courts. Why did they write this finding?
replies(3): >>43962165 #>>43962326 #>>43962443 #
2. kklisura ◴[] No.43962165[source]
> The Office is releasing this pre-publication version of Part 3 in response to congressional inquiries and expressions of interest from stakeholders

They acknowledge the issue is before courts:

> These issues are the subject of intense debate. Dozens of lawsuits are pending in the United States, focusing on the application of copyright’s fair use doctrine. Legislators around the world have proposed or enacted laws regarding the use of copyrighted works in AI training, whether to remove barriers or impose restrictions

Why did they write the finding: I assume it's because it's their responsibility:

> Pursuant to the Register of Copyrights’ statutory responsibility to “[c]onduct studies” and “[a]dvise Congress on national and international issues relating to copyright,”...

All excerpts are from https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

3. seper8 ◴[] No.43962192[source]
(this is duplicate of https://news.ycombinator.com/item?id=43960518)
4. prvc ◴[] No.43962193[source]
The released draft report seems merely to be a litany of copyright holder complaints repeated verbatim, with little depth of reasoning to support the conclusions it makes.
replies(4): >>43962324 #>>43962424 #>>43962648 #>>43962893 #
5. HenryBemis ◴[] No.43962233[source]
Tyrants & Kings were forced on us, and we could only remove them at the cost of blood.

Politicians try to crack as few eggs as possible, telling us they are our friends, and we believe them. Now then... some do more good than bad, some do more bad than good. But on the other hand, something that is _good for me_ is _bad for you_ and vice versa. Politicians are just the means to move the needle juuuuuuust a little bit, to show a change, but never make a drastic one. The cost of drastic changes is losing re-election. And this is the bread and butter of politicians (yes, I am over-over-simplifying, but this is human history and a lot will be left out in a comment).

replies(2): >>43962422 #>>43964774 #
6. raverbashing ◴[] No.43962324[source]
I don't have much spare sympathy here honestly
7. _heimdall ◴[] No.43962326[source]
Given that the issue at hand is related to potential misuse of copyright protected material, it seems totally reasonable for the copyright office to investigate and potentially act to reconcile the issue.

Sure, the courts may find it's out of their jurisdiction, but they should act as they see fit and let the courts settle that later.

8. achrono ◴[] No.43962386[source]
If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

And for the future, here's one heuristic: if there is a profound violation of the law anywhere that (relatively speaking) is ignored or severely downplayed, it is likely that interested parties have arrived at an understanding. Or in other words, a conspiracy.

[1] There are tons of legal arguments on both sides, but for me it is enough to ask: if this is not illegal and is totally fair use (maybe even because, oh no look at what China's doing, etc.), why did they have to resort to & foster piracy in order to obtain this?

replies(2): >>43962442 #>>43962587 #
9. bgwalter ◴[] No.43962424[source]
The required reasoning is not very deep though: If an AI reads 100 scientific papers and churns out a new one, it is plagiarism.

If a savant has perfect recall, remembers text perfectly and rearranges that text to create a marginally new text, he'd be sued for breach of copyright.

Only large corporations get away with it.

replies(9): >>43962554 #>>43962560 #>>43962638 #>>43962665 #>>43962744 #>>43962820 #>>43963108 #>>43963228 #>>43963944 #
10. whycome ◴[] No.43962442[source]
What’s your reading of the spirit of the law?
11. bgwalter ◴[] No.43962443[source]
The US Supreme Court has complained on multiple occasions that it is forced to do the work of the legislature.

Why could a copyright office not advise Congress to enact a law that forbids the use of copyrighted material in AI training? This is literally the politicians' job.

replies(1): >>43962818 #
12. brador ◴[] No.43962450[source]
Lifetime for human copyright, 20 years for corporate copyright. That’s the golden zone.
replies(2): >>43962626 #>>43962923 #
13. tempeler ◴[] No.43962475[source]
I think a new chapter is about to begin. It seems that in the future, many IPs will become democratized — in other words, they will become public assets.
replies(7): >>43962496 #>>43962547 #>>43962564 #>>43962738 #>>43963359 #>>43964145 #>>43966881 #
14. ahmeni ◴[] No.43962496[source]
If only there was some sort of term for fake democracy where you're actually just there to plunder resources.
replies(3): >>43962525 #>>43962606 #>>43963080 #
15. throw0101c ◴[] No.43962521[source]
See "Copyright and Artificial Intelligence Part 3: Generative AI Training" (PDF):

* https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

16. gadders ◴[] No.43962525{3}[source]
Congress? https://www.capitoltrades.com/
17. AlexandrB ◴[] No.43962547[source]
Public assets as long as you pay your monthly ChatGPT bill.
18. scraptor ◴[] No.43962554{3}[source]
Plagiarism is not an issue of copyright law, it's an entirely separate system of rules maintained by academia. The US Copyright Office has no business having opinions about it. If a AI^W human reads 100 papers and then churns out a new one this is usually called research.
replies(5): >>43962756 #>>43962757 #>>43963247 #>>43963863 #>>43966801 #
19. satanfirst ◴[] No.43962560{3}[source]
That's not logical. If the savant has perfect recall and makes minor edits, they are effectively a digital copy, and aren't really like a human, a neural network, or, by extension, any other ML model that isn't over-fitted.
20. SketchySeaBeast ◴[] No.43962564[source]
"Democratized" as in large corporations are free to ingest the IPs and then reinterpret and censor them before they feed their version back to us, with us never having free access to the original source?
replies(1): >>43963546 #
21. thomastjeffery ◴[] No.43962580[source]
> The remarks about Musk may refer to the billionaire’s recent endorsement of Twitter founder Jack Dorsey’s desire to “Delete all IP law"...

Yes please.

Delete it for everyone, not just these ridiculous autocrats. It's only helping them in the first place!

22. NitpickLawyer ◴[] No.43962587[source]
> If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

European here, but why do you think this is so clear cut? There are other jurisdictions where training on copyrighted data has already been allowed by law/caselaw (Germany and Japan). Why do you need a conspiracy in the US?

AFAICT the US copyright law deals with direct reproductions of a copyrighted piece of content (and also carves out some leeway with direct reproduction, like fair use). I think we can all agree by now that LLMs don't fully reproduce "letter perfect" content, right? What then is the "spirit" of the law that you think was broken here? Isn't this the definition of "transformative work"?

Of note is also the other big case involving books: the one where Google was allowed to process mountains of books; they were sued and allowed to continue. How is scanning & indexing tons of books different from scanning & "training" an LLM?

replies(2): >>43962962 #>>43968263 #
23. tempeler ◴[] No.43962606{3}[source]
This idea does not belong to me. If lawmakers and regulators allow companies to use these IPs, how can you keep ordinary people away from them? Something created by AI is regarded as if it were created from scratch by human hands. That's reality.
24. Zambyte ◴[] No.43962626[source]
Zero (0) years for corporate copyright, zero (0) years for human copyright is the golden zone in my book.
replies(2): >>43962681 #>>43963025 #
25. tantalor ◴[] No.43962638{3}[source]
If AI really could "churn out a new scientific paper" we would all be ecstatically rejoicing in the dawning of an age of AGI. We are nowhere near that.
replies(1): >>43962711 #
26. nadermx ◴[] No.43962648[source]
Not only does it read like a litany[0], it seems like the copyright holders are not happy with how the Meta case is working through court and are trying to sidestep fair use entirely.

https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

replies(1): >>43963315 #
27. glial ◴[] No.43962665{3}[source]
It reminds me of the old joke.

"To steal ideas from one person is plagiarism; to steal from many is research."

replies(1): >>43974206 #
28. umanwizard ◴[] No.43962681{3}[source]
Why?
replies(2): >>43962773 #>>43962937 #
29. viraptor ◴[] No.43962711{4}[source]
We're relatively close already https://openreview.net/pdf?id=12T3Nt22av And we don't need anything even close to AGI to achieve that.
30. numpad0 ◴[] No.43962738[source]
Oh yeah. It's the Cultural Revolution all over again.
31. shkkmo ◴[] No.43962744{3}[source]
> If a savant has perfect recall, remembers text perfectly and rearranges that text to create a marginally new text, he'd be sued for breach of copyright.

Any suits would be based on the degree the marginally new copy was fair use. You wouldn't be able to sue the savant for reading and remembering the text.

Using AI to create marginally new copies of copyrighted work is ALREADY a violation. We don't need a dramatic expansion of copyright law that says that just giving the savant the book to read is a copyright violation.

Plagiarism and copyright are two entirely different things. Plagiarism is about citations and intellectual integrity. Copyright is about protecting economic interests, has nothing to do with intellectual integrity, and isn't resolved by citing the original work. In fact, most of the contexts where you would be accused of plagiarism (reporting, criticism, education, or research) are the ones where fair use arguments are much easier to make.

32. ta1243 ◴[] No.43962756{4}[source]
Only when those papers are referenced
33. dfxm12 ◴[] No.43962757{4}[source]
Please argue in good faith. A new research paper is obviously materially different from "rearranging that text to create a marginally new text".
replies(2): >>43962849 #>>43962855 #
34. Zambyte ◴[] No.43962773{4}[source]
It took me a while to be convinced that copyright is strictly a bad idea, but these two articles were very convincing to me.

https://drewdevault.com/2020/08/24/Alice-in-Wonderland.html

https://drewdevault.com/2021/12/23/Sustainable-creativity-po...

replies(2): >>43962953 #>>43963511 #
35. 9283409232 ◴[] No.43962818{3}[source]
Part of Congress's power is to defer to agencies it has created, such as the US Copyright Office.
36. Maxatar ◴[] No.43962820{3}[source]
Plagiarism isn't illegal; it has nothing to do with the law.
replies(1): >>43962877 #
37. shkkmo ◴[] No.43962849{5}[source]
The comment is responding to this line:

> If an AI reads 100 scientific papers and churns out a new one, it is plagiarism.

That is a specific claim that is being directly addressed and pretty clearly qualifies as "good faith".

38. int_19h ◴[] No.43962855{5}[source]
"Rearranging text" is not what modern LLMs do though, unless you specifically ask them to.
replies(1): >>43963842 #
39. shkkmo ◴[] No.43962877{4}[source]
Plagiarism is often illegal. If you use plagiarism to obtain a financial or other benefit, that can be fraud.
replies(1): >>43963390 #
40. GuB-42 ◴[] No.43962923[source]
The issue with lifetime (vs something like lifetime + X years) is that of inheritance.

Assuming you agree with the idea of inheritance, which is another topic, then it is unfair to deny inheritance of intellectual property. For example, if your father has built a house, it will be yours when he dies; it won't become public property. So why would a book your father wrote just before he died become public domain the moment he dies? It is unfair to those who are doing intellectual work, especially older people.

If you want short copyright, it would make more sense to make it 20 years, human or corporate, like patents.

replies(3): >>43963056 #>>43963717 #>>43964645 #
41. whamlastxmas ◴[] No.43962937{4}[source]
Because the concept of owning an idea is really gross. Copyright means I can't write about whatever I want in my own home, even if I never distribute it or no one ever sees it. I'm breaking the law by privately writing Harry Potter fanfic in my journal or whatever. Copyright is supposed to be about encouraging the creation of intangibles, and the reality is that it massively stifles it.
replies(4): >>43963076 #>>43963326 #>>43963409 #>>43963555 #
42. SketchySeaBeast ◴[] No.43962953{5}[source]
The first article says "Copyright is bad because of corporations", and I can kind of get behind that, especially for the very long-term copyrights that have lost their original intent. But the second article says that artists will be happier without copyright if we just solve capitalism first. I don't know about you, but that reads to me like "If you wish to make an apple pie from scratch you must first invent the universe".

If an artist produces a work they should have the rights to that work. If I self-publish a novel and then penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing my ever putting the work out. That's a bad thing.

replies(3): >>43963451 #>>43963590 #>>43963819 #
43. AlotOfReading ◴[] No.43962962{3}[source]
Google asserted fair use in that case, which is an admission of (allowed) copyright infringement. They didn't turn books into a "new form", they provided limited excerpts that couldn't replace the original usage and directly incentivized purchases through normal sales channels while also providing new functionality.

Contrast that with AI companies:

They don't necessarily want to assert fair use, the results aren't necessarily publicly accessible, the work used isn't cited, users aren't directed to typical sales channels, and many common usages do meaningfully reduce the market for the original content (e.g. AI summaries for paywalled pages).

It's not obvious to me as a non-lawyer that these situations are analogous, even if there's some superficial similarity.

44. mattxxx ◴[] No.43962976[source]
Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

1. Criticizes a highly useful technology
2. Matches a potentially-outdated, strict interpretation of copyright law

My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use case, and then we should change them.

replies(19): >>43963017 #>>43963125 #>>43963168 #>>43963214 #>>43963243 #>>43963311 #>>43963423 #>>43963517 #>>43963612 #>>43963721 #>>43963943 #>>43964079 #>>43964280 #>>43964365 #>>43964448 #>>43964562 #>>43965792 #>>43965920 #>>43976732 #
45. evanjrowley ◴[] No.43962997[source]
If AI companies in the US are penalized for this, then the effect on copyright holders will only be slowed until foreign AI companies overtake them. In such cases the legal recourse will be much slower and significantly more limited.
replies(2): >>43963088 #>>43966816 #
46. madeofpalk ◴[] No.43963017[source]
> Humans can read a book, get inspiration, and write a new book and not be litigated against

Humans get litigated against for this all the time. There is such a thing as, charitably, being too inspired.

https://en.wikipedia.org/wiki/List_of_songs_subject_to_plagi...

replies(1): >>43963509 #
47. achierius ◴[] No.43963025{3}[source]
Well what we're getting is lifetime for corporate, and zero (0) for human. Hope you're happy.
replies(1): >>43963366 #
48. internet_rand0 ◴[] No.43963029[source]
copyright is long overdue for a total rework

the internet demands it.

the people demand free mega upload for everybody, why? because we can (we seem to NOT want to, but that should be a politically solvable problem)

49. dghlsakjg ◴[] No.43963056{3}[source]
Then make it the greater of 20 years or the lifetime for humans.

Comparing intellectual property to real or physical property makes no sense. Intellectual property is different because it is non-exclusive. If you are living in your father's house, no one else can be living there. If I am reading your father's book, that has nothing to do with whether anyone else can read the book.

replies(1): >>43964921 #
50. redwall_hp ◴[] No.43963076{5}[source]
Whole genres of music are based entirely on sampling, and they got screwed by copyright law as it evolved over the 90s and 2000s. Now only people with a sufficiently sized business backing them can truly participate, or they're stuck licensing things on Splice.

And that's not even touching the spurious lawsuits about musical similarity. That's what musicians call a genre...

It makes some sense for a very short term literal right to reproduction of a singular work, but any time the concept of derivative works comes into play, it's just a bizarrely dystopian suppression of art, under the supposition that art is commercial activity rather than an innate part of humanity.

51. ◴[] No.43963080{3}[source]
52. mitthrowaway2 ◴[] No.43963088[source]
Access to copyrighted materials might make for slightly better-trained models the way that access to more powerful GPUs does. But I don't think it will accelerate foundational advances in the underlying technology. If anything, maybe having to compete under tight constraints means AI companies will have to innovate more, rather than merely push scale.
replies(2): >>43963872 #>>43969540 #
53. JKCalhoun ◴[] No.43963108{3}[source]
My understanding — LLMs are nothing at all like a "savant with perfect recall".

More like a speed-reader who retains a schema-level grasp of what they’ve read.

54. ActionHank ◴[] No.43963125[source]
Assuming this means copyright is dead, companies will be very upset, and patents will likely follow.

The hold US companies have on the world will be dead too.

I also suspect that media piracy will be labelled as the only reason we need copyright, an existing agency will be bolstered to address this concern and then twisted into a censorship bureau.

55. timdiggerm ◴[] No.43963214[source]
Or we could acknowledge that something could be a bad idea, despite its utility
56. mr_toad ◴[] No.43963228{3}[source]
> If a savant has perfect recall

AI don’t have perfect recall.

57. stevenAthompson ◴[] No.43963243[source]
Doing a cover song requires permission, and doing it without that permission can be illegal. Being inspired by a song to write your own is very legal.

AI is fine as long as the work it generates is substantially new and transformative. If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem.

Yes, I'm aware that machines aren't people and can't be "inspired", but if the functional results are the same the law should be the same. Vaguely defined ideas like your soul or "inspiration" aren't real. The output is real, measurable, and quantifiable and that's how it should be judged.

replies(3): >>43963561 #>>43963629 #>>43964441 #
58. palmotea ◴[] No.43963247{4}[source]
> Plagiarism is not an issue of copyright law, it's an entirely separate system of rules maintained by academia. The US Copyright Office has no business having opinions about it. If a AI^W human reads 100 papers and then churns out a new one this is usually called research.

If you draw a Venn Diagram of plagiarism and copyright violations, there's a big intersection. For example: if I take your paper, scratch off your name, make some minor tweaks, and submit it; I'm guilty of both plagiarism and copyright violation.

59. jeroenhd ◴[] No.43963311[source]
Pirating movies is also useful, because I can watch movies without paying on devices that apps and accounts don't work on.

That doesn't make piracy legal, even though I get a lot of use out of it.

Also, a person isn't a computer so the "but I can read a book and get inspired" argument is complete nonsense.

replies(2): >>43963560 #>>43964460 #
60. mr_toad ◴[] No.43963315{3}[source]
Copyright holders have always hated fair use, and often like to pretend it doesn’t exist.

The average copyright holder would like you to think that the law only allows use of their works in ways that they specifically permit, i.e. that which is not explicitly permitted is forbidden.

But the law is largely the reverse; it only denies use of copyright works in certain ways. That which is not specifically forbidden is permitted.

replies(1): >>43964188 #
61. flats ◴[] No.43963326{5}[source]
I don’t believe this is true? I’m pretty sure that you’re prohibited from making money from that fan fiction, not from writing it at all. So I don’t understand the claim that copyright “massively stifles” creativity. There are of course examples of people not being able to make money on specific “ideas” because of copyright laws, but that doesn’t seem to me to be “massively stifling” creativity itself, especially given that it also protects and supports many people generating these ideas. And if we got rid of copyright law, wouldn’t we be in that exact place, where people wouldn’t be allowed to make money off of creative endeavors?

I mean, owning an idea is kinda gross, I agree. I also personally think that owning land is kinda gross. But we live in a capitalist society right now. If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs. Sam Altman, Elon Musk, and all the other tech CEOs will benefit in place of all of the artists I love and admire.

That, to me, sucks.

replies(3): >>43963483 #>>43964780 #>>43964830 #
62. kmeisthax ◴[] No.43963359[source]
They aren't going to legalize, say, publishing Mario fangames or whatever. They're just going to make copyright allow AI training, because AI is what the owner class wants. That's not democratizing IP, that's just prejudicial (dis)enforcement against the creative class.
replies(1): >>43963594 #
63. Zambyte ◴[] No.43963366{4}[source]
I'm not, because that's not what I asked for.
64. jobigoud ◴[] No.43963390{5}[source]
That further drives the point that the issue is not what the AI is doing but what people using it are doing.
65. ◴[] No.43963409{5}[source]
66. vessenes ◴[] No.43963423[source]
Thank you - a voice of sanity on this important topic.

I understand people who create IP of any sort being upset that software might be able to recreate their IP or stuff adjacent to it without permission. It could be upsetting. But I don't understand how people jump to "Copyright Violation" for the fact of reading. Or even downloading in bulk. Copyright controls, and has always controlled, the creation and distribution of a work. Embedded in the very nature of the notice is the concept that the work will be read.

Reading and summarizing have only ever been controlled in western countries via State's secrets type acts, or alternately, non-disclosure agreements between parties. It's just way, way past reality to claim that we have existing laws to cover AI training ingesting information. Not only do we not, such rules would seem insane if you substitute the word human for "AI" in most of these conversations.

"People should not be allowed to read the book I distributed online if I don't want them to."

"People should not be allowed to write Harry Potter fanfic in my writing style."

"People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics and the societal tradeoffs we've made so far, and is unable to discuss where we might want to go and what would be best.

replies(3): >>43963908 #>>43964370 #>>43964770 #
67. Zambyte ◴[] No.43963451{6}[source]
> If an artist produces a work they should have the rights to that work.

That would indeed be nice, but as the article says, that's usually not the case. The rights holder and the author are almost never the same entity in commercial artistic endeavors. I know I'm not the rights holder for my erroneously-considered-art work (software).

> If I self-publish a novel and then penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing my ever putting the work out. That's a bad thing.

Why? You created influential art and its influence was spread. Is that not the point of (good) art?

replies(2): >>43963507 #>>43963598 #
68. jobigoud ◴[] No.43963464{3}[source]
We are talking about the rights of the humans training the models and the humans using the models to create new things.

Copyright only comes into play on publication. It's only concerned about publication of the models and publication of works. The machine itself doesn't have agency to publish anything at this point.

replies(5): >>43963564 #>>43964130 #>>43964131 #>>43964631 #>>43965405 #
69. ulbu ◴[] No.43963480{3}[source]
these comparisons of llms with human artists copying are just ridiculous. it’s saying “well humans are allowed to break twigs and damage the planet in various ways, so why not allow building a fucking DEATH STAR”.

abstracting llms from their operators and owners and possible (and probable) ends and the territories they trample upon is nothing short of eye-popping to me. how utterly negligent and disrespectful of fellow people must one be at the heart to give any credence to such arguments

replies(3): >>43964105 #>>43964159 #>>43964449 #
70. Zambyte ◴[] No.43963483{6}[source]
> And if we got rid of copyright law, wouldn’t we be in that exact place, where people wouldn’t be allowed to make money off of creative endeavors?

This is addressed in the second article I linked.

replies(1): >>43966687 #
71. renewiltord ◴[] No.43963485[source]
I wonder when general internet sentiment moved from pro-piracy to IP maximalism. Fascinating shift.
replies(12): >>43963650 #>>43963828 #>>43963903 #>>43964116 #>>43964318 #>>43964668 #>>43964804 #>>43965414 #>>43965458 #>>43966472 #>>43966712 #>>43966821 #
72. SketchySeaBeast ◴[] No.43963507{7}[source]
> The rights holder and the author are almost never the same entity in commercial artistic endeavors.

There's definitely problems with corporatization of ownership of these things, I won't disagree.

> Why? You created influential art and its influence was spread. Is that not the point of (good) art?

Why do we expect artists to be selfless? Do you think Stephen King is still writing only because he loves the art? You don't simply make software because you love it, right? Should people not be able to make money off their effort?

replies(1): >>43964761 #
73. jrajav ◴[] No.43963509{3}[source]
If you follow these cases more closely over time you'll find that they're less an example of humans stealing work from others and more an example of typical human greed and pride. Old, well established musicians arguing that younger musicians stole from them for using a chord progression used in dozens of songs before their own original, or a melody on the pentatonic scale that sounds like many melodies on the pentatonic scale do. It gets ridiculous.

Plus, all art is derivative in some sense, it's almost always just a matter of degree.

replies(2): >>43966745 #>>43969065 #
74. dmonitor ◴[] No.43963511{5}[source]
You need some mechanism in place to prevent any joe schmoe from spinning up FreeSteam and rehosting the whole thing.
replies(3): >>43964024 #>>43964746 #>>43965314 #
75. ceejayoz ◴[] No.43963517[source]
> Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

You're still not gonna be allowed to commercially publish "Hairy Plotter and the Philosophizer's Rock".

replies(2): >>43963660 #>>43966769 #
76. sophrocyne ◴[] No.43963540[source]
The USCO report was flawed, biased, and hypocritical. A pre-publication of this sort is also extremely unusual.

https://chatgptiseatingtheworld.com/2025/05/12/opinion-why-t...

replies(1): >>43963557 #
77. rurban ◴[] No.43963546{3}[source]
"Democratized" in the meaning of fascistoized, right? Laws do not apply to the cartels, military, executive and secret services.
replies(1): >>43964025 #
78. otterley ◴[] No.43963555{5}[source]
Copyright doesn’t protect ideas. It protects expression of those ideas.

Consider how many books exist on how to care for trees. Each one of them has similar ideas, but the way those ideas are expressed differ. Copyright protects the content of the book; it doesn’t protect the ideas of how to care for trees.

replies(1): >>43964697 #
79. ceejayoz ◴[] No.43963557[source]
What in https://chatgptiseatingtheworld.com/about/ says "ah, yes, trustworthy unbiased analysis" to you? Why should I trust this source's opinion?

Pre-publication reports aren't unusual. https://www.federalregister.gov/public-inspection/current

https://www.federalregister.gov/reader-aids/using-federalreg...

> The Federal Register Act requires that the Office of the Federal Register (we) file documents for public inspection at our office in Washington, DC at least one business day before publication in the Federal Register.

80. Workaccount2 ◴[] No.43963560{3}[source]
It's only complete nonsense if you understand how humans learn. Which we don't.

What we do know though is that LLMs, similar to humans, do not directly copy information into their "storage". LLMs, like humans, are pretty lossy with their recall.

Compare this to something like a search indexed database, where the recall of information given to it is perfect.

replies(1): >>43964910 #
81. toast0 ◴[] No.43963561{3}[source]
> Doing a cover song requires permission, and doing it without that permission can be illegal.

I believe cover song licensing is available mechanically; you don't need permission, you just need to follow the procedures including sending the licensing fees to a rights clearing house. Music has a lot of mechanical licenses and clearing houses, as opposed to other categories of works.

replies(1): >>43965692 #
82. MyOutfitIsVague ◴[] No.43963564{4}[source]
It's not only publication, otherwise people wouldn't be able to be successfully sued for downloading and consuming copyrighted content, it would only be the uploaders who get into trouble.
replies(1): >>43963945 #
83. jasonjayr ◴[] No.43963590{6}[source]
But in this idealized copyright-free world, those self-publishing companies could just as easily take Penguin's top sellers and reproduce those.

The thing that'd set apart these companies are the services + quality of their work.

replies(1): >>43963654 #
84. jobigoud ◴[] No.43963594{3}[source]
Millions of pages of fan fic based on existing IP have been written. There is a point where it doesn't really make sense trying to go after individuals especially if they make no money out of it.

If we enter a world where anyone can create a new Mario game and there are thousands of them released on the public web it would be impossible for the rights holders to do anything, and it would be a PR bad move to go after individuals doing it for fun.

replies(2): >>43963708 #>>43966514 #
85. noirscape ◴[] No.43963598{7}[source]
It may surprise you, but artists need to buy things like food, water and pay for their basic necessities like electricity, rent and taxes. Otherwise they die or go bankrupt.

In our current society, that means they need some sort of means to make money from their work. Copyright, at least in theory, exists to incentivize the creation of art by protecting an artists ability to monetize it.

If you abolish copyright today, under our current economic framework, what will happen is that people create less art because it goes from a (semi-)viable career to just being completely worthless to pursue. It's simply not a feasible option unless you fundamentally restructure society (which is a different argument entirely.)

replies(1): >>43964726 #
86. mjburgess ◴[] No.43963629{3}[source]
I fear the lack of our ability to measure your mind might render you without many of the legal or moral protections you imagine you have. But go ahead, tear down the law to whatever inanity can be described by the trivial machines of the world's current popular charlatans. Presumably you weren't using society's presumption of your agency anyway.
replies(1): >>43965409 #
87. aurizon ◴[] No.43963647[source]
Ned Ludd heirs at last win - High Court rules the spinning Jenny IS ILLEGAL!. All machine made cloth and machines must be destroyed. This is the end of the road for all mechanical ways to make cloth. Get naked, boys 'n girls = this will be fun!
88. ronsor ◴[] No.43963650[source]
AI has made people lose their minds and principles. It's fascinating to observe.

In the meantime, I will continue to dislike copyright regardless of the parties involved.

replies(1): >>43967589 #
89. SketchySeaBeast ◴[] No.43963654{7}[source]
Is not part of the quality of the work the contents of the book? What are these companies putting within the pages? We've taken the greatest and longest part of the effort and made it meaningless.
90. WesolyKubeczek ◴[] No.43963660{3}[source]
No, but you are most likely allowed to commercially publish "Hairy Potter and the Philosophizer's Rock", a story about a prehistoric community. The hero is literally a hairy potter who steals a rock from a lazy deadbeat dude who is pestering the rest of the group with his weird ideas.
replies(1): >>43964853 #
91. ChrisArchitect ◴[] No.43963705[source]
[dupe] https://news.ycombinator.com/item?id=43960518
92. int_19h ◴[] No.43963708{4}[source]
Imagine a world where all models capable of creating a new Mario game from scratch are only available through cloud providers which must implement mandatory filters such that asking "write me a Mario clone" (or anything functionally equivalent) gets you a lecture on don't-copy-that-floppy.

Bad PR? The entire copyright enforcement industry has had bad PR pretty much since easy copying enabled grassroots piracy - i.e. since before computers even. It never stopped them. What are you going to do about it? Vote? But all the mainstream parties are onboard with the copyright lobby.

93. ChrisArchitect ◴[] No.43963711[source]
Earlier on the report pdf:

https://news.ycombinator.com/item?id=43955025

94. MyOutfitIsVague ◴[] No.43963717{3}[source]
The issue with that is that inheritance only makes sense for tangible, scarce resources. Having copyright isn't easily analogous to ownership of a physical object, because an object is something you have and if somebody else has it, you can not have and use it.

Copyright is about control. If you know a song and you sing it to yourself, somebody overhears it and starts humming it, they have not deprived you of the ability to still know and sing that song. You can make economic arguments, of deprived profit and financial incentives, and that's fine; I'm not arguing against copyright here (I am not a fan of copyright, it's just not my point at the moment), I'm just saying that inheritance does not naturally apply to copyright, because data and ideas are not scarce, finite goods. They are goods that feasibly everybody in the world can inherit rapidly without lessening the amount that any individual person gets.

If real goods could be freely and easily copied the way data can, we might be having some very interesting debates about the logic and morality of inheriting your parents' house and depriving other people of having a copy.

95. regularjack ◴[] No.43963721[source]
Then they need to be changed for everyone and not just AI companies, but we all know that ain't happening.
96. Workaccount2 ◴[] No.43963737[source]
I have yet to see someone explain in detail how transformer model training works (showing they understand the technical nitty-gritty and the overall architecture of transformers) and also lay out a case for why it is clearly a violation of copyright.

You can find lots of people talking about training, and you can find lots (way more) of people talking about AI training being a violation of copyright, but you can't find anyone talking about both.

Edit: Let me just clarify that I am talking about training, not inference (output).

replies(10): >>43963777 #>>43963792 #>>43963801 #>>43963816 #>>43963830 #>>43963874 #>>43963886 #>>43963955 #>>43964102 #>>43965360 #
97. stevetron ◴[] No.43963744[source]
It's amazing the amount of bad deeds coming out of the current administration in support of special interests.
98. hatenberg ◴[] No.43963776[source]
Big Tech: We shouldn’t pay, each individual piece of content is worth basically nothing.

Also Big Tech: We added 300,000,000 users' worth of GTM because we trained on the 10 specific anime movies of Studio Ghibli and are selling their style.

replies(2): >>43963919 #>>43963951 #
99. anhner ◴[] No.43963777[source]
because people who understand how training works also understand that it's not a violation of copyright...
100. autobodie ◴[] No.43963792[source]
I have yet to see someone explain in detail how writing the same words as another person works (showing they understand the technical nitty-gritty and the overall architecture of the human mind) and also lay out a case for why it is clearly a violation of copyright. You can find lots of people talking about reading, and you can find lots (way more) of people talking about plagiarism being a violation of copyright, but you can't find anyone talking about both.
replies(1): >>43963965 #
101. jsiepkes ◴[] No.43963801[source]
This isn't about training AI on a book, but AI companies never paying for the book at all. As in: They "downloaded the e-book from a warez site" and then used it for training.
replies(1): >>43964081 #
102. jfengel ◴[] No.43963816[source]
I'm not sure I understand your question. It's reasonably clear that transformers get caught reproducing material that they have no right to. The kind of thing that would potentially result in a lawsuit if you did it by hand.

It's less clear whether taking vast amounts of copyrighted material and using it to generate other things rises to the level of copyright violation or not. It's the kind of thing that people would have prevented if it had occurred to them, by writing terms of use that explicitly forbid it. (Which probably means that the Web becomes a much smaller place.)

Your comment seems to suggest that writers and artists have absolutely no conceivable stake in products derived from their work, and that it's purely a misunderstanding on their part. But I'm both a computer scientist and an artist and I don't see how you could reach that conclusion. If my work is not relevant then leave it out.

replies(4): >>43963887 #>>43963911 #>>43964402 #>>43969383 #
103. int_19h ◴[] No.43963819{6}[source]
The problem of "how do artists earn enough money to eat?" is legitimate, but I don't think it's a good idea to solve it by making things that inherently don't work like real property to work like it, just so that we can shove them into the same framework. And this is exactly what copyright does - it takes information, which can be copied essentially for free by its very fundamental nature, and tries to make it scarce through legal means solely so that it can be sold as if it were a real good.

There are two reasons why it's a problem. The first reason is that any such abstraction is leaky, and those leaks are ripe for abuse. For example, in case of copyright on information, we made it behave like physical property for the consumers, but not for the producers (who still only need to expend resources to create a single work from scratch, and then duplicate it for free while still selling each copy for $$$). This means that selling information is much more lucrative than selling physical things, which is a big reason why our economy is so distorted towards the former now - just look at what the most profitable corporations on the market do.

The second reason is that it artificially entrenches capitalism by enmeshing large parts of the economy into those mechanics, even if they aren't naturally a good fit. This then gets used as an argument to prop up the whole arrangement - "we can't change this, it would break too much!".

replies(1): >>43966511 #
104. jagermo ◴[] No.43963825[source]
man, if we just had some napster fanboy in the oval office back then. Lots of laws would not exist.
105. throwaway1854 ◴[] No.43963828[source]
Apples and oranges - and also I don't know if anyone is really supporting IP maximalism.

IP maximalism is requiring DRM tech in every computer and media-capable device that won't play anything without checking into a central server and also making it illegal to reverse or break that DRM. IP maximalism is extending the current bonkers time interval of copyright (over 100 years) to forever. If AI concerns manage to get this down to a reasonable, modern timeframe it'll be awesome.

Record companies in the 90s tied the noose around their own necks, which is just as well because they're very useless now except for supporting geriatric bands. They should have started selling mp3s for 99 cents in 1997 and maybe they would have made a couple of dollars before their slide into irrelevance.

The specific thing people don't want, which a few weirdos keep pushing, is AI-generated stuff passed off as new creative material. It's fine for fun and games, but no one wants a streaming service of AI-generated music, even if you can't tell it's AI-generated. And the minute you think you have that cracked - that an AI can create music/art as good as a human and that humans can't tell - the humans will start making bad music/art in rebellion, and it'll be the cool new thing, and the armies of 10kW GPUs will be wasting their energy on stuff a 1MHz 8-bit machine could do in the 80s.

106. dmoy ◴[] No.43963830[source]
Not a ton of expert programmer + copyright lawyers, but I bet they're out there

You can probably find a good number of expert programmer + patent lawyers. And presumably some of those osmose enough copyright knowledge from their coworkers to give a knowledgeable answer.

At the end of the day though, the intersection of both doesn't matter. The lawyers win, so what really matters is who has the pulse on how the Fed Circuit will rule on this

Also in this specific case from the article, it's irrelevant?

107. dfxm12 ◴[] No.43963842{6}[source]
I didn't make this claim. Feel free to bring a cogent argument to a commenter who did.
replies(1): >>43965264 #
108. biophysboy ◴[] No.43963863{4}[source]
Having actually done research and published scientific papers, the key limitation is experimentation. Review papers are useful, and AI is useful, but creating new knowledge is more useful. I haven't had much luck using LLMs to extrapolate well beyond their knowledge domain.
replies(1): >>43966819 #
109. int_19h ◴[] No.43963872{3}[source]
The problem is that regardless of any innovations, scale still matters. If you figure out the technique to, say, make a model that is significantly better given N parameters - where N is just large enough to be the perfect fit for the amount of training data that you have access to - then someone else with access to more data will use the same technique to make a model with >N parameters, and it will be better than yours.
110. nickpsecurity ◴[] No.43963874[source]
I did here, with proofs of infringement:

https://gethisword.com/tech/exploringai/

111. belorn ◴[] No.43963886[source]
I would also like to see such an explanation, especially one that explains how it differs from the regular transforms found in video codecs. Why is a lossy compression a clear violation of copyright, but not a generative AI?
112. gruez ◴[] No.43963887{3}[source]
>I'm not sure I understand your question. It's reasonably clear that transformers get caught reproducing material that they have no right to. The kind of thing that would potentially result in a lawsuit if you did it by hand.

Is that a problem with the tool, or the person using it? A photocopier can copy an entire book verbatim. Should that be illegal? Or is it the problem that the "training" process can produce a model that has the ability to reproduce copyrighted work? If so, what implication does that hold for human learning? Many people can recite an entire song's lyrics from scratch, and reproducing an entire song's lyrics verbatim is probably enough to be considered copyright infringement. Does that mean the process of a human listening to music counts as copyright infringement?

replies(1): >>43964178 #
113. Ukv ◴[] No.43963903[source]
No hard data to back this up, but anecdotally I'd place the AI/copyright sentiment shift around mid-late 2022. DALL-E 2 experimentation (e.g: [0]) in early-mid 2022 seemed to just about sneak by unaffected, receiving similar positive/curious reception to previous trends (TalkToTransformer, ArtBreeder, GPT-3/AI Dungeon, etc.), but then Stable Diffusion bore the full brunt of "machine learning is theft" arguments.

[0]: https://x.com/xkcd/status/1552279517477183488

replies(1): >>43964127 #
114. jasonlotito ◴[] No.43963908{3}[source]
> But I don't understand how people jump to "Copyright Violation" for the fact of reading.

The article specificaly talks about the creation and distribution of a work. Creation and distribution of a work alone is not a copyright violation. However, if you take in input from something you don't own, and genAI outputs something, it could be considered a copyright violation.

Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

> "People should not be allowed to read the book I distributed online if I don't want them to."

This is already done. It's been done for decades. See any case where content is locked behind an account. Only select people can view the content. The license to use the site limits who or what can use things.

So it's odd you would use "insane" to describe this.

> "People should not be allowed to write Harry Potter fanfic in my writing style."

Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it. Most cases of fan fiction are allowed because the author allows it. But no, generally, fan fiction is illegal. This is well known in the fan fiction community. Obviously, if you don't distribute it, that's fine. But we aren't talking about non-distribution cases here.

> "People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

Same with fan fiction. If you replicate a copyrighted piece of art, if you distribute it, that's illegal. If you simply do it for practice, that's fine. But no, if you go around replicating a painting and distribute it, that's illegal.

Of course, technically speaking, none of this is what gen AI models are doing.

> We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics

I agree. Personifying gen AI is useless. We should stick to the technical aspects of what it's doing, rather than trying to pretend it's doing human things when it's 100% not doing that in any capacity. I mean, that's fine for the layman, but anyone with any ounce of technical skill knows that's not true.

replies(3): >>43964018 #>>43964393 #>>43964735 #
115. Workaccount2 ◴[] No.43963911{3}[source]
My comment is about training models, not model inference.

Most artists can readily violate copyright; that doesn't mean we block them from seeing copyrighted work.

replies(1): >>43963993 #
116. Aerroon ◴[] No.43963919[source]
The funny thing is that style is not copyrightable.
replies(1): >>43964115 #
117. elif ◴[] No.43963920[source]
Intellectual property law is quickly becoming an institution of hegemonic corporate litigation of the spreading of ideas.

If it's illegal to know the entire contents of a book it is arbitrary to what degree you are able to codify that knowing itself into symbols.

If judges are permitted to rule here it is not about reproduction of commercial goods but about control of humanity's collective understanding.

118. zelphirkalt ◴[] No.43963943[source]
The law covers these cases pretty well; it is just that the law has very powerful, extremely rich adversaries, whose greed has gotten the better of them again and again. They could use work released sufficiently long ago to be legally available, or they could take work released as Creative Commons, or they could run a lookup to make sure they never output verbatim copies of inputs, or outputs within a certain string edit distance (depending on output length), or they could have paid people to reach out to all the people whose work they are infringing upon. But they didn't do any of that, of course, because they think they are above the law.
replies(2): >>43964164 #>>43964374 #
119. wizee ◴[] No.43963944{3}[source]
Is reading and memorizing a copyrighted text a breach of copyright? I.e. is creating a copy of the text in your mind a breach of copyright, or fair use? Is it a breach of copyright if a digital “mind” similarly memorizes copyrighted text? Or is it only a breach of copyright to output and publish that memorized text?

What about loosely memorizing the gist of a copyrighted text. Is that a breach or fair use? What if a machine does something similar?

This falls under a rather murky area of the law that is not well defined.

replies(1): >>43964865 #
120. HappMacDonald ◴[] No.43963945{5}[source]
Do you have any links to cases where people were sued for downloading and consuming content without also uploading (eg, bittorent), hosting, sharing the copyrighted works, etc?
replies(2): >>43965951 #>>43966372 #
121. nickpsecurity ◴[] No.43963951[source]
"Pretraining data us worth basically nothing."

(Raises $10 billion based on estimated worth of the resulting models.)

"We can't share the GPT4 prettaining data or weights because they're trade secrets that generate over a billion in revenue for us."

I'll believe they're worth nothing when (a) nobody is buying AI models or (b) AI companies stop using the copyrighted works to train models they sell. So far, it looks like they're lying about the worth of the training data.

122. gitremote ◴[] No.43963955[source]
They never said model training is a violation of copyright. The ruling says model training on copyrighted material for analysis and research is NOT copyright infringement, but the commercial use of the resulting model is:

"When a model is deployed for purposes such as analysis or research… the outputs are unlikely to substitute for expressive works used in training. But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."

replies(1): >>43964872 #
123. xhkkffbf ◴[] No.43963965{3}[source]
A big part of copyright law is protecting the market for the original creator. Not guaranteeing them anything. Just preventing someone else from coming along and copying someone else's work in a way that hurts their sales.

While AIs don't reproduce things verbatim like pirates, I can see how they really undermine the market, especially for non-fiction books. If people can get the facts without buying the original book, there's much less incentive for the original author to do the hard research and writing.

124. gitremote ◴[] No.43963993{4}[source]
The judgement was about model inference, not training.
replies(1): >>43964817 #
125. flyingcircus3 ◴[] No.43964010{3}[source]
The very fact that you can bring this tired retort to any argument regardless of context reveals it for what it is: an off ramp to any conversation you have no better argument against.
replies(2): >>43964818 #>>43964889 #
126. Aerroon ◴[] No.43964018{4}[source]
>Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it.

Which is a clear failure of the copyright system. Millions of people are expanding our cultural artifacts with their own additions, but all of it is illegal, because they haven't waited another 100 years.

People are interested in these pieces of culture, but they're not going to remain interested in them forever. At least not interested enough to make their own contributions.

127. tempeler ◴[] No.43964025{4}[source]
To defend yourself against those who don't play by the rules, it has to be democratized. The world isn’t a fair place.
128. pitaj ◴[] No.43964024{6}[source]
There can be many incentives for people to use official sources: early access, easy updates, live events, etc
replies(2): >>43964798 #>>43965729 #
129. franczesko ◴[] No.43964079[source]
> Piracy refers to the illegal act of copying, distributing, or using copyrighted material without authorization. It can occur in various forms

Processing of IP without a license AND offering it as a model for money doesn't seem like an unknown use-case to me

130. xhkkffbf ◴[] No.43964081{3}[source]
This is what's most offensive about it. At least buy one friggin copy.
131. kranke155 ◴[] No.43964102[source]
It doesn’t matter how they work, it only matters what they do.
132. temporalparts ◴[] No.43964105{4}[source]
The problem isn't that people aren't aware that the scale and magnitude differences are large and significant.

It's that the space of intellectual property LAW does not handle the robust capabilities of LLMs. Legislators NEED to pass laws to reflect the new realities or else all prior case law relies on human analogies which fail in the obvious ways you alluded to.

If there was no law governing the use of death stars and mass murder, and the only legal analogy is to environmental damage, then the only crime the legal system can ascribe is mass environmental damage.

replies(1): >>43964252 #
133. _trampeltier ◴[] No.43964115{3}[source]
Except it's a rectangle with 4 rounded corners.
replies(1): >>43969560 #
134. vharuck ◴[] No.43964116[source]
Personally, I'd support an alternative to copyright for letting creators earn living expenses while working or in reward for good works. But it's a terrible thing to offer them the copyright system and then ignore it to use the works they hoped could earn money. And to further use those works to make something that will replace a lot of creative positions they've relied on because copyright only pays off after the work's been done.

Maybe the government should set up a fund to pay all the copyright holders whose works were used to train the AI models. And if it's a pain to track down the rights holders, I'll play a tiny violin.

135. renewiltord ◴[] No.43964127{3}[source]
Hmm, "when it got good" then. I think what you're saying makes sense to me.
136. bgwalter ◴[] No.43964130{4}[source]
Does the distinction matter? If humans build a machine that uses so much oxygen that the oxygen levels on earth drop by half, can they say:

"Humans are allowed to breathe, so our machine is too, because it is operated by humans!"

replies(1): >>43964279 #
137. spacemadness ◴[] No.43964131{4}[source]
Sounds like we’re talking about the right of AI company founders and people on HN to acquire wealth from creative works due to some weak argument concerning similarity to the human mind and creation of art. Since we’ve now veered into armchair philosophy territory, I think one could argue that the way human memory works and creates, both physically and mentally, from inspiration is vastly different from how AI works. So saying they’re the same and that’s it is both lazy and takes interesting questions off the table to squash debate.
138. Hoasi ◴[] No.43964145[source]
“We used publicly available data” has worked well enough so far. And yet OpenAI just accused China of stealing its content.
139. Intralexical ◴[] No.43964159{4}[source]
It's a very consistently Silicon Valley mindset. Seems like almost every company that makes it big in tech, be it Facebook and Google monetizing our personal data, or Uber and Amazon trampling workers' rights, makes money by reducing people to objects that can be bought and sold, more than almost any other industry. No matter the company, all claimed prosocial intentions are just window dressing to convince us to be on board with our own commodification.

That's also why I'm really not worried about the "AI singularity" folks. The hype is IMO blatantly unsubstantiated by the actual capabilities, but gets pushed anyway only because it speaks to this deep-seated faith held across the industry. "AI" is the culmination of an innate belief that people should be replaceable, fungible, perfectly obedient objects, and such a psychosis blinds decision-makers to its actual limits. Only trouble is whether they have the political power to try to force it anyway.

replies(1): >>43967100 #
140. nadermx ◴[] No.43964164{3}[source]
I'm confused, so you're saying it's illegal? Because last I checked it's still in the process of going through the courts. And lest we forget, copyright's purpose is to advance the arts and sciences. Fair use is codified into law, which states each case is seen on a case-by-case basis, hence the litigation to determine if it is, in fact, legal.
replies(1): >>43964357 #
141. empath75 ◴[] No.43964178{4}[source]
Let's start with I think a case that everyone agrees with.

If I were to take an image, and compress it or encrypt it, and then show you data file, you would not be able to see the original copyrighted material anywhere in the data.

But if you had the right computer program, you could use it to regenerate the original image flawlessly.

I think most people would easily agree that distributing the encrypted file without permission is still a distribution of a copyrighted work and against the law.

What if you used _lossy_ compression, and can merely reproduce a poor-quality JPEG of the original image? I think that's still copyright infringement, right?

Would it matter if you distributed it with an executable that only rendered the image non-deterministically? Maybe one out of 10 times? Or if the command to reproduce it was undocumented?

Okay, so now we have AI. We can ignore the algorithm entirely and how it works, because it's not relevant. There is a large amount of data that it operates on, the weights of the model and so on. You _can_ with the correct prompts, sometimes generate a copy of a copyrighted work, to some degree of fidelity or another.

I do not think it is meaningfully different from the simpler example, just with a lot of extra steps.

I think, legally, it's pretty clear that it is illegally distributing copyrighted material without permission. I think calling it an "ai" just needlessly anthropomorphizes everything. It's a computer program that distributes copyrighted work without permission. It doesn't matter if it's the primary purpose or not.

I think probably there needs to be some kind of new law to fix this situation, but under the current law as it exists, it seems to me to be clearly illegal.

replies(4): >>43964545 #>>43964933 #>>43965230 #>>43969413 #
142. ls612 ◴[] No.43964188{4}[source]
That used to be how it worked. Then the DMCA 1201 provisions arrived and so now anything not expressly permitted by the enumerated exceptions is forbidden. Even talking about how it works is punishable as a felony (upheld by SCOTUS in like 2000 or 2001, they basically said the Copyright clause is in the constitution so the government can censor information on how to defeat DRM).
replies(1): >>43964771 #
143. Intralexical ◴[] No.43964252{5}[source]
Why do you think the obvious analogy is LLM=Human, and not LLM=JPEG or LLM=database?

I think you're overstating the legal uniqueness of LLMs. They're covered just fine by the existing legal precedents around copyrighted and derived works, just as building a death star would be covered by existing rules around outer space use and WMDs. Pretending they should be treated differently is IMO the entire lie told by the "AI" companies about copyright.

replies(2): >>43964507 #>>43968544 #
144. TeMPOraL ◴[] No.43964279{5}[source]
Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

Point being, laws aren't some God-ordained rules, beautiful in their fractal recursive abstraction, perfectly covering everything that will ever happen in the universe. No, laws are more or less crude hacks that deal with here and now. Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI. This is a new situation, and laws need to be updated to cover it.

replies(1): >>43964747 #
145. ◴[] No.43964280[source]
146. Intralexical ◴[] No.43964305{3}[source]
> The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

The direction we're going, it seems more likely it'll be recycling to murder a human.

147. bgwalter ◴[] No.43964318[source]
That is fairly easy to answer: When the infringement shifted from small people taking from Walt Disney to Silicon Valley taking from everyone, including open source authors and small YouTube channels.

I find the shift of some right wing politicians and companies from "TPB and megaupload are criminals and its owners must be extradited from foreign countries!" to "Information wants to be free!" much more illuminating.

148. mdhb ◴[] No.43964357{4}[source]
It’s so fucking obviously illegal when you think about it rationally for more than a few seconds. We aren’t even talking about “fair use” we are talking about how it works in practice which was Meta torrenting pirated books, never paying anyone a cent and straight up stealing the content at scale.
replies(2): >>43964700 #>>43964716 #
149. jhaile ◴[] No.43964361[source]
One aspect that I feel is ignored by the comments here is the geo-political forces at work. If the US takes the position that LLMs can't use copyrighted work or has to compensate all copyright holders – other countries (e.g. China) will not follow suit. This will mean that US LLM companies will either fall behind or be too expensive. Which means China and other countries will probably surge ahead in AI, at least in terms of how useful the AI is.

That is not to say that we shouldn't do the right thing regardless, but I do think there is a feeling of "who is going to rule the world in the future?" that underlies governmental decision-making on how much to regulate AI.

replies(10): >>43964511 #>>43964513 #>>43964544 #>>43964546 #>>43964647 #>>43964799 #>>43965877 #>>43966756 #>>43969913 #>>43974233 #
150. apercu ◴[] No.43964365[source]
>Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

Corporations are not humans. (It's ridiculous that they have some legal protections in the US like humans, but that's a different issue). AI is also not human. AI is also not a chipmunk.

Why the comparison?

151. datavirtue ◴[] No.43964370{3}[source]
Exactly, it is an immense privilege to have your works preserved and promulgated through the ages for instant recall and automated publishing. It's literally what everyone wants. The creators and the consumers. The AI companies are not robbing your money or IP. Period.
152. ashoeafoot ◴[] No.43964374{3}[source]
Obviously a revenue-tracking weight should be trained in, allowing the tracking and collection of all value generated from derivative works.
153. datavirtue ◴[] No.43964393{4}[source]
"However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to."

Absolute horse shit. I can start a 1-900 answer line and use any reference I want to answer your question.

replies(1): >>43964814 #
154. tensor ◴[] No.43964402{3}[source]
If I write a math book, and you read it, then tell someone about the math within it. You are not violating copyright. In fact, you could write your OWN math book, or history book, or whatever, and as long as you're not copying my actual text, you are not violating copyright.

However, when an LLM does the same, people now want it to be illegal. It seems pretty straightforward to apply existing copyright law to LLMs in the same way we apply it to humans. If the actual text they generate is substantially similar to a source material, such that it would constitute a copyright violation if a human had produced it, then it should be illegal. Otherwise it should not.

edit: and in fact it's not even whether an LLM reproduces text, it's whether someone subsequently publishes that text. The person publishing that text should be the one taking the legal hit.

replies(1): >>43965025 #
155. datavirtue ◴[] No.43964441{3}[source]
"If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem."

Why is that? Seems all logic gets thrown out the window when invoking AI around here. References are given. If the user publishes the output without attribution, NOW you have a problem. People are being so rabid and unreasonable here. Totally bat shit.

replies(1): >>43965672 #
156. SilasX ◴[] No.43964448[source]
>My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Huh? If you agree that "learning from copyrighted works to make new ones" has traditionally not been considered infringement, then can you elaborate on why you think it fundamentally changes when you do it with bots? That would, if anything, seem to be a reversal of classic copyright jurisprudence. Up until 2022, pretty much everyone agreed that "learning from copyrighted works to make new ones" is exactly how it's supposed to work, and would be horrified at the idea of having to separately license that.

Sure, some fundamental dynamic might change when you do it with bots, but you need to make that case in an enforceable, operationalized way.

157. staticman2 ◴[] No.43964449{4}[source]
> these comparisons of llms with human artists copying are just ridiculous.

I've come to think of this as the "Performatively failing to recognize the difference between an organism and a machine" rhetorical device that people employ here and elsewhere.

The person making the argument is capable of distinguishing the two things, they just performatively choose not to do so.

replies(1): >>43967517 #
158. datavirtue ◴[] No.43964460{3}[source]
And everyone here is downloading every show and movie in existence without even a hint of guilt.
replies(1): >>43968362 #
159. sdenton4 ◴[] No.43964507{6}[source]
LLMs are certainly not a jpeg or a database...

The google news snippets case is, in my non-lawyer opinion, the most obvious touch point. And in that case, it was decided that providing large numbers of snippets in search results was non-infringing, despite being a case of copying text from other people at-scale... And the reasons this was decided are worth reading and internalizing.

There is not an obvious right answer here. Copyright rules are, in fact, Calvinball, and we're deep in uncharted territory.

replies(1): >>43964597 #
160. bgwalter ◴[] No.43964511[source]
The same president that is putting 145% tariffs on China could put 1000% tariffs on Internet chat bots located in China. Or order the Internet cables to be cut as a last resort (citing a national emergency as is the new practice).

I'm not sure at all what China will do. I find it likely that they'll forbid AI at least for minors so that they do not become less intelligent.

Military applications are another matter that are not really related to these copyright issues.

replies(2): >>43964580 #>>43965149 #
161. asddubs ◴[] No.43964513[source]
you could apply that same logic to any IP breaches though, not just AI
replies(1): >>43965586 #
162. Molitor5901 ◴[] No.43964521[source]
Representative Joe Morelle (D-NY), wrote the termination was “…surely no coincidence he acted less than a day after she refused to rubber-stamp Elon Musk’s efforts to mine troves of copyrighted works to train AI models.”

Interesting, but everyone is mining copyrighted works to train AI models.

163. therouwboat ◴[] No.43964544[source]
If AI is so important, maybe it should be owned by the government and free to use for all citizens.
replies(1): >>43964591 #
164. gruez ◴[] No.43964545{5}[source]
>Okay, so now we have AI. We can ignore the algorithm entirely and how it works, because it's not relevant. There is a large amount of data that it operates on, the weights of the model and so on. You _can_ with the correct prompts, sometimes generate a copy of a copyrighted work, to some degree of fidelity or another.

Suppose we accept all of the above. What does that imply for human learning?

replies(1): >>43965098 #
165. bigbuppo ◴[] No.43964546[source]
The real problem here is that AI companies aren't even willing to follow the norms of big business and get the laws changed to meet their needs.
replies(1): >>43968993 #
166. bitfilped ◴[] No.43964562[source]
Sorry but AI isn't that useful and I don't see it becoming any more useful in the near term. It's taken since ~1950 to get LLMs working well enough to become popular and they still don't work well.
167. pc86 ◴[] No.43964580{3}[source]
How exactly does one add a tariff to a foreign-based chat bot?
replies(2): >>43964611 #>>43965573 #
168. pc86 ◴[] No.43964591{3}[source]
Name two non-military things that the government owns and aren't complete dumpster fires that barely do the thing they're supposed to do.

Even (especially?) the military is a dumpster fire but it's at least very good at doing what it exists to do.

replies(10): >>43964650 #>>43964655 #>>43964684 #>>43964718 #>>43964753 #>>43964773 #>>43964792 #>>43964900 #>>43965196 #>>43969002 #
169. Intralexical ◴[] No.43964597{7}[source]
> LLMs are certainly not a jpeg or a database...

Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material. And the output directly competes against the copyrighted source materials.

The fact they're smudgy and non-deterministic doesn't change how they relate to the rights of authors and artists.

replies(3): >>43964975 #>>43967423 #>>43967466 #
170. bilbo0s ◴[] No.43964611{4}[source]
You know that 20 bucks a month a lot of people pay for chatgpt?

Yeah..

you tax it if the "chatgpt" is foreign.

171. palmotea ◴[] No.43964631{4}[source]
>>> Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

>> The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

> We are talking about the rights of the humans training the models and the humans using the models to create new things.

Then that's even easier, because that prevents appeals to things humans do, like learning, from muddying the waters.

If "training the models" entails loading up copyrighted works into your system (e.g. encoded them during training), you've just copied them into a retrieval system and violated copyright based on established precedent. And people have prompted verbatim copyrighted text out of well-known LLMs, which makes it even clearer.

And then to defend LLM training you're left with BS akin to claiming an ASCII-encoded copy of a book isn't a copyright violation, because the book is paper and ASCII is numbers.

172. Ekaros ◴[] No.43964645{3}[source]
20 or 25 years from publication. Enough for anyone inheriting it to exploit if they are children. No need for more. It's not like a house builder keeps getting paid after the house has been built.
173. oooyay ◴[] No.43964647[source]
Well hell, by that logic average citizens should be able to launder corporate intellectual property because China will never follow suit in adhering to intellectual property law. I'm game if you are.
replies(3): >>43964701 #>>43965219 #>>43969949 #
174. bilbo0s ◴[] No.43964650{4}[source]
That's a trick question.

I mean, name 2 things anyone owns that aren't dumpster fires?

Long time ago industrial engineers used to say, "Even Toyota has recalls."

Something being a dumpster fire is so common nowadays that you really need a better reason to argue in support of a given entity's ownership. (Or even non-ownership for that matter.)

175. pergadad ◴[] No.43964655{4}[source]
The government doesn't make tanks, it just shells out gigantic amounts to companies to make them.

That said, there are plenty of successful government actions across the world, where Europe or Japan probably have a good advantage with solid public services. Think streets, healthcare, energy infrastructure, water infrastructure, rail, ...

replies(1): >>43965711 #
176. Ekaros ◴[] No.43964668[source]
Not having massively overfunded corporations exploit artists is not IP minimalism. A private person stealing something is seen as a tiny evil. But a big corporation exploiting everyone else is an entirely different thing.
replies(1): >>43969223 #
177. lappet ◴[] No.43964684{4}[source]
Highways
178. 93po ◴[] No.43964697{6}[source]
Disney has a copyright over Moana. I would argue Moana is an idea in the sense that most people think of as ideas. Moana isn't tangible; it's not a physical good. It's not a plate on my table. It only exists in our heads. If I made a Moana comic book, with an entirely original storyline and original art, all drawn in my own style and not using 3D assets similar to their movies, that would still be violating copyright. Moana is an idea, and there are a million ways to express the established character Moana, and Moana itself is an idea built on a million things that Disney doesn't have any rights to - history, culture, tropes, etc.

I understand what you're saying but the way you're framing it isn't what I really have a problem with. I still don't agree with the idea that I can't make my own physical copies of Harry Potters books, identical word for word. I think people can choose to buy the physical books from the original publisher because they want to support them or like the idea that it's the "true" physical copy. And I'm going to push back on that a million times less than the concept of things like Moana comic books. But still, it's infringing copyright for me to make Moana comic books in my own home, in private, and never showing them to anyone. And that's ridiculous.

replies(2): >>43966978 #>>43967451 #
179. Intralexical ◴[] No.43964700{5}[source]
A test to apply here: If you or I did this, would it be illegal? Would we even be having this conversation?

The law is supposed to be impartial. So if the answer is different, then it's not really a law problem we're talking about.

180. jowea ◴[] No.43964701{3}[source]
Isn't that sort of logic precisely why China doesn't adhere to IP law?
replies(1): >>43964790 #
181. nadermx ◴[] No.43964716{5}[source]
The fact that you are even using the word stealing is telling of your lack of knowledge in this field. Copyright infringement is not stealing[0]. The propaganda of the copyright cartel has gotten to you.

[0] https://en.wikipedia.org/wiki/Dowling_v._United_States_(1985...

replies(4): >>43965685 #>>43966032 #>>43969091 #>>43983233 #
182. sklargh ◴[] No.43964718{4}[source]
Hi. Assuming the US here. Depends on scope of analysis and dumpster fire definition.

1. The National Weather Service. Crown jewel and very effective at predicting the weather and forecasting life threatening events.

2. IRS, generally very good at collecting revenue.

3. National Interagency Fire Service / US Forest Service tactical fire suppression

4. NTSB/US Chemicals Safety Board - Both highly regarded.

5. Medicare - Basically clung to with talons by seniors, revealed preference is that they love it.

6. DOE National Labs

7. NIH (spicy pick)

8. Highway System

There are valid critiques of all of these but I don’t think any of them could be universally categorized as a complete dumpster fire.

183. Zambyte ◴[] No.43964726{8}[source]
Amazing. Have you considered reading the articles I linked? They aren't even that long.
replies(1): >>43974756 #
184. vessenes ◴[] No.43964735{4}[source]
> Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

My proposal is that it's a Luddite kneejerk reaction to things people don't understand and don't like. They sense and fear change. For instance, here you say it's an issue when AI uses something as a source that you don't have Copyright to. Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue". What you said just isn't true. The copyright refers to the right to copy a work.

Distribution: Sure. License your content however you want. That said, in the US a license prohibiting you from READING something just wouldn't be possible. You can limit distribution, copying, etc. This is how journalists can write about sneak previews or leaked information or misfiled court documents released when they should be under seal. The leaking <-- the distribution might violate a contract or a license, but the reading thereof is really not a thing that US law or Common law think they have a right to control, except in the case of the state classifying secrets. As well, here we have people saying "my song in 1983 that I put out on the radio, I don't want AI listening to that song." Did your license in 1983 prohibit computers from processing your song? Does that mean digital radio can't send it out? Essentially that ship has sailed, full stop, without new legislation.

On my last points, I think you're missing my point. Fan fiction is legal if you're not trying to profit from it. It is almost impossible to perfectly copy a painting, although some people are pretty good at it. I think it's perfectly legal to paint a super close copy of, say, Starry Night, and sell it as "Starry Night by Jason Lotito." In any event, the discourse right now claims it's wrong for AI to look at and learn from paintings and photographs.

replies(1): >>43964908 #
185. ◴[] No.43964746{6}[source]
186. palmotea ◴[] No.43964747{6}[source]
> Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

Except in this case, we already have the equivalent of "laws about oxygen consumption": copyright.

> Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI.

The laws are not "entirely ill-equipped to deal with generative AI," unless your interests lie in breaking them. All the hand-waving about the laws being "questionable" and "entirely ill-equipped" is just noise.

Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts. Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.

replies(3): >>43965500 #>>43965515 #>>43967544 #
187. nilamo ◴[] No.43964753{4}[source]
1) art museums, specifically the Smithsonian, but nearly every major city has a decent one.

2) state parks are pretty rad.

replies(1): >>43965010 #
188. Zambyte ◴[] No.43964761{8}[source]
> You don't simply make software because you love it, right?

I can't speak for Stephen but I absolutely do. I program for fun all the time.

> Should people not be able to make money off their effort?

Is anyone arguing otherwise?

replies(1): >>43965549 #
189. caconym_ ◴[] No.43964770{3}[source]
If it was as obvious as you claim, the legal issues would already be settled, and your characterization of what LLMs are doing as "reading and summarizing" is hilariously disingenuous and ignores essentially the entire substance of the debate (which is happening not just on internet forums but in real courts, where real legal professionals and scholars are grappling with how to fit AI into our framework of existing copyright law, e.g.^[1]).

Of course, if you start your thought by dismissing anybody who doesn't share your position as not sane, it's easy to see how you could fail to capture any of that.

^[1] https://arstechnica.com/tech-policy/2025/05/judge-on-metas-a...

190. nadermx ◴[] No.43964771{5}[source]
Breaking DRM, is in fact, Fair Use: https://www.ca5.uscourts.gov/opinions/pub/08/08-10521-CV0.wp...
191. azemetre ◴[] No.43964773{4}[source]
Medicaid, Medicare, and Social Security are all three programs that have massive approval from US citizens.

Even saying the military is a dumpster fire isn't accurate. The military has led trillions of dollars worth of extraction for the wealthy and elite across the globe.

In no sane world can you call the ability to protect GLOBAL shipping lanes a failure. That one service alone has probably paid for itself thousands of times.

We aren't even talking about things like public education (high school education used to be privatized and something only the elites enjoyed 100 years ago; yes, public high school education isn't even 100 years old) or libraries or public parks.

---

I really don't understand this "gobermint iz bad" meme you see in tech circles.

I get more out of my taxes compared to equivalent corporate bills that it's laughable.

Government is comprised of people, and for the last 50 years the government has mostly been giving money and establishing programs for the small cohorts that have been hoarding all the wealth. Somehow this is never an issue with the government, however.

I also never understand the arguments from these types, because if you think the government is bad then you should want it to be better. Better mostly meaning having more money to redistribute and more personnel to run programs, but it's never about those things. It's always attacking the government to make it worse at the expense of the people.

192. megamix ◴[] No.43964774{3}[source]
Tell that to Snowden.
193. 93po ◴[] No.43964780{6}[source]
Copyright isn't about distribution, it's about creation. In reality the chances of getting in trouble are basically zero if you don't distribute it - who would know? But technically any creation, even in private, is violating copyright. Doesn't matter if you make money or put it on the internet.

There is fair use, but fair use is an affirmative defense to infringing copyright. By claiming fair use you are simultaneously admitting infringement. The idea that you have to defend your own private expression of ideas based on other ideas is still wrong in my view.

replies(1): >>43965095 #
194. oooyay ◴[] No.43964790{4}[source]
Yes, I was being a bit facetious. It was snark intended to point out that corporations don't get to have their cake and eat it too. Either everything is free and there are no boundaries or we live by our own principles.
replies(3): >>43964944 #>>43964966 #>>43965117 #
195. zem ◴[] No.43964792{4}[source]
post office and USDA (pre trump regime slash-and-burn of course)
196. Zambyte ◴[] No.43964798{7}[source]
"Early access" doesn't work in this context, but yes for the other means.
197. Bjorkbat ◴[] No.43964799[source]
I broadly agree in that, sure, unfettered access to copyrighted material will make AI more capable, but more capable of what exactly?

For national security reasons I'm perfectly fine with giving LLMs unfettered access to various academic publications, scientific and technical information, that sort of thing. I'm a little more on the fence about proprietary code, but I have a hard time believing there isn't enough code out there already for LLMs to ingest.

Otherwise though, what is an LLM with unfettered access to copyrighted material better at vs one that merely has unfettered access to scientific / technical information + licensed copyrighted material? I would suppose that besides maybe being a more creative writer, the former is far more capable of reproducing copyrighted works.

In effect, the former is a more capable plagiarism machine compared to the latter, not necessarily a more intelligent one, and otherwise doesn't really add any more value. What do we have to gain from condoning it?

I think the argument I'm making is a little easier to see in the case of image and video models. The model that has unfettered access to copyrighted material is more capable, sure, but more capable of what? Capable of making images? Capable of reproducing Mario and Luigi in an infinite number of funny scenarios? What do we have to gain from that? What reason do we have for not banning such models outright? Not like we're really missing out on any critical security or economic advantages here.

replies(1): >>43965158 #
198. wvenable ◴[] No.43964804[source]
There's now an entire generation that believes "Intellectual Property" is a real thing.

Instead of the understanding that copyrights and patents are temporary state-granted monopolies meant to benefit society, they are framed as real, perpetual property rights. This framing fuels support for draconian laws and obscures the real purpose of these laws: to promote innovation and knowledge sharing, not to create eternal corporate fiefdoms.

199. jasonlotito ◴[] No.43964814{5}[source]
> Absolute horse shit.

I agree, what followed was.

> I can start a 1-900 answer line and use any reference I want to answer your question

Yeah, that's not what we are talking about. If you think it was, you should probably do some more research on the topic.

200. Workaccount2 ◴[] No.43964817{5}[source]
>"But making commercial use of vast troves of copyrighted works to produce expressive content"

This can only be referring to training, the models themselves are a rounding error in size compared to their training sets.

201. dylan604 ◴[] No.43964818{4}[source]
It also assumes that the orange man has an original thought and not something that he's been convinced of by all of the direct underlings or even 3rd party NGOs that advise/lobby those underlings.
202. 93po ◴[] No.43964830{6}[source]
I will also add: there are tons of examples of companies taking down not-for-profit fanfiction or fan creations. Nintendo is very aggressive about this. The publisher of Harry Potter has also aggressively taken down not-for-profit fanfiction.

> If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs.

It's interesting how much parallel there is here to the idea that company owners reap the rewards of their employee's labor when doing no additional work themselves. The fruits of labors should go to the individuals who labor, I 100% agree.

203. zelphirkalt ◴[] No.43964853{4}[source]
Not sure what you are getting at?
204. aeonik ◴[] No.43964865{4}[source]
"Filthy eidetics. Their freeloading had become too much for our society to bear. Something had to be done. We found the mutation in their hippocampus and released a new CRISPR-mRNA-based gene suppression system.

Those who were immune were put under the scalpel."

205. Workaccount2 ◴[] No.43964872{3}[source]
The vast trove of copyright work has to refer to training. ChatGPT is likely on the order of 5-10TB in size. (Yes, Terabyte).

There are college kids with bigger "copyright collections" than that...

replies(1): >>43965192 #
206. 93po ◴[] No.43964889{4}[source]
People see actions and make assumptions on intentions behind those actions. They also make assumptions on who actually called for those actions, or the percent to which people contributed to those decisions.

If you don't have a tape recording of Trump saying "Fire Shira, I don't like what she did and she needs to get out," then you are making assumptions about both his reasons and his involvement. No one has that tape. Which means any claim that this is what happened is entirely speculation. We've seen a decade of people presenting these assumptions as fact, and it's really tiresome.

replies(1): >>43965090 #
207. Buttons840 ◴[] No.43964900{4}[source]
Weather Forecasting
208. wnevets ◴[] No.43964906[source]
> Minnesota woman to pay $220,000 fine for 24 illegally downloaded songs [1]

https://www.theguardian.com/technology/2012/sep/11/minnesota... [1]

replies(1): >>43965002 #
209. jasonlotito ◴[] No.43964908{5}[source]
> My proposal is that it's a luddish kneejerk reaction to things people don't understand and don't like.

Your proposal is moving goal posts.

> Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue".

No, I never said that. Fair Use exists.

> Fan fiction is legal if you're not trying to profit from it.

No, it's not.[1] You can make arguments that it should be, but, no.

[1] https://jipel.law.nyu.edu/is-fanfiction-legal/

> I think you're missing my point

I think you got called out, and you are now trying to reframe your original comment so it comes across as having accounted for the things you were called out on.

You think you know what you are talking about, but you don't. But, you rely on the fact that you think you do to lose the money you do.

210. zelphirkalt ◴[] No.43964910{4}[source]
Well, you don't get to pick and choose in which situations an LLM is considered similar to a human being and in which it is not. If you argue that it, like a human, is lossy, well, let's go ahead and get most of its output checked by organizations and courts for violations of the law and licenses, just like human work is. Oh wait, I forgot, LLMs are run by companies with too much cash to successfully sue. I guess we just have to live with it then, what a pity.
replies(2): >>43965269 #>>43967046 #
211. GuB-42 ◴[] No.43964921{4}[source]
That intellectual property is non exclusive doesn't change the inheritance problem.

If you consider it right to get value from the work of your family, and you consider intellectual work (such as writing a book) to be valuable, then as an inheritor, you should get value from it. And since the way we give value to intellectual work is through copyright, inheritors should inherit copyright.

If you think that copyright should not exceed lifetime, then the logical consequences would be one of:

- inheritance should be abolished

- intellectual work is less valuable than other forms of work

- intellectual property / copyright is not how intellectual work should be rewarded

There are arguments for abolishing inheritance, it is after all one of the greatest sources of inequality. Essentially, it means 100% inheritance tax in addition to all the work going into the public domain. Problematic in practice.

For the value of intellectual work, well, hard to argue against it on Hacker News without being a massive hypocrite.

And there are alternatives to copyright (i.e. artificial scarcity) for compensating intellectual work like there are alternatives to capitalism. Unfortunately, it often turns out poorly in practice. One suggestion is to have some kind of tax that is fairly distributed between authors in exchange for having their work in the public domain. Problem is: define "fairly".

Note that I am not saying that copyright should last long, you can make copyright 20 years, humans or corporate, inheritable. Simple, gets in the public domain sooner, fairer to older authors, already works for patents. Why insist on "lifetime"?

replies(1): >>43964991 #
212. Workaccount2 ◴[] No.43964933{5}[source]
The crux of the debate is a motte and bailey.

AI is capable of reproducing copyright (motte) therefore training on copyright is illegal (bailey).

replies(2): >>43968228 #>>43969157 #
213. r053bud ◴[] No.43964944{5}[source]
It’s barely facetious though. What is stopping me from “starting an AI company” (LLC, sure), torrenting all ebooks (which Facebook did), and as long as I don’t seed, I’m golden?
replies(1): >>43965133 #
214. snozolli ◴[] No.43964966{5}[source]
Either everything is free and there are no boundaries or we live by our own principles.

Or C) large corporations (and the wealthy) do whatever they want while you still get extortion letters because your kid torrented a movie.

They really do get to have their cake and eat it too, and I don't see any end to it.

215. SilasX ◴[] No.43964975{8}[source]
The problem is, you can say all of that for human learning-from-copyrighted-works, so that point isn't definitive.
replies(2): >>43967554 #>>43969144 #
216. dghlsakjg ◴[] No.43964991{5}[source]
Agreed. I think it should be the greater of 20 years or the lifetime of the original authors.
217. gruez ◴[] No.43965002[source]
How is this relevant?

>The RIAA accused her of downloading and distributing more than 1,700 music files on file-sharing site KaZaA

Emphasis mine. I think most people would agree that whatever AI companies are doing with training AI models is different than sending verbatim copies to random people on the internet.

replies(4): >>43965037 #>>43965136 #>>43965895 #>>43966470 #
218. standardUser ◴[] No.43965010{5}[source]
The US federal government doesn't run most museums, but it does run the massive parks system with 20k employees (pre-Musk) and that system enjoys extremely high ratings from guests.
219. rrook ◴[] No.43965025{4}[source]
That mathematical formulas already cannot be copyrighted makes this a kinda nonsense example?
220. wnevets ◴[] No.43965037{3}[source]
> I think most people would agree that whatever AI companies are doing with training AI models is different than sending verbatim copies to random people on the internet.

I think most artist who had their works "trained by AI" without compensation would disagree with you.

replies(3): >>43965160 #>>43965206 #>>43967996 #
221. gruez ◴[] No.43965072{3}[source]
>The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

That might be true but I don't see how it's relevant. There's no provision in copyright law that gives a free pass to humans vs machines, or makes a distinction between them.

replies(1): >>43965379 #
222. flyingcircus3 ◴[] No.43965090{5}[source]
Now apply this reasoning to Trump standing in Air Force One and saying that he would bring someone back if the Supreme Court said to. It's on video.
replies(1): >>43966315 #
223. Zambyte ◴[] No.43965095{7}[source]
> Copyright isn't about distribution, it's about creation

This is exactly wrong. You can copy all of Harry Potter into your journal as many times as you want legally (creating copies) so long as you do not distribute it.

replies(1): >>43966393 #
224. empath75 ◴[] No.43965098{6}[source]
If a human were to reproduce, from memory, a copyrighted work, that would be illegal as well, and multiple people have been sued over it, even doing it unintentionally.

I'm not talking about learning. I'm talking about the complete reproduction of a copyrighted work. It doesn't matter how it happens.

replies(1): >>43965454 #
225. gruez ◴[] No.43965117{5}[source]
>It was snark intended to point out that corporations don't get to have their cake and eat it too.

"have their cake and eat it too" allegations only work if you're talking about the same entity. The copyright maximalist corporations (ie. publishers) aren't the same as the permissive ones (ie. AI companies). Making such characterizations make as much sense as saying "citizens don't get to eat their cake and eat it too", when referring to the fact that citizens are anti-AI, but freely pirate movies.

replies(1): >>43965143 #
226. gruez ◴[] No.43965133{6}[source]
>What is stopping me from “starting an AI company” (LLC, sure), torrenting all ebooks (which Facebook did), and as long as I don’t seed, I’m golden?

Nothing. You don't even need the LLC. I don't think anyone got prosecuted for only downloading. All prosecutions were for distribution. Note that if you're torrenting, even if you stop the moment it's finished (and thus never goes to "seeding"), you're still uploading, and would count as distribution for the purposes of copyright law.

replies(1): >>43966059 #
227. jofla_net ◴[] No.43965136{3}[source]
Who knew all she needed was to change the tempo, pitch, timbre, add/remove lyrics, add/subtract a few notes, rearrange the harmony, put it behind a web portal with a fancy name, claim it had an inspirational muse (or assume all mortal beings are without one in the first place so it doesn't matter), and proceed to make millions off of said process methodically rather than giving it away for free, and she'd be right as rain.
replies(1): >>43965236 #
228. _aavaa_ ◴[] No.43965143{6}[source]
Yes they are. Look at what happened when deepseek came out. Altman started crying and alleging that deepseek was trained on OpenAI model outputs without an inkling of irony
replies(2): >>43965232 #>>43968036 #
229. gruez ◴[] No.43965149{3}[source]
>Or order the Internet cables to be cut as a last resort (citing a national emergency as is the new practice).

what if they route through third countries?

230. Teever ◴[] No.43965158{3}[source]
If common culture is an effective substrate for communicating ideas, in the sense that we can use shared pop culture references to make metaphors that explain complex ideas, then the common culture that large companies have ensnared in excessively long copyrights and trademarks to generate massive profits is a useful thing for an LLM designed to convey ideas to have embedded in it.

If I'm learning about kinematics maybe it would be more effective to have comparisons to Superman flying faster than a speeding bullet and no amount of dry textbooks and academic papers will make up for the lack of such a comparison.

This is especially relevant when we're talking about science-fiction which has served as the inspiration for many of the leading edge technologies that we use including stuff like LLMs and AI.

replies(1): >>43966608 #
231. EMIRELADERO ◴[] No.43965160{4}[source]
The question is: would that disagreement have the same basis as the news above? I don't think so. Artists that are against GenAI take that stance out of a perceived abstract unfairness of the situation, where the AI companies aren't copy-pasting the works per-se with each generation, but rather "taking" the "sweat of the brow" of those artists. You can agree or not about this being an actual problem, but that's where the main claim is.
replies(1): >>43965277 #
232. gitremote ◴[] No.43965192{4}[source]
No. The paragraph as a whole refers to the "outputs" of vast troves of copyrighted work.

Disk size is irrelevant. If you lossy-compress a copyrighted bitmap image to small JPEG image and then sell the JPEG image, it's still copyright infringement.

replies(1): >>43969201 #
233. bongodongobob ◴[] No.43965196{4}[source]
National Weather Service

Library of Congress

National Park Service

U.S. Geological Survey (USGS)

NASA

Smithsonian Institution

Centers for Disease Control and Prevention (CDC)

Social Security Administration (SSA)

Federal Aviation Administration (FAA) air traffic control

U.S. Postal Service (USPS)

234. gruez ◴[] No.43965206{4}[source]
Studio Ghibli[1] might object both to people pirating their films and to AI companies allowing their art style to be duplicated, but that's not the same as saying those two things are the same. Sharing a movie rip on BitTorrent is obviously different from training an AI model that can reproduce the Studio Ghibli style, even to diehard AI opponents.

[1] used purely as an example

replies(1): >>43965997 #
235. rollcat ◴[] No.43965219{3}[source]
Well I always felt rebellious about the contemporary face of "rules for thee but not for me", specifically regarding copyright.

Musicians remain subject to abuse by the recording industry; they're making pennies on each dollar you spend on buying CDs^W^W streaming services. I used to say, don't buy that; go to a concert, buy beer, buy merch, support directly. Nowadays live shows are being swallowed whole through exclusivity deals (both for artists and venues). I used to say, support your favourite artist on Bandcamp, Patreon, etc. But most of these new middlemen are ready for their turn to squeeze.

And now on top of all that, these artists' work is being swallowed whole by yet another machine, disregarding what was left of their rights.

What else do you do? Go busking?

replies(1): >>43968990 #
236. halkony ◴[] No.43965230{5}[source]
> I do not think it is meaningfully different from the simpler example, just with a lot of extra steps.

Those extra steps are meaningfully different. In your description, a casual observer could compare the two JPEGs and recognize the inferior copy. However, AI has become so advanced that such detection is becoming impossible. It is clearly voodoo.

237. gruez ◴[] No.43965232{7}[source]
>Altman started crying and alleging that deepseek was trained on OpenAI model outputs without an inkling of irony

Can you link to the exact comments he made? My impression was that he was upset at the fact that they broke T&C of openai, and deepseek's claim of being much cheaper to train than openai didn't factor in the fact that it requried openai's model to bootstrap the training process. Neither of them directly contradict the claim that training is copyright infringement.

238. glimshe ◴[] No.43965236{4}[source]
You just described pop music making. Change tempo, pitch, add/remove lyrics, etc from prior art.
239. gruez ◴[] No.43965264{7}[source]
>I didn't make this claim

???

Did you not literally comment the following?

>A new research paper is obviously materially different from "rearranging that text to create a marginally new text".

What did you mean by that, if that's not your claim?

replies(1): >>43965763 #
240. philipkglass ◴[] No.43965269{5}[source]
There are a couple of ways to theoretically prevent copyright violations in output. For closed models that aren't distributed as weights, companies could index perceptual hashes of all the training data at a granular level (like individual paragraphs of text) and check/retry output so that no duplicates or near-duplicates of copyrighted training data ever get served as a response to end users.

Another way would be to train an internal model directly on published works, use that model to generate a corpus of sanitized, rewritten/reformatted data about the works still under copyright, then use the sanitized corpus to train a final model. For example, the sanitized corpus might describe the Harry Potter books in minute detail but not contain a single sentence taken from the originals. Models trained that way wouldn't be able to reproduce excerpts from Harry Potter books even if the models were distributed as open weights.

241. wnevets ◴[] No.43965277{5}[source]
> would that disagreement have the same basis as the news above?

Yes. An artist's style can and sometimes is their IP.

replies(1): >>43965462 #
242. zelphirkalt ◴[] No.43965314{6}[source]
Just to challenge that idea: Why?
replies(1): >>43965700 #
243. moralestapia ◴[] No.43965360[source]
Because it's a machine that reproduces other people's work, which is copyrighted. Copyright protects the essence of the original work even after it is present in, or turned into, a derivative work.

Some try to make the argument of "but that's what humans do and it's allowed", but that's not a real argument, as it has not been proven, nor is it easy to prove, that machine learning equates to human reasoning. In the absence of evidence, the law assumes NO.

244. moralestapia ◴[] No.43965379{4}[source]
In the case of Copyright law, no provision means it will fall in "forbidden" land, not in "allowed" land.

Also in general, grey areas don't mean those things are legal.

Edit: this remains true even if you don't like it, ¯\_(ツ)_/¯.

replies(1): >>43965497 #
245. moralestapia ◴[] No.43965405{4}[source]
>Copyright only comes into play on publication.

Nope.

You have a right to not publish any work that you own. This is protected by Copyright law.

Copyright covers you from the moment you create some sort of original work (in a tangible medium).

246. stevenAthompson ◴[] No.43965409{4}[source]
> I fear the lack of our ability to measure your mind might render you without many of the legal or moral protections you imagine you have.

Society doesn't need to measure my mind, they need to measure the output of it. If I behave like a conscious being, I am a conscious being. Alternatively you might phrase it such that "Anything that claims to be conscious must be assumed to be conscious."

It's the only answer to the p-zombie problem that makes sense. None of this is new, philosophers have been debating it for ages. See: https://en.wikipedia.org/wiki/Philosophical_zombie

However, for copyright purposes we can make it even simpler. If the work is new, it's not covered by the original copyright. If it is substantially the same, it isn't. Forget the arguments about the ghost in the machine and the philosophical mumbo-jumbo. It's the output that matters.

replies(1): >>43965699 #
247. bongodongobob ◴[] No.43965414[source]
Right around the same time struggling artists thought paying $40 for global distribution via Spotify and not getting paid anything for their 100 streams a month was being "ripped off". And I think that is related to influencer culture. Everyone thinks they deserve to be famous and needs someone to blame for their below average art not making them rich.
248. gruez ◴[] No.43965454{7}[source]
>I'm not talking about learning. I'm talking about the complete reproduction of a copyrighted work. It doesn't matter how it happens.

In that case I don't think there's anything controversial here? Nobody thinks that if you ask AI to reproduce something verbatim, that you should get a pass because it's AI. All the controversy in this thread seems to be around the training process and whether that breaks copyright laws.

replies(2): >>43966775 #>>43969193 #
249. mncharity ◴[] No.43965458[source]
I considered adding a reminder above that email used to be a copyright violation. Implied license not yet established; every copy between disk and memory a violation; let alone forwarding; the occasional email footer "LegalisticCo grants you a licence to use this email under the following terms ...". Oh well. And then almost all sharing of images.
250. EMIRELADERO ◴[] No.43965462{6}[source]
No it's not? Style has been ruled pretty specifically to be uncopyrightable. Perhaps you could show me some examples?
replies(1): >>43965571 #
251. gruez ◴[] No.43965497{5}[source]
>In the case of Copyright law, no provision means it will fall in "forbidden" land, not in "allowed" land.

AI companies claim it falls under fair use. Pirates use the same excuse too. Just look at all the clips uploaded to youtube with a "it's fair use guys!" note in the description. The only difference between the two is that the former is novel enough that there's plausible arguments for both sides, and the latter has been so thoroughly litigated that you'd be laughed out of the courtroom for claiming that your torrenting falls under fair use.

replies(1): >>43965695 #
252. TeMPOraL ◴[] No.43965500{7}[source]
> Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts.

That's the thing though: intuitively, they do - training the model != generating from the model, and it's the output of a generation that violates copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.

> Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.

That's 100% true. I know that, I'm not denying that. But in this particular case, I find my own views align with their case. I'm not begrudging them for raking in heaps of money offering generative AI services, because they're legitimately offering value that's at least commensurate (IMHO it's much greater) to what they charge, and that value comes entirely from the work they're uniquely able to do, and any individual work that went into training data contributes approximately zero to it.

(GenAI doesn't rely on any individual work in training data; it relies on the breadth and amount being a notable fraction of humanity's total intellectual output. It so happens that almost all knowledge and culture is subject to copyright, so you couldn't really get to this without stepping on some legal landmines.)

(Also, much like AI companies would like the law to favor them, their opponents in this case would like the law to dictate they should be compensated for their works being used in training data, but compensated way beyond any value their works bring in, which in reality is, again, approximately zero.)

replies(1): >>43966813 #
253. ben_w ◴[] No.43965515{7}[source]
> Except in this case, we already have the equivalent of "laws about oxygen consumption": copyright.

Copyright laws were themselves created by the printing press making it easy to duplicate works, whereas previously if you half-remembered something that was just "inspiration".

But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible canon of all of humanity before them — can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

Now we have AI which are "inspired" (perhaps) by what they read, and half-remember it, in a way that seems similar to pre-printing-press humans sharing stories even if the mechanism is different.

How this is seen according to current law likely varies by jurisdiction; but the law as it is today matters less than what the law will be when the new ones are drafted to account for GenAI.

What that will look like, I am unsure. Could be that for training purposes, copyright becomes eternal… but it's also possible that copyright may cease to exist entirely — laws to protect the entire creative industry may seem good, but if AI displaces all humans from economic activity, will it continue to matter?

replies(2): >>43965733 #>>43966984 #
254. SketchySeaBeast ◴[] No.43965549{9}[source]
Removing copyright removes a lot of the protections that enable creators to get paid for their efforts. How would a novelist make money, and why would someone pay them, if their work is free to be copied at will?
replies(1): >>43966039 #
255. wnevets ◴[] No.43965571{7}[source]
Waits v. Frito-Lay. The court held that his voice and style were part of his brand and thus protected.

https://www.youtube.com/watch?v=k0H_hcRc0MA

replies(1): >>43965659 #
256. Ekaros ◴[] No.43965573{4}[source]
Build a big firewall. And then fine massively any ISP that allows traffic to reach bad hosts...
257. Ekaros ◴[] No.43965586{3}[source]
Your employee steals your source code and sells it to multiple competitors. Why should you have any right to go after those competitors?
replies(1): >>43969019 #
258. EMIRELADERO ◴[] No.43965659{8}[source]
That has nothing to do with IP, it's a personality rights claim. The decision explicitly refuses to involve Copyright, saying that voices (and by proxy the styles) are not copyrightable. What mattered there were specific rights of publicity, not IP.
replies(1): >>43965871 #
259. stevenAthompson ◴[] No.43965672{4}[source]
> If the user publishes the output without attribution, NOW you have a problem.

I didn't mean to imply that the AI can't quote Shakespeare in context, just that it shouldn't try to pass off Shakespeare as its own or plagiarize huge swathes of the source text.

> People are being so rabid and unreasonable here.

People here are more reasonable than average. Wait until mainstream society starts to really feel the impact of all this.

260. ◴[] No.43965685{6}[source]
261. stevenAthompson ◴[] No.43965692{4}[source]
> you don't need permission, you just need to follow the procedures

Those procedures are how you ask for permission. As you say, it usually involves a fee but doesn't have to.

replies(1): >>43966650 #
262. moralestapia ◴[] No.43965695{6}[source]
Agree. It feels a bit like the early days of the Bitcoin world. Eventually the courts decided how it was going to be and people like CZ had to pay a visit to jail, but there is now clear precedent on that.

The same will happen with AI: no one will go to jail, but perhaps it will be settled that LLMs don't infringe copyright.

(The same thing happened in the early days of YouTube as well; the solution was stuff like MusicDNA, etc...)

263. mjburgess ◴[] No.43965699{5}[source]
In your case, it isn't the output that matters. Your saying "I'm conscious" isn't why we attribute consciousness to you. We would do so regardless of your ability to verbalise anything in particular.

Your radical behaviourism seems an advantage to you when you want to delete one disfavoured part of copyright law, but I assure you, it isn't in your interest. It doesn't universalise well at all. You do not want to be defined by how you happen to verbalise anything, unmoored from your intentions, goals, and so on.

The law, and society, imparts much to you that is never measured and much that is unmeasurable. What can be measured is, at least, extremely ambiguous with respect to the mental states being attributed. We do not attribute mental states by what people say -- it plays very little role (consider what a mess this would make of watching movies), and it plays no role at all in the large number of animals which share relevant mental states.

Nothing of relevance is measured by an LLM's output. It is highly unambiguous: the LLM has no mental states, and is thus irrelevant to the law, morality, society and everything else.

It's an obscene sort of self-injury to assume that whatever kind of radical behaviourism is necessary to hype the LLM is the right sort. Hype for LLMs does not lead to a credible theory of minds.

replies(1): >>43966504 #
264. dmonitor ◴[] No.43965700{7}[source]
People would use that service instead of Steam, publishers would add annoying DRM to mitigate lost sales, etc etc.

The current illegality of the piracy website prevents them from offering a service as nice as Steam. It has to be a sketchy torrent hub that changes URLs every few months. If it was as easy as changing the url to freesteampowered.com or installing an extension inside the steam launcher, the whole "piracy is a service issue" argument loses all relevance. The industry would become unsustainable without DRM (which would be technically legal to crack, but also more incentivized to make harder to crack).

replies(1): >>43965946 #
265. pc86 ◴[] No.43965711{5}[source]
We're talking about the US government though
replies(1): >>43967503 #
266. dmonitor ◴[] No.43965729{7}[source]
There's no reason FreeSteam can't also do that, though. There's no copyright, so just have an extension of the steamapp that changes it to point to your server when downloading games / checking ownership. Piracy stops being a service issue when pirates are allowed to make nice services.
267. Jensson ◴[] No.43965733{8}[source]
> But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible cannon of all of humanity before them — can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

That is even worse without copyright, as then every previous work would be free and you would have to compete with better works that are also free for people.

replies(1): >>43967536 #
268. dfxm12 ◴[] No.43965763{8}[source]
I made that comment, but the bit in quotes is not my claim. I was quoting a grandparent post. If you read from the top, the quotation marks and general flow of the thread should make this clear.
269. dns_snek ◴[] No.43965792[source]
The problem with this kind of analysis is that it doesn't even try to address the reasons why copyright exists in the first place. This belief that training LLMs on content without permission should be allowed is incompatible with the belief that copyright is useful, you really have to pick a lane here.

Go back to the roots of copyright and the answers should be obvious. According to the US constitution, copyright exists "To promote the Progress of Science and useful Arts" and according to the EU, "Copyright ensures that authors, composers, artists, film makers and other creators receive recognition, payment and protection for their works. It rewards creativity and stimulates investment in the creative sector."

If I publish a book and tech companies are allowed to copy it, use it for "training", and later regurgitate the knowledge contained within to their customers then those people have no reason to buy my book. It is a market substitute even though it might not be considered such under our current copyright law. If that is allowed to happen then investment will stop and these books simply won't get written anymore.

270. wnevets ◴[] No.43965871{9}[source]
> That has nothing to do with IP, it's a personality rights claim.

The US Supreme Court disagrees, the right of publicity and intellectual property law are explicitly linked.

> The broadcast of a performer’s entire act may undercut the economic value of that performance in a manner analogous to the infringement of a copyright or patent. — Justice White

replies(1): >>43966207 #
271. hulitu ◴[] No.43965877[source]
> One aspect that I feel is ignored by the comments here is the geo-political forces at work. If the US takes the position that LLMs can't use copyrighted work or has to compensate all copyright holders – other countries (e.g. China) will not follow suit.

Oh really? They had no problem going after people who installed unlicensed copies of Windows (remember the BSA). But now Microsoft turns a blind eye because it suits them.

272. hulitu ◴[] No.43965895{3}[source]
> How is this relevant?

She was training RI (real intelligence). Is that relevant now? Or does she have to be rich and pay some senators to be relevant?

273. hochstenbach ◴[] No.43965920[source]
Humans are not allowed to do what AI firms want to do. That was one of the copyright office arguments: a student can't just walk into a library and say "I want a copy of all your books, because I need them for learning".

Humans are also very useful and transformative.

274. Zambyte ◴[] No.43965946{8}[source]
> publishers would add annoying DRM to mitigate lost sales, etc etc.

People would just delete the malware (DRM) out of the source code that is no longer restricted by copyright.

If your argument is that copyright is good because it discourages DRM, I think you have a very evidently weak argument.

replies(1): >>43975280 #
275. MyOutfitIsVague ◴[] No.43965951{6}[source]
There were the famous Napster cases, and the kids and old ladies that got sued by the RIAA for using LimeWire to download some music.

There is also the fact that copyright holders will pressure your ISP into sending threatening letters and shutting off your Internet for piracy, even without you seeding. I haven't gotten the impression that you are in the clear for pirating as long as you don't distribute.

276. hulitu ◴[] No.43965997{5}[source]
> Sharing a movie rip on bittorrent is obviously different than training an AI model that can reproduce the studio ghbili style, even to diehard AI opponents.

OK, how about training AI on leaked Windows source code?

replies(1): >>43966083 #
277. hulitu ◴[] No.43966032{6}[source]
> The fact you are even using the word stealing, is telling to your lack of knowledge in this field.

I agree. If you can pay the judge, the congress or the president, it is definitely not stealing. It is (the best) democracy (money can buy). /s

replies(1): >>43966238 #
278. Zambyte ◴[] No.43966039{10}[source]
> How would a novelist make money

Maybe selling books? Maybe other jobs? The same way that they made money for thousands of years before copyright, really. Books and other arts did exist before copyright!

> and why would someone pay them, if their work is free to be copied at will?

I don't think it's really a matter of if people will pay them. If their art is good, of course people will pay them. People feel good about paying for an original piece of art.

The question is really more about if people will be able to get obscenely rich over being the original creator of some piece of art, to which the answer is it would indeed be less likely.

replies(1): >>43966359 #
279. Pooge ◴[] No.43966059{7}[source]
Which is still what Facebook did, if I'm not mistaken. There's no way they torrented and managed to upload less than 1 bit.
replies(1): >>43966700 #
280. gruez ◴[] No.43966083{6}[source]
Arguably different from both, because Microsoft could say it's a trade secret. Note I'm not claiming that because it's different, it must be okay, just that it's unfair to compare torrenting with AI training.
281. EMIRELADERO ◴[] No.43966207{10}[source]
That's just an analogy drawn in an opinion; it's not binding. And that would just be a new category of IP, but we were talking about copyright, not any abstract form of IP.

Again, show me an example where an artist's style was used for copyright infringement in court. Can you produce even one example?

replies(1): >>43966528 #
282. nadermx ◴[] No.43966238{7}[source]
So when someone steals something from you, you no longer have it. Yet here, supposedly, they paid the judge(s), even though the person who's been "robbed" still has their thing?
283. 93po ◴[] No.43966315{6}[source]
I spent 10+ minutes trying to find anything Trump has said on camera about the copyright office, and went through the only video I could find of Trump on Air Force One in the past week to see any references to this, and saw none.
replies(1): >>43966830 #
284. SketchySeaBeast ◴[] No.43966359{11}[source]
> The same way that they made money for thousands of years before copyright, really.

We didn't have modern novelists a thousand years ago. We didn't have mass production until ~500 years ago, and copyright came in in the 1700s. We didn't have mass-produced pulp fiction like we do today until the 20th century. There is little copyright-less historical precedent to refer to here; even if we carve out the few hundred years between the printing press and copyright, it's not as though everyone was mass-consuming novels when the literacy rate was abysmal. I wonder what artist yearns for the 1650s.

> If their art is good, of course people will pay them.

You say this as if it were a fact, but that's not axiomatic. Once the first copy is in the wild it's fair game for anyone to copy it as they will. Who is paying them? Should the artists return to the days of needing a wealthy patron? Is patreon the answer to all of our problems?

> Maybe selling books?

But how? To whom? A publishing house isn't going to pick them up, knowing that any other publishing house can start selling the same book the minute it proves popular, and if you're self-publishing and starting to make good numbers, the publishing houses can eat you alive.

> The question is really more about if people will be able to get obscenely rich over being the original creator of some piece of art, to which the answer is it would indeed be less likely.

No, the question is whether ordinary people could make a living off their novels without copyright. It's very hard today, but not impossible. Without copyright it would be impossible.

285. lavezzi ◴[] No.43966372{6}[source]
There are tonnes; this is a baffling question.
286. whamlastxmas ◴[] No.43966393{8}[source]
https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...

"copyright law assigns a set of exclusive rights to authors: to make and sell copies of their works, to create derivative works, and to perform or display their works publicly"

"The owner of a copyright has the exclusive right to do and authorize others to do the following: To reproduce the work in copies or phonorecords;To prepare derivative works based upon the work;"

"Commonly, this involves someone creating or distributing"

https://www.copyright.gov/what-is-copyright/

"U.S. copyright law provides copyright owners with the following exclusive rights: Reproduce the work in copies or phonorecords. Prepare derivative works based upon the work."

https://internationaloffice.berkeley.edu/students/intellectu...

"Copyright infringement occurs when a work is reproduced, distributed, displayed, performed or altered without the creator’s permission."

There are endless legitimate sources for this. Copyright protects many things, not just distribution. It very clearly disallows the unauthorized creation and reproduction of copyrighted works.

287. breakingcups ◴[] No.43966470{3}[source]
Well, Facebook torrented the copyrighted material they used for training, which means they distributed all those files too. With the personal approval of Zuck. What is the difference according to you?

Source: https://futurism.com/the-byte/facebook-trained-ai-pirated-bo...

replies(1): >>43967115 #
288. lavezzi ◴[] No.43966472[source]
Very recently, because historically the majority of people engaging in it weren't looking to profit from piracy.

The general public has been lectured for decades about how piracy is morally wrong, but as soon as startups and corporations are in it for profit, everybody looks away?

289. stevenAthompson ◴[] No.43966504{6}[source]
> We would do so regardless of your ability to verbalise anything in particular

I don't mean to say that they literally have to speak the words by using their meat to make the air vibrate. Just that, presuming it has some physical means, it be capable (and willing) to express it in some way.

> It's a obcene sort of self-injury to assume that whatever kind of radical behaviourism is necessary to hype the LLM is the right sort.

I appreciate why you might feel that way. However, I feel it's far worse to pretend we have some undetectable magic within us that allows us to perceive the "realness" of others peoples consciousness by other than physical means.

Fundamentally, you seem to be arguing that something with outputs identical to a human's is not human (or even human-like), and should not be viewed within the same framework. Do you see how dangerous an idea that is? It is only a short hop from "Humans are different than robots, because of subjective magic" to "Humans are different than <insert race you don't like>, because of subjective magic."

290. SketchySeaBeast ◴[] No.43966511{7}[source]
The end product being inexpensive is a good thing - it means that the producer can sell it well below the cost it took to produce it; otherwise a novel would cost whatever it takes for Stephen King to live for 3 months.

I feel like you're shoving all information under the same label. The most profitable corporations are trading in information that isn't subject to copyright, and it's facts - how you drive, what you eat, where you live. It's newly generated ideas. Maybe it is in how the data is sorted, but they aren't copyrighting that either.

If we're going to overthrow artificial entrenchments of capitalism, I feel like there are better places to start than much of copyright. Does it need changes? Absolutely, there's certainly exploitation, but I still don't see "get rid of copyright entirely" as a good approach. Weirdly, it's one of the few places where people are arguing for exactly that. Sometimes the criminal justice system convicts the wrong person, and there should be reform. It's also often criticized as a measure of control for capitalistic oligarchs. Should step one be getting rid of the legal system entirely?

291. kmeisthax ◴[] No.43966514{4}[source]
Yes, but none of that has anything to do with AI. Or democratization.

The fact that copyright law is easy to violate and hard to enforce doesn't stop Nintendo from burning millions of dollars on legal fees to engage in life-ruining enforcement actions against randos making fangames.

"Democratization" with respect to copyright law would be changing the law to put Mario in the public domain, either by:

- Reducing term lengths to make Mario literally public domain. It's unclear whether or not such an act would survive the Takings Clause of the US Constitution. Perhaps you could get around that by just saying you can't enforce copyrights older than 20 years even though they nominally exist. Which brings us to...

- Adding legal exceptions to copyright to protect fans making fan games. Unlikely, since in the US we have common law, which means our exceptions have to be legislated from the judicial bench, and judges are extremely leery of 'fair use' arguments that basically say 'it is very inconvenient for me to get permission to use the thing'.

- Creating some kind of social copyright system that "just handles" royalty payments. This is probably the most literal interpretation of 'democratize'. I know of few extant systems for this, though - like, technically ASCAP is this, but NOBODY would ever hold up ASCAP as an example of how to do licensing right. Furthermore without legal backing, Nintendo can just hold out and retain traditional "my way or the highway" licensing rights.

- Outright abolishing copyright and telling artists to fend for themselves. This is the kind of solution that would herald either a total system collapse or extreme authoritarianism. It's like the local furniture guy selling sofas at 99% off because the Mafia is liquidating his gambling debts. Sure, I like free shit, but I also know that furniture guy is getting a pair of cement shoes tonight.

None of these are what AI companies talk about. Adding an exception just for AI training isn't democratizing IP, because you can't democratize AI training. AI is hideously memory-hungry and the accelerators you need to make it work are also expensive. I'm not even factoring in the power budget. They want to replace IP with something worse. The world they want is one where there are three to five foundation models, all owned and controlled by huge tech megacorps, and anyone who doesn't agree with them gets cut off.

292. wnevets ◴[] No.43966528{11}[source]
All squares are rectangles, but not all rectangles are squares.

All right of publicity laws are intellectual property laws but not all intellectual property laws are right of publicity laws.

All copyright laws are intellectual property laws but not all intellectual property laws are copyright laws.

Right of publicity laws are intellectual property laws because the right of publicity is intellectual property. I don't know how else to articulate this over the internet, maybe its time to consult an AI?

replies(1): >>43966680 #
293. Bjorkbat ◴[] No.43966608{4}[source]
Fair point, we use metaphor to explain and understand a variety of topics, and a lot of those metaphors are best understood through pop culture analogies.

A reasonable compromise then is that you can train an AI on Wikipedia, more-or-less. An AI trained this way will have a robust understanding of Superman, enough that it can communicate through metaphor, but it won't have the training data necessary to create a ton of infringing content about Superman (well, it won't be able to create good infringing content anyway. It'll probably have access to a lot of plot summaries but nothing that would help it make a particularly interesting Superman comic or video).

To me it seems like encyclopedias use copyrighted pop culture in a way that constitutes fair use, and so training on them seems fine as long as they consent to it.

294. toast0 ◴[] No.43966650{5}[source]
(in the US) Mechanical licenses are compulsory; you don't need permission, you can just follow the forms and pay the fees set by the Copyright Royalty Board (appointed by the Librarian of Congress). You can ask the rightsholder to negotiate a lower fee, but there's no need for consent of the rightsholder if you notify as required (within 30 days of recording and before distribution) and pay the set fees.
replies(1): >>43967107 #
295. EMIRELADERO ◴[] No.43966680{12}[source]
My point is that the kind of IP at issue in this post and discussion is copyright, not personality rights. If we're talking about the views of the copyright office and how that relates to artists, it's implicit that we're staying in copyright land, because there has never been a case about style-as-IP in visual art.
replies(1): >>43968192 #
296. SketchySeaBeast ◴[] No.43966687{7}[source]
Is it though? All I see is hand-waving.
297. FireBeyond ◴[] No.43966700{8}[source]
You're right. They claimed they made efforts to minimize seeding, but minimal is not none, as you say.
replies(1): >>43966888 #
298. archagon ◴[] No.43966712[source]
It's not that complicated: little guy taking stuff from big corp (then) vs. big corp taking stuff from little guy (now). Similar to the recent debates over permissive open source licenses and corporate exploitation.

As for the zeitgeist, I'm not sure anything has materially changed. Recently, creators have been very upset over Silicon Valley AI companies ingesting their output. Is this really reflective of "general internet sentiment"? Would those same people have supported abolition of copyright in the past? I doubt it.

299. FireBeyond ◴[] No.43966745{4}[source]
To the point that Billy Joel "famously" credited the songwriter for one of his songs ("This Night") as "Billy Joel, Ludwig van Beethoven".
300. anigbrowl ◴[] No.43966752[source]
Oh boy, right again https://news.ycombinator.com/item?id=43940763
301. stonogo ◴[] No.43966756[source]
Big "Mr. President, we cannot allow a mineshaft gap" energy going on, even if it's difficult for me personally to believe that LLMs contribute in any sense to ruling the world.
302. anigbrowl ◴[] No.43966769{3}[source]
You are if it's parody, cf 'Bored of the Rings'.
303. empath75 ◴[] No.43966775{8}[source]
No -- the controversy is also over whether distributing the weights and software is a copyright violation. I believe it is. The copyrighted material is present in the software in some form, even if the process for regenerating it is quite convoluted.
replies(1): >>43967161 #
304. anigbrowl ◴[] No.43966801{4}[source]
You were supposed to keep reading past the first sentence, instead of trying to refute the first thing you saw that you found disagreeable. By doing so, you missed the point that plagiarism is substantively different from copyright infringement.
305. palmotea ◴[] No.43966813{8}[source]
> That's the thing though: intuitively, they do - training the model != generating from the model, and it's the output of a generation that violates copyright (and the user-supplied prompt is a crucial ingredient in getting the potentially copyrighted material to appear). And legally, that's AFAIK still an open question.

It's still copyright infringement if I download a pirated movie and never watch it (writing the bytes to the disk == "training" the disk's "model", reading the bytes back == "generating" from the disk's "model").

> That's 100% true. I know that, I'm not denying that. But in this particular case, I find my own views align with their case.

IMHO, unless you're massively wealthy and/or running a bigcorp, people like you benefit a lot more from copyright than they are harmed by it. In a world without copyright protection, some bigcorp will be able to use its size to extract the value from the works that are out there (i.e. Amazon and Netflix will stop paying royalties instantly, but they'll still have customers because they have the scale to distribute). Copyright just means the little guy who's actually creating has some claim to get some of the value directed back to them.

> and any individual work that went into training data contributes approximately zero to it.

Then cut all those works out of the training set. I don't think it's an excuse that the infringement has to happen on a massive scale to be of value to the generative AI company.

306. anigbrowl ◴[] No.43966816[source]
Gee, perhaps we should not have done this in the first place. 'Foreigners might copy the irresponsible thing we did, so we have to do more of it' is not the most brilliant argument.
307. scraptor ◴[] No.43966819{5}[source]
I certainly don't see much value in AI generated papers myself, I just object to the claim that the mere act of reading a large number of existing papers before writing yours is inherently plagiarism.
308. anigbrowl ◴[] No.43966821[source]
It didn't, you're falsely conflating two quite different things to give cover to a different set of large corporations.
309. flyingcircus3 ◴[] No.43966830{7}[source]
https://www.washingtonpost.com/video/politics/trump-if-supre...

It's not related to copyright. It's an example of the standard you hypothetically require before attributing something to Trump. My point is that even when he is on camera saying something, that does not prevent the post facto rationalizations. Even if he were on tape firing this person, people would rationalize this away too.

310. anigbrowl ◴[] No.43966881[source]
I invite you to imagine the howling that will ensue the moment some politician offers legislation requiring commercial LLM operators to publish their weights and training data.
311. gruez ◴[] No.43966888{9}[source]
You can make a patched torrent client that never uploads any pieces to peers. It'd definitely be within Meta's capability to do so. The real problem is that unlike typical torrenting lawsuits, they weren't caught red-handed in the act, and it would therefore be hard to go after them. This might seem unfair, but it's no different from you openly posting on Reddit that you torrent: it'd be tough for rights holders to go after you even with such an admission.
replies(1): >>43967576 #
312. otterley ◴[] No.43966978{7}[source]
> [Moana] only exists in our heads.

Moana and Moana 2 are both animated movies that have already been made. They're not just figures of one's imagination.

> If I made a Moana comic book, with an entirely original storyline and original art and it was all drawn in my own style and not using 3D assets similar to their movies, that is violating copyright

It might be, or it might not. Copyright protects the creation of derivative works (17 USC 101, 17 USC 103, 17 USC 106), but it's the copyright holder's burden to persuade the court that the allegedly infringing work with the character Moana in it is derivative of their protected work.

Ask yourself the question: what is the value of Moana to you in this hypothetical? What if you used a different name for the character and the character had a different backstory and personality?

> I still don't agree with the idea that I can't make my own physical copies of Harry Potters books

You might think differently if you had sunk thousands of hours into creating a new novel and creative work was your primary form of income.

> But still, it's infringing copyright for me to make Moana comic books in my own home, in private, and never showing them to anyone.

It seems unlikely that Disney would go after you for that. Kids do it all the time.

313. palmotea ◴[] No.43966984{8}[source]
> Copyright laws were themselves created by the printing press making it easy to duplicate works, whereas previously if you half-remembered something that was just "inspiration".

Eh. I don't know the history, but my understanding was they were created because the printing press allowed others to deny the original creators the profits from their work, and direct those profits to others who had no hand in it.

After all, in market terms: a publisher that pays its authors can't compete with another publisher that publishes the same works without paying the authors. A world without copyright is one where some publisher still makes money, but it's a race to the bottom for authors.

> But that only gave the impression of helping creative people: today, any new creative person has to compete with the entire reproducible cannon of all of humanity before them — can you write fantasy so well that new readers pick you up over Pratchett or Tolkien?

Here's a hole in your thinking: if you like fantasy, would you be content to just re-read Tolkien over and over, forever? Don't you think that'd get boring no matter how good he was?

And empirically, "new creative [people]" manage to compete with Pratchett or Tolkien all the time, as new fantasy works are still being published and read. Do you remember that "Game of Thrones" was a mass cultural phenomenon not too long ago?

replies(1): >>43967772 #
314. Workaccount2 ◴[] No.43967046{5}[source]
YouTube built probably the most complex and proactive copyright system any organization has ever seen, for the sole purpose of appeasing copyright holders. There is no reason to believe they won't do the same thing for LLM output.
315. palmotea ◴[] No.43967100{5}[source]
> That's also why I'm really not worried about the "AI singularity" folks. The hype is IMO blatantly unsubstantiated by the actual capabilities, but gets pushed anyway only because it speaks to this deep-seated faith held across the industry. "AI" is the culmination of an innate belief that people should be replaceable, fungible, perfectly obedient objects, and such a psychosis blinds decision-makers to its actual limits. Only trouble is whether they have the political power to try to force it anyway.

I'm worried because decision-makers genuinely don't seem to be bothered very much by actual capabilities, and are perfectly happy to trade massive reductions in quality for cost savings. In other words, I don't think the limits of LLMs will actually constrain the decision-makers.

replies(1): >>43969158 #
316. stevenAthompson ◴[] No.43967107{6}[source]
Thanks for clarifying. Sometimes I forget that HN has a lot of experts floating around who take things in a very literal and legalistic way. I was speaking in more general terms, and missed that you were being very precise with your language.

Compulsory licenses are interesting, aren't they? It just feels wrong. If Metallica doesn't want me to butcher their songs, why should they be forced to allow it?

replies(2): >>43967433 #>>43967596 #
317. gruez ◴[] No.43967115{4}[source]
Addressed this in another comment: https://news.ycombinator.com/item?id=43966888
318. KoolKat23 ◴[] No.43967121[source]
"But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."

I honestly can't see how this directly addresses fair use; it's an odd, sweeping statement. It implies that inventing something that borrows only a little from many different copyrighted items is somehow not fair use. If it were one-for-one, yes, but it's not; it's basically saying creativity is not fair use. And if it's not saying this, and instead refers to competition in the existing market, then it's making a statement about the public good, not fair use. That's basically a matter for legislators and for what the purpose of copyright is.

319. gruez ◴[] No.43967161{9}[source]
It's not as clear-cut as you think. The courts have held that both google thumbnails and google books are fair use, even though they're far closer to verbatim copies than an AI model.
replies(1): >>43967616 #
320. sdenton4 ◴[] No.43967423{8}[source]
Nothing in copyright law talks about 'semantic meaning' or 'character of the source material'. Really, quite the opposite - the 'expression-idea dichotomy' says that you're copyrighting the expression of an idea, not the idea itself. https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...

(Leaving aside whether the weights of an LLM do actually encode the content of any random snippet of training text. Some stuff does get memorized, but how much and how exactly? That's not the point of the LLM, unlike the jpeg or database.)

And, again, look at the search snippets case - these were words produced by other people, directly transcribed, so open-and-shut from a certain point of view. But the decision went the other way.

321. skolskoly ◴[] No.43967433{7}[source]
Any live band performing a song is subject to mechanical licensing as much as a recording artist. Typically the venue pays it, just like how radio stations pay royalties. This system exists because historically, that's how music reproduction worked. You hire some musicians to play the music you want to hear. Copyright applied to the score, the lyrics, and so on. The 'mechanical' rights had to come later, because recording hadn't been invented yet!
322. Suppafly ◴[] No.43967443{3}[source]
>The fatal flaw in your reasoning: machines aren't humans.

I don't see how that affects the argument. The machines are being used by humans. Your argument then boils down to the idea that you can do something manually but it becomes illegal if you use a tool to do it efficiently.

replies(1): >>43967533 #
323. umanwizard ◴[] No.43967451{7}[source]
In the world you’re proposing, you would also not be able to make word-for-word copies of Harry Potter books, because Harry Potter wouldn’t exist.
replies(1): >>43969070 #
324. Suppafly ◴[] No.43967466{8}[source]
>Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material.

That sounds like you're arguing that they should be legal. Copyright law protects specific expressions, not handwavy "smudgy and non-deterministic" things.

replies(1): >>43969125 #
325. const_cast ◴[] No.43967503{6}[source]
There's nothing special about the US government that makes it uniquely shit.

The difference here is that we have people like yourself: those who have zero faith in our government and as such act as double agents or saboteurs. When people such as yourself gain power in the legislature, they "starve the beast". Meaning, they purposefully deconstruct sections of our government such that they have justification for their ideological belief that our government doesn't work.

You guys work backwards. The foregone conclusion is that government programs never work, and then you develop convoluted strategies to prove that.

326. Suppafly ◴[] No.43967517{5}[source]
>The person making the argument is capable of distinguishing the two things, they just performatively choose not to do so.

I think that sort of assumption of insincerity is worse than what you're accusing them of. You might not like their argument, but it's not inherently incorrect for them to argue that because humans have the right to do something, humans have the right to use tools to do that something and humans have the right to group together and use those tools to do something at a large scale.

replies(1): >>43973795 #
327. const_cast ◴[] No.43967533{4}[source]
It's not about the tool, how you use it, or even how it works. It's about the end result.

I can go through and manually compress "Revenge of the Sith" and then post it online. Or, I can use a compression program like handbrake. Regardless, it is copyright infringement.

Can AI reproduce almost* the same things that exist in its training data? Sometimes, so sometimes it's copyright infringement. It doesn't help that it's explicitly for-profit and seeks to obsolesce and siphon value from its training material.

replies(1): >>43967637 #
328. Suppafly ◴[] No.43967536{9}[source]
>that are also free for people

sounds like a good deal if you're people.

329. Suppafly ◴[] No.43967544{7}[source]
>Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts.

If it were that cut and dried we wouldn't have this conversation at all, so clearly your position isn't objectively true.

330. const_cast ◴[] No.43967554{9}[source]
The difference is we're humans, so we get special privileges. We made the laws.

If we're going to be giving some rights to LLMs for convenient for-profit ventures, I expect some in-depth analysis on whether that is or is not slavery. You can't just anthropomorphize a computer program when it makes you money but then conveniently ignore the hundreds of years of development of human rights. If that seems silly, then I think LLMs are probably not like humans and the comparisons to human learning aren't justified.

If it's like a human, that makes things very complicated.

331. breakingcups ◴[] No.43967576{10}[source]
> Previously, a Meta executive in charge of project management, Michael Clark, had testified that Meta allegedly modified torrenting settings "so that the smallest amount of seeding possible could occur," which seems to support authors' claims that some seeding occurred. And an internal message from Meta researcher Frank Zhang appeared to show that Meta allegedly tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers. Once this information came to light, authors asked the court for a chance to depose Meta executives again, alleging that new facts "contradict prior deposition testimony."
replies(1): >>43967762 #
332. LexiMax ◴[] No.43967589{3}[source]
I think most people have lost their minds over the hypocrisy. For decades people have been raked over the coals for piracy, but now suddenly piracy is okay if your name is Facebook and you're building an AI model.

Either force AI companies to compensate the artists they're being "inspired" by, or let people torrent a copywashed Toy Story 5.

333. toast0 ◴[] No.43967596{7}[source]
They are very interesting. IMHO, it's a nice compromise between making sure artists are paid for their work and giving them complete control over their work. Licensing for radio-style play is also compulsory, and terrestrial radio used to not even have to pay the recording artists (I think this changed?), but did have to track plays and pay ASCAP.

As a consumer, it would be amazing if there were compulsory licenses for film and TV; then we wouldn't have to subscribe to 70 different services to get to the things we want to see. And there would likely be services that spring up to redistribute media where the rightsholders aren't able to or don't care to; it might be pulled from VHS that fans recorded off of TV in the old days, but at least it'd be something.

334. const_cast ◴[] No.43967616{10}[source]
The reason those are allowed is because they don't compete with the source material. A thumbnail of a movie is never a substitute for a movie.

LLMs seek to be a for-profit replacement for a variety of paid sources. They say "hey, you can get the same thing as Service X for less money with us!"

That's a problem, regardless of how you go about it. It's probably fine if I watch a movie with my friends, who cares. But distributing it over the internet for free is a different issue.

replies(1): >>43967807 #
335. Suppafly ◴[] No.43967637{5}[source]
>Sometimes, so sometimes it's copyright infringement.

So in those cases, the original authors might have a case. Generally you don't see these LLMs doing that, though.

>Doesn't help that it's explicitly for-profit and seeks to obsolesce and siphon value from it's training material.

Doesn't hurt either. That's a reason to be butthurt, but that's not a legal argument.

replies(1): >>43967723 #
336. Popeyes ◴[] No.43967646[source]
Maybe we should review copyright and the length of it.
337. const_cast ◴[] No.43967723{6}[source]
> That's a reason to be butthurt, but that's not a legal argument.

It is a legal argument; fair use specifically takes the purpose of the use into account. Just using it for commercial ventures makes the water hotter.

replies(1): >>43976273 #
338. gruez ◴[] No.43967762{11}[source]
>Meta allegedly modified torrenting settings "so that the smallest amount of seeding possible could occur,"

>Meta allegedly tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers

Sounds like they used a VPN, set the upload speed to 1 kb/s, and stopped after the download was done. If the average Joe copied that setup there's 0% chance he'd get sued, so I don't really see a double standard here. If anything, Meta might get additional scrutiny because they're a big enough target that rights holders will go through the effort of suing them.

replies(1): >>43969185 #
339. ben_w ◴[] No.43967772{9}[source]
> A word without copyright is one where some publisher still makes money, but it's a race to the bottom for authors.

This is the case anyway; there are many writers competing for the opportunity to be published, so the publishers have a massive advantage, and it is the technology of printing (and cheap paper) that makes this a one-sided relationship — if every story teller had to be heard in person, with no recordings or reproductions possible, then story tellers would be found in every community, and they would be valued by their community.

> Here's a hole in your thinking: if you like fantasy, would you be content to just re-read Tolkien over and over, forever? Don't you think that'd get boring no matter how good he was?

The examples aren't meant to be exclusive, and Pratchett has a lot of books.

There are far more books on the market right now than a human can read in a lifetime. At some point (we may have already passed it), there will be far more good books on the market than a human can read in a lifetime, at which point it's not quality, it's fashion.

> And empirically, "new creative [people]" manage to complete with Pratchett or Tolkien all the time, as new fantasy works are still being published and read.

At some point, there will be more books at least as good as Pratchett, Tolkien, Le Guin, McCaffrey, Martin, Heinlein, Niven etc. in each genre, than anyone can read.

> Do you remember that "Game of Thrones" was a mass cultural phenomenon not too long ago?

Published: August 1, 1996 — concurrently with Pratchett.

Better example would have been The Expanse — worth noting that SciFi has a natural advantage over (high) fantasy or romance, as the nature of speculative science fiction means it keeps considering futures that are rendered as obsolete as the worn-down buttons on the calculator that Hari Seldon was rumoured to keep under his pillow.

340. gruez ◴[] No.43967807{11}[source]
>The reason those are allowed is because they don't compete with the source material. A thumbnail of a movie is never a substitute for a movie.

>LLMs seek to be a for-profit replacement for a variety of paid sources. They say "hey, you can get the same thing as Service X for less money with us!"

What's an LLM supposed to be a substitute for? Are people using them to generate entire books or news articles, rather than buying a book or an issue of the New York Times? Same goes for movies. No one is substituting Marvel movies with Sora video.

replies(1): >>43967863 #
341. const_cast ◴[] No.43967863{12}[source]
> Are people using them to generate entire books or news articles, rather than buying a book or an issue of the new york times?

Yes.

> No one is substituting marvel movies with sora video.

Yeah, because Sora kind of sucks. It's great technology, but it turns out text is just a little bit easier to generate than 3D video.

Once Sora gets good, you bet your ass they will.

342. TiredOfLife ◴[] No.43967996{4}[source]
I think most people who have even a basic understanding of how AI works would disagree with you.
343. rubslopes ◴[] No.43968036{7}[source]
Another example: Microsoft suing pirated Windows distributors.
344. wnevets ◴[] No.43968192{13}[source]
> . If we're talking about the views of the copyright office and how that relates to artists, it's implicit that we're staying in copyright land, because there has never been a case about style-as-IP in visual art.

This article is literally about the copyright office finding that AI companies violated copyright law by training their models on copyrighted material. I'm not even sure what you're arguing about anymore.

replies(1): >>43969060 #
345. kevlened ◴[] No.43968228{6}[source]
This critique deserves more attention.

Humans are capable of illegally reproducing copyrighted works, but we allow them to train on copyrighted material legally.

Perhaps measures should be taken to prevent illegal reproduction, but if that's impossible, or too onerous, there should be utilitarian considerations.

Then the crux becomes a debate over utility, which often becomes a religious debate.

346. achrono ◴[] No.43968263{3}[source]
Let me answer those questions with actual evidence.

To begin with, this very case of Perlmutter getting fired after her office's report is interesting enough, but let's set it aside. [0]

First, plenty of lobbying has been afoot, pushing DC to allow training on this data to continue. No intention to stop or change course. [1]

Next, when regulatory attempts were in fact made to act against this open theft, those proposed rules were conveniently watered down by Google, Microsoft, Meta, OpenAI and the US government lobbying against the copyright & other provisions. [2]

If you still think, "so what? maybe by strict legal interpretation it's still fair use" -- then explain why OpenAI is selectively signing deals with the likes of Conde Nast if they truly believe this to be the case. [3]

Lastly, when did you last see any US entity or person face no punitive action whatsoever despite illegally downloading (and uploading) millions of books & journal articles; do you remember Aaron Swartz? [4]

You might not agree with my assessment of 'conspiracy', but are you denying there is even an alignment of incentives contrary to the spirit of the law?

[0] https://www.reuters.com/legal/government/trump-fires-head-us...

[1] https://techcrunch.com/2025/03/13/openai-calls-for-u-s-gover...

[2] https://www.euronews.com/next/2025/04/30/big-tech-watered-do...

[3] https://www.reuters.com/technology/openai-signs-deal-with-co...

[4] https://cybernews.com/tech/meta-leeched-82-terabytes-of-pira...

347. encipriano ◴[] No.43968362{4}[source]
Why would you feel guilty about using an unlimited resource? You're not stealing.
348. kbelder ◴[] No.43968544{6}[source]
If they were a database, they would be unquestionably legal, because they're only storing a tiny fraction of one percent of the data from any document, and even that data is not any particular replica of any part of the document, but highly summarized and transformed.
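A rough way to see the scale of that claim (a minimal back-of-envelope sketch; the parameter count, weight precision, token count, and bytes-per-token below are illustrative assumptions, not figures for any particular model):

    # Back-of-envelope: how much weight storage exists per byte of training text?
    # All numbers are illustrative assumptions, not measurements of a real model.
    params = 70e9                # assumed parameter count
    bytes_per_param = 2          # fp16/bf16 weights
    train_tokens = 15e12         # assumed training-set size in tokens
    bytes_per_token = 4          # a token is very roughly 4 characters of text

    weight_bytes = params * bytes_per_param        # ~140 GB of weights
    corpus_bytes = train_tokens * bytes_per_token  # ~60 TB of raw text

    ratio = weight_bytes / corpus_bytes
    print(f"weights are ~{ratio:.2%} the size of the training text")
    # prints roughly 0.23%: far too little capacity to hold the corpus verbatim,
    # though memorization of frequently repeated passages is still possible.

Under those assumed numbers the weights amount to a fraction of a percent of the raw training text, which is the sense of "tiny fraction" above; it does not, of course, rule out verbatim memorization of individual heavily repeated passages.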
replies(1): >>43969148 #
349. johnnyanmac ◴[] No.43968990{4}[source]
We regulate it the same way we did centuries ago, which is what led to copyright in the first place. If we already have rules, we enforce them. If no one in power wants to, we put in people who will.

In the end this all comes down to needing the people to care enough.

replies(1): >>43976413 #
350. johnnyanmac ◴[] No.43968993{3}[source]
This is precisely why we need proportional fines from courts. We can't just let companies treat the law as a cost-benefit analysis. They should live in fear of a court ruling against them.
351. johnnyanmac ◴[] No.43969002{4}[source]
Roads and telecommunication. You can argue they are indeed a dumpster fire, but imagine the alternatives full of tolls and incompatible wavelengths.
352. johnnyanmac ◴[] No.43969019{4}[source]
Because they bought code from someone not authorized to sell it?

This isn't some new phenomenon. We do indeed seize assets from buyers if the seller stole them.

353. EMIRELADERO ◴[] No.43969060{14}[source]
The Copyright Office is not an authority in this context; this is just an opinion. They did not make any "finding". To a judge they may as well be any other amicus curiae.

My opinion on the matter at hand is this: Artists who complain about GenAI use the hypothetical that you mentioned, where if you can accurately recreate a copyrighted work through specific model usage, then any distribution of the model is a copyright violation. That's why, according to the argument, fair use does not apply.

The real problem with that is that there's a mismatch between the fair use analysis and the actual use at issue. The complaining artists want the fair use inquiry to focus on the damage to the potential market for works in their particular style. That's where the harm is, according to them. However, what they use to even get to that stage is the copyright infringement allegation that I described earlier: that the models contain their works in a fixed manner which can be derived without permission.

Not to mention that this position puts the malicious use of the models for outright copyright infringement at the output level above the entire class of new works that can be created through their use. It's effectively saying "because these models can technically be used in an infringing way, they infringe our copyright, and any creative potential they could help with is insignificant in comparison to that simple fact. Of course, that's not the actual problem, which is that they output completely new works that compete with our originals, even when those aren't derivatives of, nor substantially similar to, any individual copyrighted work".

Here's a very good article outlining my position in a more articulate way: https://andymasley.substack.com/p/a-defense-of-ai-art

354. johnnyanmac ◴[] No.43969065{4}[source]
> art is derivative in some sense, it's almost always just a matter of degree.

Yes, that's why we judge on a case by case basis. The line is blurry.

I think when you're storing copies of such assets in your database, you're well past the line, though.

355. 93po ◴[] No.43969070{8}[source]
Why not? People write fiction all the time and put it on the internet for free. In fact, I'd say there's significantly more unpaid fiction writing in the world than paid.
replies(2): >>43969731 #>>43969818 #
356. johnnyanmac ◴[] No.43969091{6}[source]
> Copyright infringement is not stealing

If we can agree that the taking of your time is theft (wage theft, to be precise), then we who rely on intellect in our careers should be able to agree that the taking of our ideas is also theft.

>moved to the Ninth Circuit Court of Appeals, where he argued that the goods he was distributing were not "stolen, converted or taken by fraud", according to the language of 18 U.S.C. 2314 - the interstate transportation statute under which he was convicted. The court disagreed, affirming the original decision and upholding the conviction. Dowling then took the case to the Supreme Court, which sided with his argument and reversed the convictions.

This just tells me that the definition is highly contentious. Having the Supreme Court reverse a federal appeals court ruling already shows misalignment.

357. johnnyanmac ◴[] No.43969125{9}[source]
LLMs can't express; that's the primary issue. You can't just make a collage of copyrighted works and shield yourself from copyright with "expression".
replies(2): >>43976226 #>>43976269 #
358. johnnyanmac ◴[] No.43969144{9}[source]
Scales of effect always come into play when enacting law. If you spend a day digging a hole on the beach, you're probably not going to incur much wrath. If you bring a crane to the beach, you'll be stopped, because we know the hole that can be made will disrupt the natural order. A human can do the same thing eventually, but does it so slowly that it's not an issue to enforce 99.9% of the time.
replies(1): >>43969886 #
359. johnnyanmac ◴[] No.43969148{7}[source]
Given that you can in fact prompt enough to reproduce a source image, I'm not convinced that is the actual truth of the matter.
360. nickpsecurity ◴[] No.43969157{6}[source]
That's just the reproducing part. They also shared copies of scraped websites, etc., without the authors' permission. Unauthorized copying has been widely known to be illegal for a long time. They've already broken the law before the training process even begins.
361. johnnyanmac ◴[] No.43969158{6}[source]
It will when it inevitably hits their wallets, be it via public rejection of a lower-quality product or via court orders. But both sentiments move slowly, so we're in this for a while.

Even with NFTs, it was still a full year-plus of everyone trying to shill them before the sentiment turned. Machine learning, meanwhile, is actually useful but is being shoved into every hole.

362. FireBeyond ◴[] No.43969185{12}[source]
> If the average Joe copied that setup there's 0% chance he'd get sued

Citation needed. The RIAA used to just watch torrents and send cease-and-desists to everyone who connected, whether for a minute or for months. It was very much a dragnet, and I highly doubt there was any nuance of "but Your Honor, I only seeded 1MB back so it's all good".

replies(1): >>43972516 #
363. nickpsecurity ◴[] No.43969193{8}[source]
Whereas, my report showed they were breaking copyright before the training process. Meta was sued for what I said they'd be sued for, too.

Like Napster et al, their data sets make copies of hundreds of GB of copyrighted works without authors' permission. Ex: The Pile, Common Crawl, RefinedWeb, GitHub Pages. Many copyrighted works on the Internet also have strict terms of use. Some have copyright licenses that say personal use only or non-commercial use.

So, like many prior cases, just posting what isn't yours on Hugging Face is already infringement. Copying it from HF to your training cluster is also infringement. It's already illegal until we get laws like Singapore's that allow training on copyrighted works. Even those have a weakness in the access requirement, which might require following terms of use or licenses in the sources.

Only safe routes are public domain, permissive code, and explicit licenses from copyright holders (or those with sub-license permissions).

So, what do you think about the argument that making copies of copyrighted works violates copyright law? That these data sets are themselves copyright violations?

364. nickpsecurity ◴[] No.43969201{5}[source]
I won't say it's irrelevant. How much you use is part of fair use considerations. Their huge collections of copyrighted works make them look worse in legal analyses.
365. CaptainFever ◴[] No.43969223{3}[source]
IP minimalism is IP minimalism, regardless of who owns the IP.
366. mr_toad ◴[] No.43969383{3}[source]
> It's the kind of thing that people would have prevented if it had occurred to them, by writing terms of use that explicitly forbid it.

The AI companies will likely be arguing that they don’t need a license, so any terms of use in the license are irrelevant.

367. mr_toad ◴[] No.43969413{5}[source]
The model is not compressed data, it’s the compression algorithm. The prompt is compressed data. When you feed it a prompt it produces the uncompressed result (usually with some loss). This is not an analogy by the way, it’s a mathematical equivalence.

You can try and argue that a compression algorithm is some kind of copy of the training data, but that’s an untested legal theory.
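A minimal sketch of the equivalence being appealed to here, using a hand-written toy "model" (the probability table below is an illustrative assumption, not taken from any real system): the model supplies next-token probabilities, the ideal compressed size of a text under that model is the sum of -log2 p(token | context) bits, and an arithmetic coder driven by those same probabilities gets within a couple of bits of that total. The model plays the role of the codec; the particular text only determines how many bits come out.

    import math

    # Toy "language model": next-token probabilities given the previous token.
    # An LLM plays the same role, just with a vastly better distribution.
    # Each row sums to 1.0.
    def next_token_probs(prev):
        table = {
            None:  {"the": 0.5, "a": 0.3, "cat": 0.1, "sat": 0.1},
            "the": {"cat": 0.6, "mat": 0.3, "sat": 0.1},
            "a":   {"cat": 0.5, "mat": 0.4, "sat": 0.1},
            "cat": {"sat": 0.7, "the": 0.2, "mat": 0.1},
            "sat": {"the": 0.5, "a": 0.3, "mat": 0.2},
            "mat": {"the": 0.6, "a": 0.3, "sat": 0.1},
        }
        return table[prev]

    def code_length_bits(tokens):
        """Ideal compressed size of `tokens` under the model (Shannon code length)."""
        bits, prev = 0.0, None
        for tok in tokens:
            p = next_token_probs(prev)[tok]
            bits += -math.log2(p)   # well-predicted tokens cost fewer bits
            prev = tok
        return bits

    print(code_length_bits(["the", "cat", "sat"]))  # likely text  -> ~2.3 bits
    print(code_length_bits(["sat", "mat", "sat"]))  # unlikely text -> ~9 bits

The sketch illustrates why the model reads as a reusable codec rather than a container of any particular text; whether that counts as a "copy" of the training data in a legal sense is, as noted above, an untested theory.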

368. mr_toad ◴[] No.43969540{3}[source]
But AI is mostly scale and only a little bit innovation. It’s undergraduate maths and a whole lot of computing power and data. Not being able to train on data on the internet would be a significant handicap.
369. mr_toad ◴[] No.43969560{4}[source]
Patents are not the same thing as copyright.
370. otterley ◴[] No.43969731{9}[source]
People don't copy amateur fiction they can find for free. They copy (or rather, make derivative works of) successful commercial content because it is successful and well known.
371. umanwizard ◴[] No.43969818{9}[source]
Yes, and most of it is awful, whereas Joanne Rowling is talented.

It’s very unlikely that she would (or even could) have devoted herself to writing fiction in her free time as a passion project without hope of monetary reward and without any way to live from her writing for the ten years it took to finish the Potter series.

And even if she had somehow managed, you’d never hear about it, because without publishers to act as gatekeepers it’d have been lost in the mountains of fanfic and whatever other slop amateur writers upload to the internet.

replies(1): >>43978679 #
372. SilasX ◴[] No.43969886{10}[source]
That's just the usual hand-wavy, vague "it's different" argument. If you want to justify treating the cases differently based on a fundamental difference, you need to be more specific. For example, they usually define an amount of rainwater you can collect that's short of disrupting major water flows.

So what is the equivalent of "digging too much" in a beach for AI? What fundamentally changes when you learn hyper-fast versus just reading a bunch of horror novels to inform better horror-novel writing? What's unfair about an AI, compared to a human, learning from published novels how to properly pace a story?

These are the things you need to figure out before making a post equating AI learning with copyright infringement. "It's different" doesn't cut it.

373. arp242 ◴[] No.43969913[source]
I get what you're saying, but this is just a race to the bottom, no?

It's annoying to see the current pushback against China focusing so much on inconsequential matters, with so much nonsense mixed in, because I do think we need to push back against China on some things.

374. seanmcdirmid ◴[] No.43969949{3}[source]
In the long run, private IP will eventually become very public despite whatever laws you have; it's been like that since the Stone Age. The American Industrial Revolution was built partially on IP stolen from Britain. The internet has just sped up the diffusion. You can stop it if you are willing to cut the line, but legal action is only some friction, and even then only in the short term.
375. gruez ◴[] No.43972516{13}[source]
Did you miss the part about using a VPN?
376. staticman2 ◴[] No.43973795{6}[source]
Anyone writing "humans can learn from art why can't machines" or something to that effect is performatively conflating an organism and a machine.

My issue is with the rhetoric, if that isn't the rhetoric you are using I am not talking about you.

replies(1): >>43976263 #
377. slipnslider ◴[] No.43974206{4}[source]
Einstein once said "the key to genius is to hide your sources well"

And honestly there is truth to it. Some people (at work, in real life, wherever) might come off as very intelligent, but the moment they say "oh, I just read that relevant fact on reddit/twitter/news site 5 minutes ago" you realize they are just like you, repeating relevant information they consumed recently.

378. 1vuio0pswjnm7 ◴[] No.43974233[source]
The design, manufacture and supply of electronics is far more important than one particular usage, e.g., "LLMs". It has never been a requirement to violate copyrights to produce electronics or computer software. In fact, arguably there would be no "MicroSoft" were it not for Gates' lobbying for the existence and enforcement of "software copyright". The "Windows" franchise, among others, relies on it. The irony of Microsoft's support for OpenAI is amusing. Copyright enforcement for me but not for thee.
379. noirscape ◴[] No.43974756{9}[source]
I did and they aren't convincing. The first is an argument about how a popular interpretation of a work still under copyright can subsume the fact that the original work is in the public domain, using Alice in Wonderland as an example. (I also happen to think it's a particularly terrible example - if you want to make this argument, The Little Mermaid is a far stronger example.) It also misidentifies Disney as the copyright boogeyman, which is a pretty common categorical error. (Disney had very little to do with the length of US copyright. The length of copyright is pretty much entirely the product of geopolitics and international agreements, not Disney.) It's an interesting argument, but not one I find particularly convincing for abolishing copyright - at most for shortening its length. (Which I do believe is needed.)

The second one is the "just solve capitalism and we can abolish copyright entirely" argument, which is... a total non-starter. Yes, in an idealized utopia, we don't need capitalism or copyright, and people can do things just because they want to, and society provides for the artist just because humans all value art that much. It's a fun utopian ideal, but there are many steps between the current state of the world and "you can abolish the idea of copyright", and we aren't even close to that state yet.

380. dmonitor ◴[] No.43975280{9}[source]
Copyright does discourage DRM. Even the most egregious DRM these days can be bypassed with minimal effort and is mostly just a nuisance. Take away government enforcement of copyright and how profitable your digital product is will be directly tied to how advanced you are in the DRM arms race.

Steam is the classic example of how this is effective. You compete with pirates by offering what they can't: a reliable, convenient service. DRM becomes more of a hindrance than a benefit in this situation.

Allowing pirates to offer reliable, convenient pirate websites that are "so easy a normie can do it" would be a disaster for all the creative industries. You would need to radically change the rest of society to prevent a total collapse of people making money off art.

381. sdenton4 ◴[] No.43976226{10}[source]
That's certainly an opinion.
382. Suppafly ◴[] No.43976263{7}[source]
My issue is that your rhetoric of "performatively conflating an organism and a machine" doesn't address the core issue of "humans can learn from art, why can't machines". You're essentially saying that you don't like the question, so you're refusing to answer it. There is nothing inherently wrong with training machines on existing data; if you want us to believe there is, you need to have some argument for why that would be the case.

Is your argument simply about your interpretation of copyright law and your mentality being that laws are good and breaking them is bad? Because that doesn't seem to be a very informed position to take.

replies(1): >>43976804 #
383. Suppafly ◴[] No.43976269{10}[source]
>You can't just make a collage of copyrighted works and shield yourself from copyright with "expression".

And yet collage artists do that all the time.

replies(1): >>43982167 #
384. Suppafly ◴[] No.43976273{7}[source]
>It is a legal argument

Not a very good one then.

385. rollcat ◴[] No.43976413{5}[source]
Disney continued to lobby to extend copyright for like half a century. A lot of people did care. What use is regulation if you can just buy it?
replies(1): >>43982182 #
386. p0w3n3d ◴[] No.43976732[source]
It's funny how a law becomes potentially outdated only when big corporations want to violate it on a global scale.

As a private person I no longer feel incentivised to create new content online because I think that all I create will eventually be stolen from me...

387. staticman2 ◴[] No.43976804{8}[source]
My stated opinion is that anyone who comes to an AI conversation and says "I can't tell the difference between organisms and computers", or some variation thereof, in fact has no trouble in practice distinguishing between their child/mom/dad/BFF and ChatGPT, and is in fact questioning from a position of bad faith.

"There is nothing inherently wrong with training machines on existing data..." doesn't really conflate a machine with an organism and isn't what I'm talking about.

If you had instead written "I can read The Cat in the Hat to teach my kid to read, why can't I use it to train an LLM?"

Then I do think you would be asking with a certain degree of bad faith: you are perfectly capable of distinguishing those two things, in practice, in your everyday life. You do not in fact see them as equivalent.

Your rhetorical choice to be unable to tell the difference would be performative.

You seem to think I'm arguing copyright policy. I really am discussing rhetoric.

388. 93po ◴[] No.43978679{10}[source]
Most is awful, but I'd still say there's just as much good unpaid fiction as paid fiction. Lots of paid fiction is also really, really bad.
replies(1): >>43978723 #
389. umanwizard ◴[] No.43978723{11}[source]
Ok, what are some examples of high-quality literary fiction published for free?
replies(1): >>43998026 #
390. johnnyanmac ◴[] No.43982167{11}[source]
I'll remind you that all fan art is technically in a gray area of copyright infringement. Legally speaking, companies can issue takedowns and claim infringement for anything using their IP that's not covered by fair use. Collages don't really pass that benchmark.

Yoinking their IP and mass-producing slop sure is a line to cross, though.

replies(1): >>43984849 #
391. johnnyanmac ◴[] No.43982182{6}[source]
>a lot of people did care

As did Disney, apparently.

>what use is regulation if you can just buy it?

I don't like it either, but it still comes down to the same issues. We vote in people who can be bought, and we don't make a scandal out of it when it happens. The first step to fixing that corruption is to make Congress afraid of being ousted when it's discovered. With today's communication structure, that's easier than ever.

But if the people don't care, we see the obvious victor.

392. zelphirkalt ◴[] No.43983233{6}[source]
I still feel like the point is useless, because at the end of the day, if some normal person went ahead and did the same thing the tech giant did, they would long since have been moved to a less comfortable, high-security new home. As it stands, some are more equal than others, and that is unacceptable; yet, thanks to the mountains of (also unethically acquired) cash they have, they can get away with something a normal person cannot. Even the law might be bent to their will, because if suing them fails, it sets a precedent.

If we end up saying it is not illegal, then I demand that it not be illegal for anyone. No double standards, please. Let us all launder copyrighted material this way, labeling it "AI".

393. temporalparts ◴[] No.43984849{12}[source]
I'm not an expert, but I thought fan art that people try to monetize in some form is explicitly illegal unless it's protected as parody, and any non-commercial "violations" of copyright are totally legal. Disney can't stop me from drawing Mickey in the privacy of my own house, just from monetizing him or getting famous off of him.
394. 93po ◴[] No.43998026{12}[source]
i could give examples of both paid and unpaid and have them shot down as "this is crap writing". instead i will simply point out that there is very popular unpaid fiction on the internet, and its popularity is indicative of its quality, even if it doesn't match the standards of a literature PhD for "good writing". so basically go look for the most popular unpaid fiction online and there's your answer. i mean all of this conversationally and kindly, if my tone feels patronizing at all.
replies(2): >>43998947 #>>44001090 #
395. otterley ◴[] No.43998947{13}[source]
I think some examples would be helpful that support your argument, along with popularity metrics for these.
396. umanwizard ◴[] No.44001090{13}[source]
I specified "literary fiction" intentionally, because I suspected it would be the hardest kind for you to find, and that good genre fiction (sci-fi, mystery, romance, etc.) would be somewhat more likely (though still unlikely) to be available for free. But you seem to have ignored that stipulation and steered us back to just talking about fiction in general, and also using popularity as a benchmark for quality...

> its popularity is indicative of its quality, even if it doesn't match the standards of a literature PhD for "good writing"

This is a false dichotomy. Literature PhDs are not the only people out there who enjoy high-quality literature more than light entertainment, and anyway, you seem to be admitting that there's a type of fiction that doesn't exist unpaid, so isn't this just proving my point correct?

All that said, even if I accept for the sake of argument that the existence of popular free genre fiction would be enough to prove your point (because, in fairness to you, we were originally talking about Harry Potter, which is as genre as it gets)... I went looking, and there are at most a few sporadic examples. A few minutes of research suggest that some books by Cory Doctorow are among the most popular ones. Also, The Martian by Andy Weir used to be freely available, but isn't anymore as far as I can find.

Sorry, but Cory Doctorow and (formerly) Andy Weir represent a pretty small body of work compared to the entire canon of paid novels, so I'm going to have to call BS on your claim unless you provide some examples of your own.

replies(1): >>44007585 #
397. 93po ◴[] No.44007585{14}[source]
i didn't respond to the literary part because it's moving the goalposts. i don't care about the literary value of things i read for fun, and most people don't, as long as the style and structure of the writing doesn't stop them from enjoying it. i never made assertions about "literary" fiction writing, just fiction writing in general
replies(1): >>44011411 #
398. umanwizard ◴[] No.44011411{15}[source]
You didn’t respond to the entire second half of my post.
replies(1): >>44014522 #
399. ◴[] No.44014522{16}[source]