jawns ◴[] No.45187038[source]
I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

Thus, I stand to receive about $9,000 as a result of this settlement.

I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.
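
For anyone wondering where that figure comes from, the rough math (using the widely reported settlement terms, not anything specific to my case) is:

    ~$1.5B settlement fund / ~500K works ≈ $3,000 per work
    3 works x ~$3,000 per work ≈ $9,000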

SilasX ◴[] No.45190731[source]
Be careful what you wish for.

While I'm sure it feels good and validating to have this called copyright infringement, and to be compensated, it's a mixed blessing at best. Remember, this also means that you will owe compensation to anyone whose work you "trained" off of. Once we accept that simply "learning from previous copyrighted works to make new ones" is "infringement", the onus is on you to establish a clean creation chain, because you'll be vulnerable to the exact same argument: you will owe compensation to anyone whose work you looked at in learning your craft.

This point was made earlier in this blog post:

https://blog.giovanh.com/blog/2025/04/03/why-training-ai-can...

HN discussion of the post: https://news.ycombinator.com/item?id=43663941

brendoelfrendo ◴[] No.45190891[source]
It's a good thing that laws can be different for AI training and human consumption. And I think the blog post you linked makes that argument, too, so I'm not sure why you'd contort it into the idea that humans will be compelled to attribute/license information that has inspired them when creating art.
SilasX ◴[] No.45193067[source]
Right — laws can be arbitrary, and ignore constraints like consistency! It’s just something sane people try to avoid.
1. ch_fr ◴[] No.45194940[source]
The inconsistency you're talking about is only based on the premise that LLMs and humans are "basically the same thing and thus should be treated the exact same way in this kind of situation". But I don't really see why that would be the case in the first place.

Now don't get me wrong, I'm not saying that a rushed regulatory response is a good thing; it's more about the delivery of your reply. I see those arguments a lot: people smugly saying "Well, YOU too learn from things, how about that? Not so different from the machine, huh?" and then continuing the discussion based on that premise, as if we were supposed to accept it as a fact.

2. SilasX ◴[] No.45198269[source]
Because the onus is on you to show the substantive difference. Learning from copyrighted works has always been accepted as free and unrestricted, up until 2022. Before that, nobody (to a rounding error) thought that simply being influenced by a previous copyrighted work meant you owed a license fee. If anything, people would have been livid about restrictions on their right to learn.

Only when big-corp critics needed another pretense to support a conclusion they had long agreed with for other reasons did they decide that learning from your exposure to a copyrighted work was infringement.

If you're interested in the similarities and genuinely curious, you could look at the article linked above, which shows how both LLMs and humans store a high-level understanding of their training set. It's a way deeper parallel than "it's all learning" -- but you have to be willing to engage with the other side rather than just strawman it.

3. _DeadFred_ ◴[] No.45199960[source]
You left out 'for human beings, not for billion-dollar, for-profit, endlessly scaling corporate products intended to replace all creative work'.

If human beings were working for Anthropic, training on these works, and then being contracted out by Anthropic, the exact same rules would apply as they always have. Anthropic is not being unfairly treated.

Small detail.

4. freejazz ◴[] No.45200226[source]
> Because the onus is on you to show the substantive difference

It literally is not. If the defense for your copyright infringement is "my machine is actually the same as a human" then it's your obligation to substantiate that argument.

5. ch_fr ◴[] No.45203126[source]
Don't talk about strawmanning when your second paragraph doesn't describe any argument I've been making. The similarity IS an argument you've advanced. The person you're arguing with does exist, and Disney and other bad actors ARE trying to profit off of that outrage to tighten up copyright... but I'm not that guy; I really only attacked your argument.

Either way, you seem to be asking for a genuine conversation so I'll take this a bit more seriously.

I hope you will forgive me for not engaging with the entire article top-to-bottom right now. For time's sake, I've queried ChatGPT to extract quotes from the article related to your main point (the similarity between human thinking and LLM predictions) so that I can ctrl+F them. This is a well written and organized article, so I believe that even looking at disconnected sections should give me a clear view of the argument that's being made.

---

From the section: “Understanding” in tools.

The author rightfully spends some time disambiguating his use of "understanding", and comparing traditional scripting to LLMs, drawing a parallel with human processes like mathematics and intuition (respectively). The section ends with this quote:

> That’s why I think that the process of training really is, both mechanically and philosophically, more like human learning than anything else

That is the meat of the argument you're making: that both should be evaluated the same way in court.

I can easily tell when reading this that the author is skillful and genuine in his concerns over IP and its misuses, and I really respect that. The issue I raise with the whole argument, however, did not change from my initial response. While he does have a good understanding of how LLMs operate...

↑↑↑↑↑↑ At this point in writing my reply, I planned to call into question his credentials in the other main field of expertise, namely brain science, since I most often see this argument come from tech people and much less often from brain scientists. What I found instead was not the ultimate own I had hoped for, but rather a mixed bag: some things really are similar[1], while other articles point to big differences in other respects[2]. As such, I cannot in good faith say that your argument is unsubstantiated, only that brain science (a field in which I have absolutely no authority) is still torn on the subject.

---

That doesn't mean my first reply is suddenly null and void. If I can't prove or disprove "LLMs and humans think alike", I can still discuss the other conclusion you (and the article) draw from it: "...and thus, they should and will be treated equally in the eyes of the law". This brings in yet another field of expertise (law) that I am woefully unqualified to talk about, but I need to ask: why would that be the "only natural" conclusion? I'll refer to your other reply:

> laws can be arbitrary, and ignore constraints like consistency! It’s just something sane people try to avoid.

You look at the inconsistencies in law like they're a design flaw, but are they really? Law is meant to accommodate humans, and society is a system with more edge cases than I could possibly imagine.

In the very next section of the article, called "Training is not copying", he calls out inaccurate uses of the words "reproduction" and "storing". He also cites another article, which I'll quote:

> The complaint against Stable Diffusion characterizes this as “compressing” (and thus storing) the training images, but that’s just wrong. With few exceptions, there is no way to recreate the images used in the model based on the facts about them that are stored. Even the tiniest image file contains many thousands of bytes; most will include millions. Mathematically speaking, Stable Diffusion cannot be storing copies …
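
To put rough numbers on that claim (approximate public estimates on my part, not figures from the article):

    ~4 GB of Stable Diffusion v1 weights / ~2.3 billion LAION training images
      ≈ 1-2 bytes of model capacity per training image

That's a byte or two per training image, versus the thousands to millions of bytes an actual copy of each image would take.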

That passage reads to me like arguing semantics. Yes, most artists yelling in outrage do not know the ins and outs of training a diffusion model, but I don't think this addresses, let alone annihilates, their concerns.

When people say "it's not technically reproduction", that doesn't change the fact that today, LoRAs exist to closely imitate a specific artist's style with far fewer training resources. And in the case of LoRAs, it's not "a vast, super-diluted training set"; it's super-fine-tuning on top of an existing model, with an additional (but much smaller) batch of training data taken directly from (and laser-focused on) a specific person.
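
For the unfamiliar, the mechanism is roughly the following (a minimal PyTorch-style sketch with made-up sizes, not any particular repo's actual code):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Minimal sketch: a frozen base layer plus a small trainable low-rank update."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False        # the pretrained weights stay frozen
            # Only A and B are trained, on the small, targeted dataset.
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            # frozen base output + scaled low-rank correction learned from the new data
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    layer = LoRALinear(nn.Linear(768, 768), rank=8)
    out = layer(torch.randn(2, 768))  # ~12K trainable params vs ~590K in the base layer

The adapter is tiny and cheap to train precisely because it leans on the full pretrained model underneath; all of the targeting comes from that small, laser-focused dataset.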

Now, do I know what would happen if [patreon-having-guy] tried to take someone to court because they made a LoRA specifically targeting him? I do not. I haven't looked for legal precedent on this, but when there is one, it will be a decision made by humans in a court. (As for what will immediately happen: he will Streisand his way into 27 other dudes doing the same thing out of spite.)

I got a bit sidetracked, but all of that is to say: law is by people, for people. In the end, there's nothing that tells us whether or not "LLMs and humans think the same" will directly translate into "...so LLMs shall be treated like humans in court".

An LLM can't go to prison, and it can't make money to pay for damages; having an ironclad rule like that would just make things less convenient. Code is not law, and (thankfully) law is not code. I feel like some of the people (I'm not saying YOU did it) advocating for "treating LLMs as humans" do so as a means of further alleviating corporate responsibility for anything.

All in all, I don't just question the parallel, I question "why" the parallel, "why" in this discussion. As for the author of your article, I can easily see that he IS genuine in his concerns about IP and the consequences of a knee-jerk regulatory response to it.

Your initial reply in the context of this thread, on the other hand? Correct me if I'm wrong, but it reads like a taunt, like "we'll see who gets the last laugh". So forgive me if I assumed wrongly; that was the reason my first reply was the way it was.

---

One last thing that I have to get out of my system (a tiny rant, if you will). I feel like there is an attitude problem in tech regarding... quite literally any other craft. I suppose it exists in other fields, but this is where I see it the most because I'm also in this field.

The topic we've been discussing is an intersection of tech, brain science, and law. A single one of those fields is already very hard; you could dedicate your life to it and still have more to learn. Yet when it comes to the "LLM = humans" debate, it seems like everyone suddenly has all the qualifications required. Never mind that people dedicating their lives to brain science are still saying "we don't fully get it"; never mind that people who spend their lives in law have yet to experience, and set precedent for, the whole shift that's coming; tech people talk as if tech is the only thing needed to make the world turn.

Generative tech has exacerbated (or expanded) this attitude into even more fields. I don't think it's any surprise that there is such animosity between tech and creative people when the guy spearheading generative music says "people don't enjoy making music", when all the communication around it is "adapt or die", "we figured it out", "we SOLVED art", "you will be left behind", and anyone who doesn't agree gets called a luddite.

The reason I replied is not that I want IP laws to tighten, nor that I genuinely believe we could "get rid of AI" (by the way, "AI" is a blanket term that makes things worse for everyone discussing it); you were just unlucky enough to be the n-th person to bring up an argument I've seen many times before, on a night when I had some free time.

So thanks for giving me the occasion to write that down. I do not think this thread warrants either of us showing too much hostility, but as you said, the whole conversation about current-day genAI touches on so much more than just genAI; it's very easy to find something about it that annoys someone on either side.

[1] https://www.brown.edu/news/2025-09-04/ai-human-learning

[2] https://www.ox.ac.uk/news/2024-01-03-new-research-shows-way-...

6. SilasX ◴[] No.45222267{3}[source]
I think you're focusing too much on whether LLMs are "really human-like" or not. That's a distraction, and I shouldn't have made reference to it. Let me zoom out to a broader point:

Copyright has never included the right to control who or what is influenced by a copyrighted work. Period. It's been the diametric opposite: copyright has always existed, and been justified, as a way to get good new works out into the public, so that later works can be influenced by them. The fact that one work was influenced by another has never, by itself, been a reason to consider it infringement. Not until 2022, when AIs actually got good at it.

When you argue that AI training is copyright infringement, you're saying that "the fact that your work was influenced by previous works means you owe license fees". This is wholly without precedent[1], and was widely rejected until the moment some activists realized it could be a legal tool against Bad People. It's a Pandora's box no one really wants (except perhaps very large media companies, who will be able to secure general "learning licenses" for mass libraries of works). That was the point emphasized in my original comment: if Anthropic is infringing because they base new works on old ones, so are you. You too owe licensing fees for every work you observed that fed into how you create. If that feels like an expansion of what copyright is supposed to cover ... that's the point.

For every single work of literature, you can go back and say "aha, this is clearly influenced by X, Y, Z". Precisely zero people went around insisting that the author therefore owed fees to all those other creators, because the idea is absurd. Or it was, until 2022, when some people needed a pretense for a conclusion they had long supported for unrelated reasons ("Facebook must suffer"). So I think my second paragraph is justified.

"If you read 100 horror novels and write a new one based on everything you've noticed in them, you don't owe the authors jack squat. But if you have a machine help you compile the insights, suddenly, you've infringed the authors' rights." Yeah, you do need to justify that.

[1] I agree there are going to be cases where, say, a specific work was captured too closely, but that's not the general case, and the claim is further weakened when the model is "imitating" a thousand styles at once.