451 points croes | 2 comments
Workaccount2 ◴[] No.43963737[source]
I have yet to see someone explain in detail how transformer model training works (showing they understand the technical nitty-gritty and the overall architecture of transformers) and also lay out a case for why it is clearly a violation of copyright.

You can find lots of people talking about training, and you can find lots (way more) of people talking about AI training being a violation of copyright, but you can't find anyone talking about both.

Edit: Let me just clarify that I am talking about training, not inference (output).

replies(10): >>43963777 #>>43963792 #>>43963801 #>>43963816 #>>43963830 #>>43963874 #>>43963886 #>>43963955 #>>43964102 #>>43965360 #
jfengel ◴[] No.43963816[source]
I'm not sure I understand your question. It's reasonably clear that transformers get caught reproducing material that they have no right to. The kind of thing that would potentially result in a lawsuit if you did it by hand.

It's less clear whether taking vast amounts of copyrighted material and using it to generate other things rises to the level of copyright violation or not. It's the kind of thing that people would have prevented if it had occurred to them, by writing terms of use that explicitly forbid it. (Which probably means that the Web becomes a much smaller place.)

Your comment seems to suggest that writers and artists have absolutely no conceivable stake in products derived from their work, and that it's purely a misunderstanding on their part. But I'm both a computer scientist and an artist and I don't see how you could reach that conclusion. If my work is not relevant then leave it out.

replies(4): >>43963887 #>>43963911 #>>43964402 #>>43969383 #
1. tensor ◴[] No.43964402[source]
If I write a math book, and you read it and then tell someone about the math within it, you are not violating copyright. In fact, you could write your OWN math book, or history book, or whatever, and as long as you're not copying my actual text, you are not violating copyright.

However, when an LLM does the same, people now want it to be illegal. It seems pretty straightforward to apply existing copyright law to LLMs in the same way we apply it to humans. If the actual text they generate is so similar to a source material that it would constitute a copyright violation if a human had done it, then it should be illegal. Otherwise it should not.

edit: and in fact it's not even whether an LLM reproduces text, it's whether someone subsequently publishes that text. The person publishing that text should be the one taking on the legal hit.

replies(1): >>43965025 #
2. rrook ◴[] No.43965025[source]
Mathematical formulas already can't be copyrighted, which makes this a kinda nonsense example?