
313 points by rntn | source
ankit219 ◴[] No.44608660[source]
Not just Meta: 40 EU companies urged the EU to postpone the rollout of the AI Act by two years due to its unclear nature. This code of practice is voluntary and goes beyond what is in the act itself. The EU published it with the implication that there would be less scrutiny if you voluntarily sign up for the code of practice. Meta would face scrutiny on all fronts anyway, so signing something voluntary does not seem like a plausible move.

One of the key aspects of the act is that a model provider is responsible if downstream partners misuse the model in any way. For open source, that's a very hard requirement [1].

> GPAI model providers need to establish reasonable copyright measures to mitigate the risk that a downstream system or application into which a model is integrated generates copyright-infringing outputs, including through avoiding overfitting of their GPAI model. Where a GPAI model is provided to another entity, providers are encouraged to make the conclusion or validity of the contractual provision of the model dependent upon a promise of that entity to take appropriate measures to avoid the repeated generation of output that is identical or recognisably similar to protected works.
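
To make the "identical or recognisably similar" requirement concrete, here's a minimal sketch of one possible mitigation: an output filter that flags generations sharing long verbatim word n-grams with a reference set of protected works. The n-gram size and threshold are my own illustrative assumptions; the code of practice doesn't prescribe any particular mechanism.

    # One possible "reasonable copyright measure": flag outputs that share
    # long verbatim word n-grams with a reference set of protected works.
    # The n-gram size and threshold below are illustrative assumptions.
    def ngrams(text: str, n: int = 8) -> set:
        words = text.split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def looks_infringing(output: str, protected_works: list, n: int = 8,
                         threshold: float = 0.2) -> bool:
        # Flag the output if a large share of its n-grams appear verbatim
        # in any protected work (a crude proxy for "recognisably similar").
        out_grams = ngrams(output, n)
        if not out_grams:
            return False
        return any(
            len(out_grams & ngrams(work, n)) / len(out_grams) >= threshold
            for work in protected_works
        )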

[1] https://www.lw.com/en/insights/2024/11/european-commission-r...

replies(7): >>44610592 #>>44610641 #>>44610669 #>>44611112 #>>44612330 #>>44613357 #>>44617228 #
zizee ◴[] No.44611112[source]
It doesn't seem unreasonable. If you train a model that can reliably reproduce thousands or millions of copyrighted works, you shouldn't be distributing it. If it were just regular software that had that capability, would it be allowed? Is it OK just because it's a fancy AI model?
replies(2): >>44611371 #>>44611463 #
CamperBob2 ◴[] No.44611371[source]
I have a Xerox machine that can reliably reproduce copyrighted works. Is that a problem, too?

Blaming tools for the actions of their users is stupid.

replies(4): >>44611396 #>>44611501 #>>44612409 #>>44614295 #
saghm ◴[] No.44614295[source]
If I copy someone else's copyrighted work on my Xerox machine and then give you the machine, you can't reproduce the work I copied. If I leave a copy of it in the scanner when I hand the machine over, that's another story. The issue here isn't an LLM's ability to produce a protected work when I supply it as input; it's whether an input is baked in at the time of distribution, giving the model the ability to keep producing the work even for recipients who never had access to the original.

To be clear, I don't have any particular insight into whether this is possible right now with LLMs, and I'm not taking a stance on copyright law in general with this comment. But I don't think your argument holds up, because there's a clear technical difference that seems like it would be significant as a matter of law. There are plenty of reasonable arguments against things like the agreement mentioned in the article, but in my opinion, your objection isn't one of them.
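
For what it's worth, the "left in the scanner" question is testable. Here's a minimal sketch, assuming the Hugging Face transformers API; the model name and probe text are placeholders. It prompts the model with a short prefix of a protected work and checks whether greedy decoding continues it verbatim, with no copy supplied in context:

    # Memorization probe: does the model reproduce a protected text from a
    # short prefix alone, with nothing supplied in context? The model name
    # and probe text below are illustrative placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for whatever model is being distributed
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    protected_text = "It was the best of times, it was the worst of times, ..."
    prompt = " ".join(protected_text.split()[:10])  # first ten words only

    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=50, do_sample=False)
    continuation = tok.decode(out[0][inputs["input_ids"].shape[1]:])

    # If the greedy continuation tracks the original closely, the work is
    # effectively still "in the scanner" at distribution time.
    print(continuation)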

replies(1): >>44614740 #
visarga ◴[] No.44614740[source]
You can train an LLM on completely clean data, creative commons and legally licensed text, and at inference time someone will just put a whole article or chapter into the model and have full access to regenerate it however they like. As a sketch of what I mean (the prompt is illustrative, and no memorization is involved, since the full text arrives in the context window):
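
    # Inference-time reproduction without memorization: the user supplies
    # the full protected text in the prompt, so any instruction-following
    # model can emit it regardless of what it was trained on.
    article = "(full text of a copyrighted chapter, pasted in by the user)"

    prompt = (
        "Here is a chapter:\n\n" + article +
        "\n\nRepeat it back with British spellings, otherwise word for word."
    )
    # Sending `prompt` to any capable chat model yields a near-verbatim
    # copy; the provider's clean training data is irrelevant at this point.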
replies(1): >>44616884 #
saghm ◴[] No.44616884[source]
Re-quoting the section the parent comment included from this agreement:

> > GPAI model providers need to establish reasonable copyright measures to mitigate the risk that a downstream system or application into which a model is integrated generates copyright-infringing outputs, including through avoiding overfitting of their GPAI model. Where a GPAI model is provided to another entity, providers are encouraged to make the conclusion or validity of the contractual provision of the model dependent upon a promise of that entity to take appropriate measures to avoid the repeated generation of output that is identical or recognisably similar to protected works.

It sounds to me like the LLM you describe would be covered if the people distributing it put a clause in the license saying that users can't do that.