
728 points by freetonik | 4 comments
jedbrown ◴[] No.44980180[source]
Provenance matters. An LLM cannot certify a Developer Certificate of Origin (https://en.wikipedia.org/wiki/Developer_Certificate_of_Origi...) and a developer of integrity cannot certify the DCO for code emitted by an LLM, certainly not an LLM trained on code of unknown provenance. It is well-known that LLMs sometimes produce verbatim or near-verbatim copies of their training data, most of which cannot be used without attribution (and may have more onerous license requirements). It is also well-known that they don't "understand" semantics: they never make changes for the right reason.

We don't yet know how courts will rule on cases like Does v Github (https://githubcopilotlitigation.com/case-updates.html). LLM-based systems are not even capable of practicing clean-room design (https://en.wikipedia.org/wiki/Clean_room_design). For a maintainer to accept code generated by an LLM is to put the entire community at risk, as well as to endorse a power structure that mocks consent.

replies(5): >>44980234 #>>44980300 #>>44980455 #>>44982369 #>>44990599 #
raggi ◴[] No.44980300[source]
For a large LLM I think the science will, in the end, demonstrate that verbatim reproduction is not coming from verbatim recording, as the structure really isn’t set up that way in the models in question here.

This is similar to Alsup’s ruling in the Anthropic books case that the training is “exceedingly transformative”. I would expect a contrary interpretation from another case to be both problematic and likely to be eventually overturned.

I don’t actually think provenance is a problem on the axis you suggest if Alsup’s ruling holds. That said, I don’t think that’s the only copyright issue afoot - the Copyright Office’s writing on the copyrightability of outputs from the machine essentially requires that the output fail the Feist tests for human copyrightability.

More interesting to me is how this might further realign the notion of copyrightability of human works as time goes on, moving from every trivial derivative bit of trash potentially being copyrightable to some stronger notion of, to follow the Feist test, independence and creativity. Further, it raises a fairly immediate question in an open source setting: do many individual small patch contributions themselves actually even pass those tests? They may well not, although the general guidance is to set the bar low - and does a typo fix? There is so far to go down this rabbit hole.

replies(4): >>44980456 #>>44980801 #>>44981672 #>>44982112 #
strogonoff ◴[] No.44980801[source]
In the West you are free to make something that everyone thinks is a “derivative piece of trash” and still call it yours; and sometimes it will turn out to be a hit because, well, it turns out that in real life no one can reliably tell what is and what isn’t trash[0]—if it was possible, art as we know it would not exist. Sometimes what is trash to you is a cult experimental track to me, because people are different.

On that note, I am not sure why creators in so many industries are sitting around while they are being more or less ripped off by massive corporations, when music has got it right.

— Do you want to make a cover song? Go ahead. You can even copyright it! The original composer still gets paid.

— Do you want to make a transformative derivative work (change the composition, really alter the style, edit the lyrics)? Go ahead, just damn better make sure you license it first. …and you can copyright your derivative work, too. …and the original composer still gets credit in your copyright.

The current wave of LLM-induced AI hype really made the tech crowd tie itself in knots trying to paint this as an unsolvable problem that requires IP abuse, or as not a problem at all because it’s all mostly “derivative bits of trash” (at least the bits they don’t like, anyway), or to argue in court that it’s transformative, etc., while the most straightforward solution keeps staring them in the face. The only problem is that this solution does not scale, and if there’s anything the industry in which “Do Things That Don’t Scale” is the title of a hit essay hates, it is doing things that don’t scale.

[0] It should be clarified that if art is considered (as I consider it) fundamentally a mechanism of self-expression, then there is, of course, no trash and the whole point is moot.

replies(1): >>44981703 #
0points ◴[] No.44981703[source]
There's a whole genre of musicians focusing only on creating royalty free covers of popular songs so the music can be used in suggestive ways while avoiding royalties.

It's not art. It's parasitism of art.

replies(2): >>44982176 #>>44982746 #
1. withinboredom ◴[] No.44982746[source]
There are several sides to music copyright:

1. The lyrics

2. The composition

3. The recording

These can all be owned by different people or the same person. The “royalty free covers” you mention are people abusing the rights of one of those. They’re not avoiding royalties, they just haven’t been caught yet.

replies(1): >>44983639 #
2. strogonoff ◴[] No.44983639[source]
I believe performance of a cover still results in relevant royalties paid to the original songwriter, just sans the performance fee, which does not strike me as a terrible ripoff (after all, a cover did take effort to arrange and perform).
replies(1): >>44984639 #
3. withinboredom ◴[] No.44984639[source]
What this person is talking about is writing “tvinkle tvinkle ittle stawr” instead of the real lyrics (basically spelling the words phonetically and/or misspelling them) to try to bypass the law through “technicalities” that wouldn’t stand up in court.
replies(1): >>44984699 #
4. strogonoff ◴[] No.44984699{3}[source]
I doubt that for a few reasons based on how they described this alleged parasitic activity, but mainly because the commenter alluded to Spotify doing this. It would be very surprising if they decided to do something so blatantly illegal when they could keep extracting money by the truckload with their regular shady shenanigans that do not cross that legality line so obviously.

Regarding what you described, I don’t think I’ve encountered this in the wild often enough to remember it. IANAL, but if it isn’t cleared/registered properly as a cover, it doesn’t seem like a workaround or abuse - it would probably be found straight up illegal if the rights holder or relevant rights organization cared to sue. In this case, all I can say is “yes, some people do illegal stuff”. The system largely works.