We can already have different licenses for compiled binaries vs the source. Also the output of ML seems to belong to whoever pressed the generate button atm.
Not really. The reason software can be copyrighted at all is that the actual code (and the resulting object code) is creative. Courts have called this protectable element the "structure, sequence and organization" (SSO) of the work. ML models don't exhibit any creative SSO the way actual code does.
> Also the output of ML seems to belong to whoever pressed the generate button atm.
The output, it seems to me, is uncopyrightable. Copyright only cares about who provides the creativity for the work at issue, not who put in the effort to make it happen. You may own the copyright to your prompt, but the result is generated entirely by the AI and thus lacks human authorship.
It's possible that the automated processing of the dataset is considered to be non-creative enough that the generated AI model cannot be copyrighted. The code to train the model and the input dataset (and the works therein) definitely can be, but not the model itself.
In that case, Facebook would be out of luck, as long as the code to train the model isn't shared. If the courts find AI models to be a different type of work that does produce copyrightable artifacts, Facebook may follow in the footsteps of other copyright giants and start filing lawsuits against anyone they can catch. I very much doubt they'd go that far, especially since by the time they could even start a lawsuit confidently, the leaked model would probably already be outdated and irrelevant.
Personally, I expect the model to end up being uncopyrightable, as would be the output of the model.
This may or may not have very interesting results. The dataset itself is probably copyrightable (a human or set of humans composed it, unless that was also done completely automatically), but if that copyright is claimed, the individual rights holders of the included works may demand a licensing fee, similar to how sampling works in music: "you want to use my work, pay me a fee".
Or maybe the dataset is considered diverse enough that individual rights holders can't expect compensation for their works' inclusion, and you can get around copyright law by amassing enough content at once. Who knows.
Not too familiar with the drama, but I believe what happened was that someone with access leaked the torrent used to download the weights. In a legal sense this would be similar to someone, say, leaking a Google Drive link containing proprietary information that was only intended to be shared with vendors.
https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z...
There aren't any confidentiality terms.
So far the rulings in the US, at least, do not support this.
https://arstechnica.com/information-technology/2023/02/us-co...
In this case, it was images generated via Midjourney and not the output of an LLM, but my layman's understanding is that the result would be equally applicable to LLM output. Effectively, the Copyright Office does not consider putting in a prompt enough for there to be "human authorship" of the work. In this specific case, that resulted in the images in the comic being considered uncopyrightable. The broader comic — the organization of the images, the plot and dialogue, etc. — still enjoys copyright protection. But in the US, I could just directly take the images in the comic that Midjourney produced and use them for another purpose without violating copyright.