It's possible that the automated processing of the dataset is considered non-creative enough that the resulting AI model cannot be copyrighted. The code to train the model and the input dataset (and the works therein) definitely can be, but not the model itself.
In that case, Facebook would be out of luck, as long as the code to train the model isn't shared. If the courts instead find that AI models are a different type of work and are themselves copyrightable, Facebook may follow in the footsteps of other copyright giants and start filing lawsuits against anyone they can catch. I very much doubt they'd go that far, especially since by the time they could confidently start a lawsuit, the leaked model would probably already be outdated and irrelevant.
Personally, I expect the model to end up being uncopyrightable, as would be the output of the model.
This may or may not have very interesting results. The dataset itself is probably copyrightable (a human or set of humans composed it, unless that was also done completely automatically), but if that copyright is claimed, the individual rights holders of the included works may demand a licensing fee, similar to how samples work in music: "you want to use my work, pay me a fee".
Or maybe the dataset is considered diverse enough that the individual authors can't expect compensation for their works' inclusion, and you can get around copyright law by amassing enough content at once. Who knows.