This feels like an unwarranted anthropomorphization of what LLMs are doing.
Misanthropic has convinced this particular judge, but there are many others, especially in other countries.
That is, I don't think anyone (especially on this website) would have a problem if someone read a ton of books and then opened a website where you could chat with them and ask them questions about those books. But if this person had "super abilities", where they could read every book that ever existed, respond almost instantly to questions about any book they had read, and answer millions of questions simultaneously, I think that "fair use" as it exists now would never have existed - it completely breaks the economic model that copyright was supposed to incentivize in the first place. I'm not arguing which position is right or wrong, but I am arguing that "if a human did it, it would be fair use" is a very bad analogy.
As a similar example, in the US, courts have regularly held that people walking around outside don't have an expectation of privacy. But what if computers could record you, upload the footage to a website, and use facial recognition so that anyone else in the world could set an alert to be notified whenever you appeared on a particular camera? The original logic behind the "no expectation of privacy in public" rulings breaks down solely because of the speed and scale at which computers can operate.
I don't see why it would be different for LLMs.
If a website gets organically DoSed by a link from Slashdot, that is not an illegal attack.
LLMs 'reading' a book is not the same as a human reading a book, in the same way that following a very popular link is not participating in a DDoS.
The issue is the recall LLMs have over copyrighted content.
Personally, my read is that the issue with most of these cases is that we are treating and talking about LLMs as if they do things that humans do. They don't. They don't reason. They don't think. They don't know. They just map input to probabilistic output. LLMs are a tool like any other for more easily achieving some outcome.
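To make the "map input to probabilistic output" claim concrete, here is a minimal sketch of what a forward pass actually produces, assuming the Hugging Face transformers library and the small public "gpt2" checkpoint purely as illustrative stand-ins (nothing in the thread specifies a model or tooling):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load a small public causal language model; any such model behaves the same way here.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Training language models on copyrighted books is"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

    # The model's entire output for this input is a probability distribution over
    # which token comes next; generating text just means sampling from it repeatedly.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)
    for prob, token_id in zip(top.values.tolist(), top.indices.tolist()):
        print(f"{tokenizer.decode([token_id])!r}: {prob:.3f}")

The forward pass yields nothing but these next-token probabilities, which is the sense in which the comment above says LLMs "just map input to probabilistic output".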
It's precisely because we insist on treating LLMs as if they are more than an inefficient storage device (with a neat/useful trick) that we run into questions like this. I personally think the illegal status of current models should be pretty clear simply based on the pirated nature of their input material. To my understanding, fair use has never before applied to works that were obtained illegally.