361 points mseri | 1 comment
1. comex No.46008784
Note that while the authors themselves purport to release the training data under a permissive license, it includes scraped webpages, with the only rule being "don’t collect from sites that explicitly disallow it, including paywalled content". So the original text is mostly not freely licensed by its authors.

However, the use of this text for training might be transformative enough to constitute fair use, in which case a license from the authors would be unnecessary. For now this is an unsettled legal question, but it's not going to stay unsettled for long, at least not in the US. In fact, we've already seen two judges address the question in summary judgment rulings and reach roughly opposite conclusions [1]. One of those cases has since been settled, but inevitably, some of the many ongoing AI copyright cases will make their way to appeals courts, and probably to the Supreme Court.

In the long run, I suspect that this will be allowed one way or another. Either courts will make a finding of fair use, or Congress will step in and create some kind of copyright carveout. Both have their limitations: court rulings tend to draw fuzzy lines around what conduct is allowed and what isn't, while legislation draws sharp lines that tend to be too sharp (with random restrictions and carveouts based on negotiations).

If so, what happens next? Some free-software purists will never accept this type of use, and they'd have reasonable grounds for not doing so (legal uncertainty in the rest of the world, or moral/ethical grounds). But I think it would be a mistake for the free-software world broadly to reject it. This type of model is as open as is physically possible, and represents a real improvement in user agency compared to mere open-weights models, let alone compared to the closed models that seem to be getting increasingly dominant.

Anyway, we'll see.

[1] https://www.skadden.com/insights/publications/2025/07/fair-u...