Who cares at this point? No one is stopping ML sets from being primarily pirated. The current power is effectively dismantling copyright for AI-related work.
Anyone who has a shred of integrity. I'm not a fan of overreaching copyright laws, but they've been strictly enforced for years now. Decades, even. They've ruined many lives, like how they killed Aaron Swartz.
But now, suddenly, violating copyright is totally okay and carries no consequences whatsoever because the billionaires decided that's how they can get richer now?
If you want to even try to pretend you don't live in a plutocracy and that the rule of law matters at all, these developments should concern you.
His death was a tragedy but it wasn't done to him.
I think that's what was done to Aaron Swartz.
Swartz was ill. It is a tragedy he did not survive the experience, and indeed, trial is very stressful. But he was no more hounded than any defendant who comes under federal scrutiny and has to defend themselves in a court of law via the trial system. Kevin Mitnick spent a year in prison (first incarceration) and survived it. Swartz was offered six months and committed suicide.
I don't know how much of the system we should change to protect the Aaron Swartzes of the world; that's the mother of all Chesterton's Fences.
Saying that we should not work on cures for pneumonia because it's a Chesterton's Fence is obviously, blatantly, illogical. Saying that we should not change the system so that government officials working for moneyed interests can't hound someone to death is similarly illogical.
And it's a point of semantics, but no; we generally don't say people who died by suicide died by the things going on in their life when they ended it. Everybody has stressors. The suicidal also have mental illness. Mr. Swartz had self-documented his past suicidal ideation.
What should the government (executive or judicial) have done differently to balance the needs of the accused vs. the needs of the enforcement and adjudication of the law here?
Perhaps we could craft a way to hold people with mental health issues to the same standards we are all held to while simultaneously being more sensitive to their needs. But in general, his story is an unfortunate tragedy of a sick person who took their own life under a stress that doesn't kill most other people, and we adjust the way we prosecute crime at our own peril. It is, as I said elsewhere, the mother of all Chesterton's Fences. Which is not to say it cannot or should not be improved! Only that it be done with great care.
And to be completely clear: Swartz ripped content via back-dooring a secured network physically, in a closet, and (it is alleged) planned to dump that content in public. We'll never really know since he (or his illness) denied himself his day in court, and that's a tragedy; he may have successfully defended himself, or could have been a living example of persevering anyway like Mitnick instead of a martyr. Companies using their authorized accounts to scrape Google are likely at most guilty of a TOS violation and Google may choose to cut their accounts, but it's very hard to make a case that the Google API saying, over and over again, "Yes you may view that video" constitutes either unauthorized access or exceeding the bounds of access under 9-48.000.
It's hard to comment on whether Swartz violated the CFAA. Since he wasn't tried, we'll never really know. He exited life before justice could happen one way or the other.
I agree that a system of laws has benefit to society. However, the system we've worked out for making such laws is clearly being warped and twisted to serve one small section of society at the expense of everyone else.
A clear case being the comment that started this conversation - Swartz was hounded to death for doing the exact same thing that AI companies are doing and they're facing zero punishment. AI executives are not being dragged from their offices by burly policemen and thrown into cells, yet they have done the exact same thing that Swartz did to merit that behaviour. It's not unreasonable to question the societal benefit of this system.
And we totally should say that people died of depression, or financial stress, or legal persecution, or whatever. Most people have suicidal ideation at some point in their lives, that's not unusual. Being hassled to the point where you go through with it is definitely violence. Classing this as "mental illness" and therefore a personality defect is a form of victim blaming.
It's not, and I don't think you're seriously arguing this point so I'm going to ignore it.
It is, I think, a reasonable observation that had Swartz formed an LLC to pursue advanced analysis of academic papers for, I don't know, trends in the language used in research and slurped a bunch of JSTOR for that purpose, the trial would have taken longer and involved more lawyers. That's probably an observation that should give us pause. Or not, because nobody argued that's what he did or that was his intent, including him. So I also think the premise of comparison to the current circumstances is flawed; I don't think the CFAA can be applied in a context where people have access rights and go through Google's front door to scan videos for the purpose of training a machine learning algorithm. It might be a TOS violation. It's not hiding a server in a closet with unauthorized physical access, which is what Swartz was accused of.
Intent matters, and, sadly, we never got to the trial where intent could have been proven out.
> Being hassled to the point where you go through with it is definitely violence.
The government does have the monopoly on violence. But I think what happened to Swartz is a far cry from that, as he never got to sentencing, much less trial. There was some light compulsion (requirement to appear in court), of course. But everyone who's ever wanted to contest a parking ticket has to experience that. Sadly, this train of thought goes into a station of "Swartz should have been under professional care if his condition was this much of a danger to him," and I don't know how the government should change its behavior if he wasn't. Prosecutors are not prognosticators of the mental health of defendants, and I've never read anywhere that Swartz wanted to be committed for mental illness.
Our system is much harder for defendants grappling with mental illness; I'll acknowledge and argue for change regarding that. I don't know that such change would conclude with "Swartz should never have been accused of committing a crime that a lot of evidence suggests he committed," however.
Your point that Swartz would have had a different result had he formed an LLC, and hired a bunch of lawyers, is definitely the key point here. A legal system that only works for the rich and powerful is not something we should defend, support, or put up with.
His purpose in copying research papers and making them available for free is massively more in the public interest than anything the AI companies are doing. They are, after all, seeking to make a profit at the end of this. And they knowingly and deliberately broke copyright law because it was "too hard" to make any kind of licensing deal with the publishers. You can argue about fair use and transformative purposes (as their lawyers have done), but you can also argue from Swartz's point of view that this information was (to a large extent) publicly funded and therefore belonged to the public, and trying to get the journals to acknowledge that is "too hard". And had he been able to afford lawyers, that's a possible line they could have taken. But he didn't get the chance. As you say, we never got to the trial so we will never know.
It's definitely not a stretch to say that his crime and the AI companies' crimes (which they admit to - they admit to downloading source texts from pirate sites) are comparable, even equivalent. Yet their treatment is not.
My understanding of his treatment is that it was a lot more than "light compulsion" and that he underwent a sustained campaign of enforcement activity and litigation at the hands of a specific prosecutor. But given that the AI companies have had nothing - no criminal charges - just a civil case brought by the authors they admit to ripping off, then I don't think I need to push this point. They are clearly being treated differently to him, despite the similar actions.
That's the thing about copyright; it's a whole category of law more based in utility than morality. One of the reasons AI is such a fight right now is that nobody was opposing it as an academic project when it was generating, for example, tools that could go from an image to describing the image, or from an image to recognizing the likely artistic style and helping somebody find the original artist. But with just a few tweaks those tools became devices for generating novel images, and now people are upset. Intent matters.
And again, you are drawing equivalence between harvesting data from openly accessible sources online and hiding a server in a closet with unauthorized physical access to a network. Swartz's prosecution wasn't accusing him of copyright violation; it was accusing him of compromising a network. A far more serious charge; if the researchers in the story here had collected those YouTube videos by wiretapping the fiber optics between two of Google's data centers I suspect they would have concerns.