209 points alexcos | 33 comments
1. okdood64 ◴[] No.44414264[source]
Does YouTube allow massive scraping like this in their ToS?
replies(6): >>44414289 #>>44414294 #>>44414316 #>>44414339 #>>44414427 #>>44419304 #
2. dangoodmanUT ◴[] No.44414289[source]
What ToS
replies(1): >>44414359 #
3. mouse_ ◴[] No.44414294[source]
Probably not.

Who cares at this point? No one is stopping ML sets from being primarily pirated. The current power is effectively dismantling copyright for AI related work.

replies(2): >>44414357 #>>44414443 #
4. MaxPock ◴[] No.44414316[source]
They don't, and neither do I allow my site, whose content I found on Gemini, to be scraped.
5. klysm ◴[] No.44414339[source]
I don't think they can legally prevent it
6. perching_aix ◴[] No.44414357[source]
> The current power is effectively dismantling copyright for AI related work.

Out of the loop apparently, could you elaborate? By "the current power" I take it you mean the current US administration?

replies(2): >>44414505 #>>44414689 #
7. bobmcnamara ◴[] No.44414359[source]
https://www.youtube.com/static?template=terms ?
8. perching_aix ◴[] No.44414427[source]
My "lawyer" (gpt4o) claims that since YouTube is merely a non-exclusive licensee of the user content uploaded to their service, even if they have such restrictions in their ToS (they do), they likely would not hold up in court, citing [0]. Something about that non-exclusivity meaning they cannot constrain the copyright further on their own terms. Which I guess makes sense?

And since scraping of publicly available data is not illegal (in the US, according to the aforementioned "lawyer"), it seems like it's okay?

Not legal advice.

[0] https://www.skadden.com/insights/publications/2024/05/distri...

9. snickerdoodle12 ◴[] No.44414443[source]
> Who cares at this point

Anyone who has a shred of integrity. I'm not a fan of overreaching copyright laws, but they've been strictly enforced for years now. Decades, even. They've ruined many lives, like how they killed Aaron Swartz.

But now, suddenly, violating copyright is totally okay and carries no consequences whatsoever because the billionaires decided that's how they can get richer now?

If you want to even try to pretend you don't live in a plutocracy and that the rule of law matters at all these developments should concern you.

replies(3): >>44418197 #>>44418243 #>>44418598 #
10. bgwalter ◴[] No.44414505{3}[source]
Trump fired the head of the copyright office:

https://www.heise.de/en/news/After-criticism-of-AI-training-...

The "Big Beautiful Bill" contains a clause that prohibits state "AI" legislation.

Trump has a "Crypto and AI czar" who is very active in promoting "AI" on his YouTube propaganda outlet. The same czar also promoted, pre-election of course, accelerated peace with Russia and then stopped talking about the subject altogether.

replies(1): >>44414545 #
11. perching_aix ◴[] No.44414545{4}[source]
Oh wow okay, genuinely missed these. Thanks.
12. ◴[] No.44414689{3}[source]
13. shadowgovt ◴[] No.44418197{3}[source]
Aaron Swartz died of suicide, not copyright.

His death was a tragedy but it wasn't done to him.

replies(2): >>44418691 #>>44425663 #
14. mouse_ ◴[] No.44418243{3}[source]
> If you want to even try to pretend you don't live in a plutocracy and that the rule of law matters at all

Can't even pretend anymore, this season jumped the shark

15. jagged-chisel ◴[] No.44418598{3}[source]
> … like how they killed Aaron Swartz.

I can’t imagine why you’d let the FBI off the hook

16. marcus_holmes ◴[] No.44418691{4}[source]
There's an English phrase "hounded to death", meaning that someone was pursued and hassled until they died. It doesn't specify the cause of death, but I think the assumption would be suicide, since you can't actually die of fatigue.

I think that's what was done to Aaron Swartz.

replies(1): >>44418934 #
17. shadowgovt ◴[] No.44418934{5}[source]
Many people have dealt with the law, with copyright infringement, even with gross amounts of it, and had the book thrown at them, and survived the experience.

Swartz was ill. It is a tragedy he did not survive the experience, and indeed, trial is very stressful. But he was no more hounded than any defendant who comes under federal scrutiny and has to defend themselves in a court of law via the trial system. Kevin Mitnick spent a year in prison (first incarceration) and survived it. Swartz was offered six months and committed suicide.

I don't know how much of the system we should change to protect the Aaron Swartzes of the world; that's the mother of all Chesterton's Fences.

replies(2): >>44419913 #>>44420038 #
18. nerdsniper ◴[] No.44419304[source]
Per hiQ v. LinkedIn, it doesn't matter what their ToS says if the scraper didn't have to agree to the ToS to scrape the data. YouTube will serve videos to someone who isn't logged in. So if you've never agreed to YouTube's ToS, you can scrape the videos. If YT forced everyone to log in before they could watch a video, then anyone who wants to scrape videos would have had to agree to the ToS at some point.
replies(1): >>44426955 #
19. shiroiuma ◴[] No.44419913{6}[source]
Maybe someone should throw you in prison for a year on some BS made-up charges to see how well you survive it. We can use it as a data point for your argument.
20. marcus_holmes ◴[] No.44420038{6}[source]
Many people get (for example) pneumonia and recover. Some people get pneumonia and die. The people who died of pneumonia died because of pneumonia. The fact that other people survived it doesn't mean that they didn't die of it.

Saying that we should not work on cures for pneumonia because it's a Chesterton Fence is obviously, blatantly, illogical. Saying that we should not change the system so that government officials working for moneyed interests can't hound someone to death is similarly illogical.

replies(1): >>44422005 #
21. shadowgovt ◴[] No.44422005{7}[source]
Pneumonia doesn't have any societal benefit. The process by which we decide if the law was broken and punishment necessary has obvious benefit. If you mean we should seek a cure for dangerous suicidal depression, I agree. But you surely are not suggesting that, for example, had Swartz been accused of embezzlement, the state should drop the charges purely because he's a suicide risk; how would that be just to the people who were stolen from?

And it's a point of semantics, but no; we generally don't say people who died by suicide died by the things going on in their life when they ended it. Everybody has stressors. The suicidal also have mental illness. Mr. Swartz had self-documented his past suicidal ideation.

replies(1): >>44439314 #
22. snickerdoodle12 ◴[] No.44425663{4}[source]
Crimes generally don't kill the criminal. It's the reaction by authorities that kills (perceived) criminals.
replies(1): >>44427095 #
23. olyjohn ◴[] No.44426955[source]
It won't serve me videos if I'm not logged in. It tells me to sign in to prove I'm not a bot. How do these people get around this?
replies(1): >>44427270 #
24. shadowgovt ◴[] No.44427095{5}[source]
This is true. In general, the harm done by crime is directed outwards from the perpetrator, not inwards to the perpetrator. In fact, the behaviors that only cause self-harm that we criminalize are relatively few.
replies(1): >>44434267 #
25. nerdsniper ◴[] No.44427270{3}[source]
It does for me, in USA on AT&T Fiber using Safari in private browsing mode. Chrome in incognito as well. And phone on mobile YouTube (though I didn't test with uninstalling/reinstalling to reset IDFA and IDFV, so it's not really a valid test)
26. snickerdoodle12 ◴[] No.44434267{6}[source]
So however you want to twist it, he was killed by the government.
replies(1): >>44436904 #
27. shadowgovt ◴[] No.44436904{7}[source]
Sorry, I don't see how you arrive there from the fact-pattern. He wasn't a criminal because he never had a trial. He killed himself before he even had a hearing on whether the prosecution's evidence was admissible, much less his opportunity to either prove his innocence or argue the acts he undertook shouldn't by rights be a crime at all.

What should the government (executive or judicial) have done differently to balance the needs of the accused vs. the needs of the enforcement and adjudication of the law here?

replies(1): >>44437446 #
28. snickerdoodle12 ◴[] No.44437446{8}[source]
The government killed him by threatening insane punishment for something that is practically harmless, and relevant to the original point, is done without a second thought now by the bigcorps to feed their AIs
replies(1): >>44437769 #
29. shadowgovt ◴[] No.44437769{9}[source]
Prosecutors do that all the time. Basically nobody dies of it. I'd humbly propose there were unfortunate mitigating circumstances in Mr Swartz's situation that made it unusual. When a person with AIDS dies, did the AIDS kill them or the pneumonia a regular body would have fought off? When a person with deep mental illness commits suicide, did the circumstances of their life kill them or did they succumb to a deep mental illness?

Perhaps we could craft a way to hold people with mental health issues to the same standards we are all held to while simultaneously being more sensitive to their needs. But in general, his story is an unfortunate tragedy of a sick person who took their own life under a stress that doesn't kill most other people, and we adjust the way we prosecute crime at our own peril. It is, as I said elsewhere, the mother of all Chesterton's Fences. Which is not to say it cannot or should not be improved! Only that it be done with great care.

And to be completely clear: Swartz ripped content via back-dooring a secured network physically, in a closet, and (it is alleged) planned to dump that content in public. We'll never really know since he (or his illness) denied himself his day in court, and that's a tragedy; he may have successfully defended himself, or could have been a living example of persevering anyway like Mitnick instead of a martyr. Companies using their authorized accounts to scrape Google are likely at most guilty of a TOS violation and Google may choose to cut their accounts, but it's very hard to make a case that the Google API saying, over and over again, "Yes you may view that video" constitutes either unauthorized access or exceeding the bounds of access under 9-48.000.

It's hard to comment on whether Swartz violated the CFAA. Since he wasn't tried, we'll never really know. He exited life before justice could happen one way or the other.

30. marcus_holmes ◴[] No.44439314{8}[source]
We evolved with pneumonia for some reason. It could easily be a Chesterton Fence. We don't treat this as one because we don't want people to die of it.

I agree that a system of laws has benefit to society. However the system we've worked out for making such laws is clearly being warped and twisted to serve one small section of society at the expense of everyone else.

A clear case being the comment that started this conversation - Swartz was hounded to death for doing the exact same thing that AI companies are doing and they're facing zero punishment. AI executives are not being dragged from their offices by burly policemen and thrown into cells, yet they have done the exact same thing that Swartz did to merit that behaviour. It's not unreasonable to question the societal benefit of this system.

And we totally should say that people died of depression, or financial stress, or legal persecution, or whatever. Most people have suicidal ideation at some point in their lives, that's not unusual. Being hassled to the point where you go through with it is definitely violence. Classing this as "mental illness" and therefore a personality defect is a form of victim blaming.

replies(1): >>44439729 #
31. shadowgovt ◴[] No.44439729{9}[source]
> We evolved with pneumonia for some reason. It could easily be a Chesterton Fence.

It's not, and I don't think you're seriously arguing this point so I'm going to ignore it.

It is, I think, a reasonable observation that had Swartz formed an LLC to pursue advanced analysis of academic papers for, I don't know, trends in the language used in research and slurped a bunch of JSTOR for that purpose, the trial would have taken longer and involved more lawyers. That's probably an observation that should give us pause. Or not, because nobody argued that's what he did or that was his intent, including him. So I also think the premise of comparison to the current circumstances is flawed; I don't think the CFAA can be applied in a context where people have access rights and go through Google's front door to scan videos for the purpose of training a machine learning algorithm. It might be a TOS violation. It's not hiding a server in a closet with unauthorized physical access, which is what Swartz was accused of.

Intent matters, and, sadly, we never got to the trial where intent could have been proven out.

> Being hassled to the point where you go through with it is definitely violence.

The government does have the monopoly on violence. But I think what happened to Swartz is a far cry from that, as he never got to sentencing, much less trial. There was some light compulsion (requirement to appear in court), of course. But everyone who's ever wanted to contest a parking ticket has to experience that. Sadly, this train of thought goes into a station of "Swartz should have been under professional care if his condition was this much a danger to him," and I don't know how the government should change its behavior if he wasn't. Prosecutors are not prognosticators of the mental health of defendants, and I've never read anywhere that Swartz wanted to be committed for mental illness.

Our system is much harder for defendants grappling with mental illness; I'll acknowledge and argue for change regarding that. I don't know that such change would conclude with "Swartz should never have been accused of committing a crime that a lot of evidence suggests he committed," however.

replies(1): >>44440421 #
32. marcus_holmes ◴[] No.44440421{10}[source]
All good points, thanks for the constructive reply.

Your point that Swartz would have had a different result had he formed an LLC, and hired a bunch of lawyers, is definitely the key point here. A legal system that only works for the rich and powerful is not something we should defend, support, or put up with.

His purpose in copying research papers and making them available for free is massively more in the public interest than anything the AI companies are doing. They are, after all, seeking to make a profit at the end of this. And they knowingly and deliberately broke copyright law because it was "too hard" to make any kind of licensing deal with the publishers. You can argue about fair use and transformative purposes (as their lawyers have done), but you can also argue from Swartz's point of view that this information was (to a large extent) publicly funded and therefore belonged to the public, and trying to get the journals to acknowledge that is "too hard". And had he been able to afford lawyers, that's a possible line they could have taken. But he didn't get the chance. As you say, we never got to the trial so we will never know.

It's definitely not a stretch to say that his crime and the AI companies' crimes (which they admit to - they admit to downloading source texts from pirate sites) are comparable, even equivalent. Yet their treatment is not.

My understanding of his treatment is that it was a lot more than "light compulsion" and that he underwent a sustained campaign of enforcement activity and litigation at the hands of a specific prosecutor. But given that the AI companies have had nothing - no criminal charges - just a civil case brought by the authors they admit to ripping off, then I don't think I need to push this point. They are clearly being treated differently to him, despite the similar actions.

replies(1): >>44443113 #
33. shadowgovt ◴[] No.44443113{11}[source]
We haven't gotten to the part of the trial for Anthropic yet where we determine whether they actually broke the law when they downloaded from pirate sites. Copyright has multiple exceptions. And on the topic at hand here (training on YouTube videos to understand space and relationships in it), I don't think even Google would want to make the case that it's a violation of copyright.

That's the thing about copyright; it's a whole category of law more based in utility than morality. One of the reasons AI is such a fight right now is that nobody was opposing it as an academic project when it was generating, for example, tools that could go from an image to describing the image, or from an image to recognizing the likely artistic style and helping somebody find the original artist. But with just a few tweaks those tools became devices for generating novel images, and now people are upset. Intent matters.

And again, you are drawing equivalence between harvesting data from openly accessible sources online and hiding a server in a closet with unauthorized physical access to a network. Swartz's prosecution wasn't accusing him of copyright violation; it was accusing him of compromising a network. A far more serious charge; if the researchers in the story here had collected those YouTube videos by wiretapping the fiber optics between two of Google's data centers I suspect they would have concerns.