747 points by porridgeraisin | 28 comments
1. I_am_tiberius ◴[] No.45062905[source]
In my opinion, training models on user data without their real consent (real consent meaning, e.g., that the user has to sign a contract, so they are definitely aware) should be considered a serious criminal offense.
replies(5): >>45062989 #>>45063008 #>>45063221 #>>45063771 #>>45064402 #
2. jsheard ◴[] No.45062989[source]
Why single out user data specifically? Most of the data Anthropic and co train on was just scooped up from wherever with zero consent, not even the courtesy of a buried TOS clause, and their users were always implicitly fine with that. Forgive me for not having much sympathy when the users end up reaping what they've sown.
replies(3): >>45063012 #>>45063051 #>>45063335 #
3. Rygian ◴[] No.45063008[source]
It already is. See Art. 5(1)(b) here: https://gdpr-info.eu/art-5-gdpr/
replies(2): >>45063025 #>>45063040 #
4. I_am_tiberius ◴[] No.45063012[source]
100% true.
5. nicce ◴[] No.45063025[source]
Is "Accept" in a cookie box a good enough contract?
replies(1): >>45063043 #
6. I_am_tiberius ◴[] No.45063040[source]
I believe that only concerns European users. Moreover, I believe a simple press of an OK button is fine under the GDPR. This data, however, is far more serious in both type and volume, and can't be agreed to by just pressing a button.
7. I_am_tiberius ◴[] No.45063043{3}[source]
no
8. perihelions ◴[] No.45063051[source]
Training on private user interactions is a privacy violation; training on public, published texts is (some argue) an intellectual property violation. They're very different kinds of moral rights.
replies(2): >>45063481 #>>45064161 #
9. happosai ◴[] No.45063221[source]
I think it's cute that people believe companies that trained their models on every single book and online page ever written without consent from authors (and often against the explicit request of the author, without any opt-out) won't do a rug pull and do the same to all the chats they have acquired...
replies(2): >>45063234 #>>45064517 #
10. fHr ◴[] No.45063234[source]
Yeah, people are gullible these days. We need another full 2008-style crash that hurts badly before people wake up for a bit, before becoming like this again.
replies(2): >>45063342 #>>45063498 #
11. __MatrixMan__ ◴[] No.45063335[source]
Most people consider publishing something to be sufficient consent for it to no longer be private.

I realize there's a whole legal quagmire here involving intellectual "property" and what counts as a "derivative work", but that's a whole separate (and dubiously useful) part of the law.

replies(1): >>45063793 #
12. FergusArgyll ◴[] No.45063342{3}[source]
Or we can root for happiness and prosperity instead
replies(1): >>45063392 #
13. bigfishrunning ◴[] No.45063392{4}[source]
I "root for people not burglarizing my house", but i put locks on my doors also. The way the market for these tools is behaving, a crash is extremely likely; batten down the hatches.
replies(1): >>45063678 #
14. diggan ◴[] No.45063481{3}[source]
Have Anthropic ever clearly stated exactly what training datasets they use? Like a list of everything included? AFAIK, all the providers/labs are pretty tight-lipped about this, so I think it's safe to assume they've slurped up all the data they've come across via multiple methodologies, "private" or not.
replies(2): >>45063659 #>>45063756 #
15. DrillShopper ◴[] No.45063498{3}[source]
Hurts whom that badly?

AI companies will get bailed out like the auto industry was; they won't be hurt at all.

16. ◴[] No.45063659{4}[source]
17. FergusArgyll ◴[] No.45063678{5}[source]
> We need another full 2008 crash that hurts bad
18. dmbche ◴[] No.45063756{4}[source]
Look at the suits against them; they list it there.
replies(1): >>45063910 #
19. Aurornis ◴[] No.45063771[source]
From the actual source (https://www.anthropic.com/news/updates-to-our-consumer-terms), they're going to show a pop-up with the terms change. I triggered it just now by going to the Privacy settings page and reviewing the new terms.

It’s quite clear. It’s easy to opt out. They’re making everyone go through it.

It doesn’t reach your threshold of having everyone sign a contract or something, but then again no other online service makes people sign contracts.

> should be considered a serious criminal offense.

On what grounds? They’re showing people the terms. It’s clear enough. People have to accept the terms. We’ve all been accepting terms for software and signing up for things online for decades.

replies(1): >>45063888 #
20. chamomeal ◴[] No.45063793{3}[source]
That is definitely normally true, but I feel like the scale of LLM usage turns it into a different problem.

If you can use all of the content of Stack Overflow to create a "derivative work" that replaces Stack Overflow and causes it to lose tons of revenue, is it really a derivative work?

I'm pretty sure solution sites like Chegg don't include the actual questions for that reason. The solutions to the questions are derivative, but the questions aren't.

replies(2): >>45063899 #>>45064495 #
21. airstrike ◴[] No.45063888[source]
People have T&C and cookie popup fatigue. I almost hit "accept" before noticing the opt out toggle, thinking it was a simple T&C update. This is definitely a fucked up way to set it up, there's no sugar coating it.
22. airstrike ◴[] No.45063899{4}[source]
Replacing Stack Overflow has no bearing on the definition of "derivative".
23. diggan ◴[] No.45063910{5}[source]
Are there complete lists in the suits? Last time I skimmed them, they contained allegations about sources and some admissions (The Pile, LibGen, Books3, PiLiMi, scanned books, web scrapes, and some other sources I don't remember), but AFAIK there isn't any complete inventory of the training datasets they used.
24. jsheard ◴[] No.45064161{3}[source]
I wish I could be so optimistic that there is no private information published unintentionally or maliciously on the open web where crawlers can find it.

(And as diggan said, the web isn't the only source they use anyway. Who knows what they're buying from data brokers.)

25. zajio1am ◴[] No.45064402[source]
Why? This is not 'use collected information to target ads' or 'sell collected information to third parties', but 'use collected information from the service to improve the service'. That doesn't really seem to me much different from ISPs using traffic stats to plan infrastructure improvements, or a website using access logs to improve accessibility and navigation.

And when talking specifically about AI, one could argue that learning from interactions is a common aspect of intelligence, so a casual user who doesn't understand the details of LLMs would expect it anyway. Also, the fact that LLMs (and other neural networks) have distinct training and inference phases seems more like an implementation detail.

26. __MatrixMan__ ◴[] No.45064495{4}[source]
Stack Overflow doesn't really have a legitimate claim to that data either, though. Nor do the users; we're just pasting error messages and documentation. It's derivative all the way down. It'll never sit still and behave like property.

Privacy makes sense; treating data like property does not.

replies(1): >>45065641 #
27. SoftTalker ◴[] No.45064517[source]
You're absolutely right, but also isn't the volume of new data they are getting from chats tiny compared to what they've already trained on? I'm wondering how much difference it will really make.
28. chamomeal ◴[] No.45065641{5}[source]
Point taken, but it still feels like a gray area to me. The value that SO created was the curation of knowledge and high-quality discussions that were well indexed and searchable.

The users did provide the data, which is a good point. But there's a reason SO was so useful to developers and Quora was not. It also made it a perfect feeding ground for hungry LLMs.

Then again, I'm just guessing that big models are trained on SO. Maybe that's not true.