The question is: how does that affect their choices? How much ends up being gated that previously would have ended up in the open?
Me: I am using a local variant (and attempting to build something I think I can control better).
I know once you delete something on Discord it's poof, and that's the end of that. I've reported things that, if anyone at Discord could have accessed a copy, they would have called the police. There are a lot of awful trolls on chat platforms posting terrible things.
- Google: active storage for "around 2 months from the time of deletion" and in backups "for up to 6 months": https://policies.google.com/technologies/retention?hl=en-US
- Meta: 90 days: https://www.meta.com/help/quest/609965707113909/
- Apple/iCloud: 30 days: https://support.apple.com/guide/icloud/delete-files-mm3b7fcd...
- Microsoft: 30-180 days: https://learn.microsoft.com/en-us/compliance/assurance/assur...
So if it turns out they are storing data longer than that, there can be consequences (GDPR, CCPA, FTC).
That's not what Discord themselves say. Is that coming from Discord, the police, or someone else?
> Once you delete content, it will no longer be available to other users (though it may take some time to clear cached uploads). Deleted content will also be deleted from Discord’s systems, but we may retain content longer if we have a legal obligation to preserve it as described below. Public posts may also be retained for 180 days to two years for use by Discord as described in our Privacy Policy (for example, to help us train models that proactively detect content that violates our policies). - https://support.discord.com/hc/en-us/articles/5431812448791-...
Seems there is something that decides whether the content should be deleted faster or kept for between 180 days and 2 years. So even for Discord, "once you delete something on Discord it's poof" isn't 100% accurate.
The cynic in me wonders if part of Anthropic's decision process here was that, since nobody believes you when you say you're not using their data for training, you may as well do it anyway!
Giving people an opt-out might even increase trust, since people can now at least see an option that they control.
This is why I love-hate Anthro, the same way I love-hate Apple. The reason is simple: Great product, shitty MBA-fueled managerial decisions.
But including paid accounts and doing 5-year retention is confounding.
Yes, of course, to both of those. Discord is a for-profit business with a limited number of humans who can focus on things, so the fewer things they have to focus on, the better (in the minds of the people running the business, at least). So why do anything when you can do nothing and everything stays the same? Of course, when someone has a warrant they really have to do something, but unless there is one, there is no incentive for them to do anything about it.
I check when I start using any new service. The cynical assumption that everything's being shared leads to shrugging it off and making no attempt to look for settings.
It only takes a moment to go into settings -> privacy and look.
They're assuming that Anthropic, which is already receiving and storing your data, is also training its models on that data.
How are you supposed to disprove that as a user?
Also, the whole point is that companies cannot be trusted to follow the settings.
Do you have any reason to think this does anything?
So your assumption is that the published privacy policy of any company is completely accurate, that there is no means for the company to violate this policy, and that, once it is violated, you will immediately be notified.
> It only takes a moment to go into settings -> privacy and look.
It only takes a moment to examine history and observe why this is wholly inadequate.
Yes. It's way easier and cheaper when the data comes to you instead of having to scrape everything elsewhere.
https://www.reuters.com/sustainability/boards-policy-regulat...
It’s shocking to me that anyone who works in our industry would trust any company to do as they claim.
What if you ask it for medical advice, or legal things? What if you turn on Gmail integration? Should I now be able to generate your conversations with the right prompt?
xAI trains Grok on both public data (Tweets) and non-public data (Conversations with Grok) by default. [0]
> Grok.com Data Controls for Training Grok: For the Grok.com website, you can go to Settings, Data, and then “Improve the Model” to select whether your content is used for model training.
Meta trains its AI on things posted to Meta's products, which are not as "public" as Tweets on X, because users expect these to be shared only with their networks. They do not use DMs, but they do use posts to Instagram/Facebook/etc. [1]
> We use information that is publicly available online and licensed information. We also use information shared on Meta Products. This information could be things like posts or photos and their captions. We do not use the content of your private messages with friends and family to train our AIs unless you or someone in the chat chooses to share those messages with our AIs.
OpenAI uses conversations for training data by default. [2]
> When you use our services for individuals such as ChatGPT, Codex, and Sora, we may use your content to train our models.
> You can opt out of training through our privacy portal by clicking on “do not train on my content.” To turn off training for your ChatGPT conversations and Codex tasks, follow the instructions in our Data Controls FAQ. Once you opt out, new conversations will not be used to train our models.
[1] https://www.facebook.com/privacy/genai/
[2] https://help.openai.com/en/articles/5722486-how-your-data-is...