The question is: how does that affect their choices? How much ends up being gated that previously would have ended up in the open?
Me: I am using a local variant (and attempting to build something I think I can control better).
Did they rephrase the question? Probably the first answer was wrong. Did the session end? Good chance the answer was acceptable. Did they ask follow-ups? What kind? Etc.
Or that the user just ragequit
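For what it's worth, here's a rough sketch of how those implicit signals might be bucketed. It's purely illustrative - the session fields and labels are hypothetical, not anything Anthropic has described:

    # Purely illustrative: hypothetical per-session fields a provider might log.
    from dataclasses import dataclass

    @dataclass
    class Session:
        rephrased_immediately: bool   # user restated the same question right away
        follow_up_count: int          # further turns on the same topic
        ended_after_answer: bool      # conversation stopped right after the reply

    def implicit_feedback(s: Session) -> str:
        """Turn behavioral traces into a weak (and noisy) satisfaction label."""
        if s.rephrased_immediately:
            return "likely-bad"       # first answer probably missed the mark
        if s.ended_after_answer:
            # Ambiguous: a satisfied user and a ragequit look identical here.
            return "weak-positive"
        if s.follow_up_count > 0:
            return "engaged"          # value depends on what kind of follow-ups
        return "unknown"

    print(implicit_feedback(Session(False, 0, True)))  # -> "weak-positive"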
I know once you delete something on Discord it's poof, and that's the end of that. I've reported things that, if anyone at Discord could access a copy, they would have called the police. There's a lot of awful trolls on chat platforms that post awful things.
- Google: active storage for "around 2 months from the time of deletion" and in backups "for up to 6 months": https://policies.google.com/technologies/retention?hl=en-US
- Meta: 90 days: https://www.meta.com/help/quest/609965707113909/
- Apple/iCloud: 30 days: https://support.apple.com/guide/icloud/delete-files-mm3b7fcd...
- Microsoft: 30-180 days: https://learn.microsoft.com/en-us/compliance/assurance/assur...
So if it ends up that they are storing data longer there can be consequences (GDPR, CCPA, FTC).
That's not what Discord themselves say. Is that coming from Discord, the police, or someone else?
> Once you delete content, it will no longer be available to other users (though it may take some time to clear cached uploads). Deleted content will also be deleted from Discord’s systems, but we may retain content longer if we have a legal obligation to preserve it as described below. Public posts may also be retained for 180 days to two years for use by Discord as described in our Privacy Policy (for example, to help us train models that proactively detect content that violates our policies). - https://support.discord.com/hc/en-us/articles/5431812448791-...
Seems to be something that decides if the content should be deleted faster, or kept for between 180 days and 2 years. So even for Discord, "once you delete something on Discord it's poof" isn't 100% accurate.
I wonder how much they can rely on the data and what kind of "knowledge" they can extract. I never give feedback, and most of the time (let's say 5 out of 6) the result CC produces is simply wrong. How can they know whether the result is valuable or not?
Anyway, I’ll block them like I do everything.
It annoys me greatly that I have no tick box on Google to tell them "go and adapt models I use on my Gmail, Photos, Maps, etc." I don't want Google to ever be mistaken about where I live - I have told them 100 times already.
This idea that "no one wants to share their data" is just assumed, and permeates everything. Like soft-ball interviews that a popular science communicator did with DeepMind folks working in medicine: every question was prefixed by litany of caveats that were all about 1) assumed aversion of people to sharing their data 2) horrors and disasters that are to befall us should we share the data. I have not suffered any horrors. I'm not aware of any major disasters. I'm aware of major advances in medicine in my lifetime. Ultimately the process does involve controlled data collection and experimentation. Looks a good deal to me tbh. I go out of my way to tick all the NHS boxes too, to "use my data as you see fit". It's an uphill struggle. The defaults are always "deny everything". Tick boxes never go away, there is no master checkbox "use any and all of my data and never ask me again" to tick.
I am sure they will have a corporate carve-out, otherwise it makes them unusable for some large corps.
The cynic in me wonders if part of Anthropic's decision process here was that, since nobody believes you when you say you're not using their data for training, you may as well do it anyway!
Giving people an opt-out might even increase trust, since people can now at least see an option that they control.
As we’ve seen LLMs be able to fully regenerate text from their sources (or at least close enough), aren’t you the least bit worried about your personal correspondence magically appearing in the wild?
I upgraded after I hit the equivalent spend in API fees in a month.
This is why I love-hate Anthro, the same way I love-hate Apple. The reason is simple: Great product, shitty MBA-fueled managerial decisions.
Edit: I just logged in to opt out, they presented me with the switch directly. It was two clicks.
"ccusage" is telling me I would have spent $2010.90 in the last month if I was paying via the API, rather than $200.
But also I do feel Claude Code is quite a bit better than other things I've used, when using the same model. I'm not sure why though, it's a fairly simple program with only a few prompts and only a few tools, it seems like others could catch up immediately by learning some lessons from it.
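On the ccusage figure above: tools in that vein basically multiply logged token counts by per-model API prices. A toy sketch of the arithmetic (the prices and usage numbers below are made-up placeholders, not ccusage's actual logic or Anthropic's actual rates):

    # Toy estimate of API-equivalent spend from token counts.
    # Prices and usage numbers are illustrative placeholders only.
    PRICES_PER_MTOK = {          # model -> (input, output) USD per million tokens, assumed
        "opus":   (15.0, 75.0),
        "sonnet": (3.0, 15.0),
    }

    def api_equivalent_cost(usage):
        """usage maps model name -> (input_tokens, output_tokens)."""
        total = 0.0
        for model, (in_tok, out_tok) in usage.items():
            in_price, out_price = PRICES_PER_MTOK[model]
            total += in_tok / 1e6 * in_price + out_tok / 1e6 * out_price
        return total

    # A hypothetical heavy month: mostly Opus with some Sonnet.
    print(f"${api_equivalent_cost({'opus': (80_000_000, 12_000_000), 'sonnet': (60_000_000, 6_000_000)}):,.2f}")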
But including paid accounts and doing 5-year retention is confounding.
It’s the reverse. This was opt-in and is now opt-out. Opt means choose, so when “the default is opt-in” it means the option is “no” by default and you have the option to make it “yes”.
Feels like the complaint is precisely that people don’t want them to make this change.
> this is exactly how I'd want them to do it.
Seems naive to believe it will always be done like this, especially for new users.
Yes, of course, to both of those. Discord is a for-profit business with a limited number of humans who can focus on things, so the less they have to focus on, the better (in the minds of the people running the business, at least). So why do anything when you can do nothing, and everything stays the same? Of course, when someone has a warrant they really have to do something, but unless there is one, there is no incentive for them to do anything about it.
I check when I start using any new service. The cynical assumption that everything's being shared leads to shrugging it off and making no attempt to look for settings.
It only takes a moment to go into settings -> privacy and look.
They’re assuming that Anthropic, which is already receiving and storing your data, is also training their models on that data.
How are you supposed to disprove that as a user?
Also, the whole point is that companies cannot be trusted to follow the settings.
Do you have any reason to think this does anything?
To be clear, I don't use Claude for any of those purposes; it's the principle I am talking about.
They gave me a popup to agree to the ToS change, but I can ignore it for a month and still use the product. In the popup, they clearly explained the opt-out switch, which is available in the popup itself as well as in the settings.
Seems like an excessively draconian interpretation of property rights.
So your silence can be used as a warmish signal that you were satisfied. (...or not. Depends on your usage fingerprint.)
Legally, I don't understand how Anthropic's lawyers would have allowed this. Maybe I am just naively optimistic about these matters? I am a Max customer and I might leave! Talk about a "rug pull" ... and I am considering moving to an inferior provider! Privacy is a fundamental human right. Please do better; we have not learned our lesson in tech or society because no one is facing any consequences.
Probably that people accused it of being sycophantic and they tried to adjust it, but they didn't do it well. It would rather criticize and make assumptions about my behavior than keep it technical. Ha!
I prefer Gemini. It seems a bit stressed, always assuming that I might be frustrated by its answers, which is also a weird thing to assume, but at least it is not outright disrespectful.
So I am back to testing ChatGPT. I keep changing.
> The defaults are always "deny everything".
This is definitely not true for a massive amount of things, I'm unsure how you're even arriving at this conclusion.
It makes no sense to put stuff up on the internet where it can freely be downloaded by anyone at any time, by people who are then free to do whatever they like with it on their own hardware, then complain that people have downloaded that stuff and done what they liked with it on their own hardware.
"Having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators" is equally a description of Google.
I don't disagree regarding Google; I also think they exploited others' IP for their own gain. It was once symbiotic with webmasters, but when that stopped they broke that implied good-faith contract. In a sense, their snippets and widgets using others' IP while no longer providing traffic to the site were the warning shot for where we are now. We should have been modernising IP laws back then.
Personally, I don't mind training, as long as I have a say on the matter - and they have a switch for this. Opt-out is not exactly cool, but I've got the popup in my face, almost a month before the changes, and that's respectful enough for me.
This said, I've just canceled my subscription because this new 5-year mandatory data retention is a deal breaker for me. I don't mind 30 or 60 days or even 90 days - I can understand the need to briefly persist the data. But for anything long-term (and 5 years is effectively permanent) I want to be respected with having a choice, and I'm provided none except for "don't use".
A shame, but fortunately they're not a monopoly.
Which is just to point out that the world wide web is not its own jurisdiction, and I believe AI companies are going to be finding that an ongoing problem. Unlike search, there is no symbiosis here, so there is an incentive to sue. The original IP holders do not benefit in any way. Search was different in that way.
Anthropic PR: "Ma'am, you opted IN to training on your therapy sessions and intellectual property and algorithms and salary and family history!" Don't you remember the modal???
The Modal: https://imgur.com/afqMi0Z
I still expect that our conversations will not leave the premises (ie end up on the internet), because that would be something else, but other than that, I knew what I signed up for.
I trust people until they give me cause to do otherwise.
After seeing the harm done by the expansion of patent law to cover software algorithms, and the relentless abuse done under the DMCA, I am reflexively skeptical of any effort to expand intellectual property concepts.
I asked Claude: "If a company has a privacy policy and says they will not train on your data and then decides to change the policy in order "to make the models better for everyone." What should the terms be?"
The model suggests, in the first paragraph or so, EXPLICIT OPT IN. Not OPT OUT.
That popup was confusing as hell then, because I read and understood it as two separate points: I took it that they're making training opt-out, and that they're changing data retention to 5 years, independent of each other. I got upset over this and didn't really research the nuances - and it turns out I had it all wrong.
Appreciate your comment, it's really helpful!
I hope they change the language to make it clear 5 years only applies to the chats they're allowed to train models on.
(Weirdly, I can't find the word "years" anywhere on their Privacy Policy, and the only instance on the Consumer Terms of Service pages is about being of legal age over 18 years old.)
Within the UK NHS and UK private hospital care, these are my personal experiences.
1) I can't email my GP to pass information back and forth. The GP withholds their email contact, so I can't email them e.g. pictures of scans or lab work reports. In theory they should already have those on their side. In practice they rarely do. The exchange of information goes sms->web link->web form->submit - for one single turn. There will be multiple turns. Most people just give up.
2) For an MRI scan, the private hospital made me jump through 10 hoops before sending me a link so I could download my MRI scan videos and pictures. Most people would have given up. There were several forks in the process which, in retrospect, could have delayed the data download even more.
3) Blood test scheduling can't tell me that scheduling a blood test for a date failed. Apparently it's somewhere between too much effort and impossible for them to keep my email address on record and email me back that the test was scheduled, or that the scheduling failed and I should re-run the process.
4) I would like to volunteer my data to benefit R&D in the NHS. I'm a user of medical services. I'm cognisant that all those services are helping, but the process of establishing them relied on people unknown to me sharing very sensitive personal information. If it weren't for those people unknown to me, I would be way worse off. I'd like to do the same, and be able to tell the UK NHS "here they are: my lab work reports, 100 GB of my DNA paid for by myself, my medical histories - take them all in, use them as you please."
In all cases, vague mutterings of "data protection... GDPR..." have been relayed back as "reasons". I take it it's mostly BS. Yes, there are obstacles, but the staff could work around them if they wanted to. However, there is a kernel of truth: it's easier for them not to try to share; it's less work and less risk, so the laws are used as a fig leaf (in the worst case, an alibi for laziness).
Generally I upvote chats - which gives my chat to Anthropic - when I feel like sharing, and I'll keep doing that like before with this opted out.
Quid pro quo. Those sites also received traffic from the audiences searching using Google. "Without compensation" really only became a thing when Google started adding the inlined cards which distilled the site's content thus obviating the need for a user to visit the aforementioned site.
> where it can freely be downloaded by anyone at any time, by people who are then free to do whatever they like with it on their own hardware
I think you have a strong misunderstanding of the law and the general expectations of others. I'd like to remind you that a lot of celebrities face legal issues for posting photos of themselves. Here's a recent example with Jennifer Lopez[0]. The reason these types of lawsuits are successful is because it is theft of labor. If you hire a professional photographer to take photos of your wedding, then the contract is that the photographer is handing over ownership of the photos in exchange for payment. The only difference here is that the photo was taken before a contract was made. The celebrity owns the right to their body and image, but not to the photograph.
Or think about Open Source Software. Just because it is posted on GitHub does not mean you are legally allowed to use it indiscriminately. GitHub has licenses and not all of them are unrestricted. In fact, a repo without a license does not mean unfettered usage. The default is that the repo owner has the copyright[1].
> You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work.
A big part of what will make a lawsuit successful or not is whether the owner has been deprived of compensation. As in, if you make money off of someone else's work. That's why this has been the key issue in all these AI lawsuits, where the question is whether the work is transformative or not. All of this is new legal territory because the laws were not written with this usage in mind. The transformative stuff is there because you need to allow for parody or referencing. You don't want a situation where, say... someone can't include a video of what the president said in order to discuss what was said[2]. But this situation is much closer to "Joe stole a book, learned from that book, and made a lot of money through the knowledge that he obtained from this book AND would not have been able to obtain without the book's help." It's just usually easier to go after the theft part of that situation. It's definitely a messy space. But basically, just because a piece of art exists on public property does not mean you have the right to do whatever you want with it.
> is equally a description of Google.
Yes and no. The AI summaries? Yeah. The search engine and linking? No. The latter is a mutually beneficial service. It's one thing to own a taxi service, and it is another to offer a taxi service that will walk into a Starbucks, take a random drink off the counter, and deliver it to you. I'm not sure why this is difficult to understand.
[0] https://www.bbc.com/news/articles/cx2qqew643go
[1] https://docs.github.com/en/repositories/managing-your-reposi...
You can also have a checkbox that says "I consent to having my data used for training", which would look like "opting in", and it could be true by default.
Or you can have a checkbox that says "Leave my data out of your training set", which would look like "opting out", and which could be unchecked by default.
Technically, they're both "opt-out", but I've seen enough examples (intentionally confusing and arguably "dark patterns") that I personally don't really consider "it's opt-in" to be a complete statement anymore.
Edit: I'll add that, in the comment I was replying to, it very much looked like you had to go to a settings page in order to opt-out, which I think is entirely reasonably described as having been opted-in by default. Here's what they had written:
> All you have to do is flip a single switch in the options to turn it off
And I actually think "opted-in by default" is valid and calls out cases where it looks like you consent, but that decision was made for you. Although in this case I think I've seen other comments that describe the UX differently, but my comment was more of a general comment than about this particular flow.
Now the AI summaries are a different story. One where there is no quid pro quo either. It's different when that taxi service will also offer the same service as that business. It's VERY different when that taxi service will walk into that business, take their services free of charge[0], and then transfer that to the taxi customer.
[0] Scraping isn't going to offer ad revenues
[Side note] In our analogy, the little text below the link is more like the taxi service offering some advertising or some description of the business. It's a bit more gray here, but I think the quid pro quo phrase applies. The taxi does this to help the customer find the right place to go, providing the business with more customers. But the taxi isn't (usually) replacing the service itself.
> on their own hardware
That doesn't make it technically legal. That only makes it not worth pursuing. You can sue Joe Schmoe for a million dollars, but if he doesn't have that then you're not getting a dime. But if Joe Schmoe is using that thing to make money, well then... yeah, you bet your ass that's a different situation, and the "worth" of pursuing is directly proportional to how much he is making. Doesn't matter if it is his own hardware or not. Like, why do you think who owns the hardware even matters? Do you really think the legality changes if I rent a GPU vs use my own? That doesn't make any sense.
So your assumption is that the reported privacy policy of any company is completely accurate, that there is no means for the company to violate this policy, and that once it is violated you will immediately be notified.
> It only takes a moment to go into settings -> privacy and look.
It only takes a moment to examine history and observe why this is wholly inadequate.
https://www.jdsupra.com/legalnews/healthline-media-agrees-to...
"Healthline.com provided an opt-out mechanism, but it was misconfigured and Healthline failed to test it, resulting in data being shared with third parties even after consumers elected to opt out.”
https://www.bbc.com/news/technology-65772154
"The company agreed to pay the US Federal Trade Commission (FTC) after it was accused of failing to delete Alexa recordings at the request of parents.”
https://www.mediapost.com/publications/article/405635/califo...
"According to the [California Privacy Protection] agency, Todd Snyder told website visitors they could opt out of data sharing, but didn't actually allow them to do so for 40 days in late 2023 because its opt-out mechanism was improperly configured."
When things are free (in this case your input), they will get abused. Put a cost on it.
Some people were upset that Google Maps would just take the data that contributors give it for free. My problem was different. I use Google Maps and I want a way to correct it. I don't want to be paid for this. I want the tool I'm using to be correctable by me. The more I pay for it, the more I want it to be editable by me. I don't want compensation. I want it to be better. And I can make it better.
It's sort of why we picked Kong at a different company. Open source core meant that we could edit stuff we didn't like. In fact, considering that we paid, we wanted them to upstream what we changed.
I'm careful about what data of mine lands on someone else's server, so I'm not a fan of this even without the dark patterns.
The reference to terabytes of stolen data refers to copyrighted material. I think you know this but chose to frame it as "stuff freely posted on the internet" in order to mislead and strawman the other comment.
If they start to feed the next model with LLM-generated crap, the overall performance will drop, and instead of getting a useful answer 1 out of 5 times it will be 1 out of 10(?), and probably a lot of us will cancel the subscription ... so in the end I think it matters.
Yes. It's way easier and cheaper when the data comes to you instead of having to scrape everything elsewhere.
What’s the angle that describes this as fair use?
[0] https://www.businessinsider.com/anthropic-cut-pirated-millio...
https://www.reuters.com/sustainability/boards-policy-regulat...
It’s shocking to me that anyone who works in our industry would trust any company to do as they claim.
If the AI companies were letting people download copies of their training data, copyright law would certainly have something to say about that. But no: once they download the training data, they keep it, and they don't share it.
> using his own copy of the data
Yes? That is a different thing? I guess we can keep moving the topic until we're talking about the same topic if you want. But honestly, I don't want to have that kind of conversation.
Sure, I understand the concerns many of you have.
But in my niche areas of cognitive research, genetics, and neurophilosophy, I need Claude to be much smarter than it is now. I am happy to share what I know with Anthropic so that I eventually have a better companion thinker.
Straighten me out if I am wrong.
I need this opt-in to improve the foundational model that they have trained. It is good, but not good enough.
I agree it's unfortunate that these improvements will accrue within a proprietary, for-profit company. But it's still a net positive for my work.
Give me a FOSS LLM with Claude 4 Sonnet performance and a 1 million token context and I will work even harder toward improvements in my areas of biological NIH-funded research.
The fact that value is being created is irrelevant. The fact that they are making profit is irrelevant. As is non compensation to creators. There isn't any law being broken. Is there?
Bottom line: in real-world terms there is no expectation of privacy with a freely open and unrestricted website. Even if that website said 'you can use this for single use but not mass use', that in itself is not legally or practically enforceable.
Let's take the example of a Christmas light show. The idea might be (in the homeowners mind) that people, families, will drive by in their cars to enjoy the light show (either a single home or the entire street or most of it). They might think 'we don't want buses full of people who paid to ride the bus' coming down the street. Unfortunately there is no way to prevent that (without the city and laws getting involved) and there is nothing wrong with the fact that the people who provide the bus are making money bringing people to see the light show.
...not if you believe in the right of general-purpose computing. If they have the right to read the data, why don't they have a right to program a computer to do it for them?
I think we all agree that they're not the good guys here, but this reasoning in particular is troubling.
What if you ask it for medical advice, or legal things? What if you turn on Gmail integration? Should I now be able to generate your conversations with the right prompt?
On the personal side: given that LLMs have no ground truth and everything is controlled hallucination, if the LLM tells you an imperfect version of my email or chat, you can never be sure whether what the LLM told you is true or not. So maybe you don't gain that much extra knowledge about me. For example, you can reasonably guess I'm typing this on the computer, and having coffee too. So if you ask the LLM "tell me a trivial story", and the LLM comes back with "one morning, LJ was typing HN replies on the computer while having his morning coffee" - did you learn that much new about me that you didn't know or couldn't guess before?
On the "tragedy of the commons" side. We all benefit immensely from other people sharing their data, even very personal data. Any drug discovery, testing, approval - relies on many people allowing their data to be shared. Wider context - living in a group of people, involves radiating data outwards, and using data other people emit towards myself (and others), to have a functioning society. The more advanced the society, the more coordination it needs to achieve the right cooperation-competition balance in the interactions between ever greater numbers of people.
I think it's bad for me personally, and for everyone, that the "data privacy maximalists" had their desires codified in UK law. My personal experience with UK medical systems has been that the laws made my life worse, not better. I wrote about it here: https://news.ycombinator.com/item?id=45066321
Sam Altman and his ilk are exploiting the incredibly slow moving legal system to enrich themselves.
In your case, try saying 'be nicer' or 'be more jovial'.
They literally show you a full-page popup with clear text and an OPT IN toggle. It doesn’t seem really shady to me (or worth 10 separate posts on HN).
That said, if this popup doesn’t appear when you sign up after the 28th, that would be a dark pattern and shady stuff. For now it’s just clickbait.
My entire comment was that the entire issue is about data ownership. Doesn't even matter if you have a copy of the data.
It matters how that copy was obtained.
There's no reason to then discuss if your usage violates the terms of a license if you obtained the data illegally. You're already in the illegal territory lol.
Having data != legally having obtained data
>> To be honest, these companies already stole terabytes of data and don't even disclose their dataset, so you have to assume they'll steal and train at anything you throw at them
> "Reading stuff freely posted on the internet" constitutes stealing now?
Literally everyone was talking about data ownership and you just said "I can download it, so it is fair game on my hardware." Let's say you didn't intend to say that. Well, that doesn't matter; that's what a lot of people heard, and you failed to clarify when pressed on this. So yeah, I think you're doing gymnastics.
xAI trains Grok on both public data (Tweets) and non-public data (Conversations with Grok) by default. [0]
> Grok.com Data Controls for Training Grok: For the Grok.com website, you can go to Settings, Data, and then “Improve the Model” to select whether your content is used for model training.
Meta trains its AI on things posted to Meta's products, which are not as "public" as Tweets on X, because users expect these to be shared only with their networks. They do not use DMs, but they do use posts to Instagram/Facebook/etc. [1]
> We use information that is publicly available online and licensed information. We also use information shared on Meta Products. This information could be things like posts or photos and their captions. We do not use the content of your private messages with friends and family to train our AIs unless you or someone in the chat chooses to share those messages with our AIs.
OpenAI uses conversations for training data by default [2]
> When you use our services for individuals such as ChatGPT, Codex, and Sora, we may use your content to train our models.
> You can opt out of training through our privacy portal by clicking on “do not train on my content.” To turn off training for your ChatGPT conversations and Codex tasks, follow the instructions in our Data Controls FAQ. Once you opt out, new conversations will not be used to train our models.
[1] https://www.facebook.com/privacy/genai/
[2] https://help.openai.com/en/articles/5722486-how-your-data-is...