The question is: how does that affect their choices? How much ends up being gated that previously would have ended up in the open?
Me: I am using a local variant (and attempting to build something I think I can control better).
Did they rephrase the question? Probably the first answer was wrong. Did the session end? Good chance the answer was acceptable. Did they ask follow-ups? What kind? Etc.
Or that the user just ragequit
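For what it's worth, here's a rough sketch of how those implicit signals might be bucketed. It's purely illustrative - the session fields and labels are hypothetical, not anything Anthropic has described:

    # Purely illustrative: hypothetical per-session fields a provider might log.
    from dataclasses import dataclass

    @dataclass
    class Session:
        rephrased_immediately: bool   # user restated the same question right away
        follow_up_count: int          # further turns on the same topic
        ended_after_answer: bool      # conversation stopped right after the reply

    def implicit_feedback(s: Session) -> str:
        """Turn behavioral traces into a weak (and noisy) satisfaction label."""
        if s.rephrased_immediately:
            return "likely-bad"       # first answer probably missed the mark
        if s.ended_after_answer:
            # Ambiguous: a satisfied user and a ragequit look identical here.
            return "weak-positive"
        if s.follow_up_count > 0:
            return "engaged"          # value depends on what kind of follow-ups
        return "unknown"

    print(implicit_feedback(Session(False, 0, True)))  # -> "weak-positive"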
I know once you delete something on Discord it's poof, and that's the end of that. I've reported things that, if anyone at Discord could access a copy, they would have called the police. There's a lot of awful trolls on chat platforms that post awful things.
- Google: active storage for "around 2 months from the time of deletion" and in backups "for up to 6 months": https://policies.google.com/technologies/retention?hl=en-US
- Meta: 90 days: https://www.meta.com/help/quest/609965707113909/
- Apple/iCloud: 30 days: https://support.apple.com/guide/icloud/delete-files-mm3b7fcd...
- Microsoft: 30-180 days: https://learn.microsoft.com/en-us/compliance/assurance/assur...
So if it ends up that they are storing data longer there can be consequences (GDPR, CCPA, FTC).
That's not what Discord themselves say. Is that coming from Discord, the police, or someone else?
> Once you delete content, it will no longer be available to other users (though it may take some time to clear cached uploads). Deleted content will also be deleted from Discord’s systems, but we may retain content longer if we have a legal obligation to preserve it as described below. Public posts may also be retained for 180 days to two years for use by Discord as described in our Privacy Policy (for example, to help us train models that proactively detect content that violates our policies). - https://support.discord.com/hc/en-us/articles/5431812448791-...
Seems to be something that decides if the content should be deleted faster, or kept for between 180 days and 2 years. So even for Discord, "once you delete something on Discord it's poof" isn't 100% accurate.
I wonder how much they can rely on the data and what kind of "knowledge" they can extract. I never give feedback, and most of the time (let's say 5 out of 6) the result CC produces is simply wrong. How can they know whether the result is valuable or not?
Anyway, I’ll block them like I do everything.
It annoys me greatly that I have no tick box on Google to tell them "go and adapt models I use on my Gmail, Photos, Maps, etc." I don't want Google to ever be mistaken about where I live - I have told them 100 times already.
This idea that "no one wants to share their data" is just assumed, and permeates everything. Like soft-ball interviews that a popular science communicator did with DeepMind folks working in medicine: every question was prefixed by litany of caveats that were all about 1) assumed aversion of people to sharing their data 2) horrors and disasters that are to befall us should we share the data. I have not suffered any horrors. I'm not aware of any major disasters. I'm aware of major advances in medicine in my lifetime. Ultimately the process does involve controlled data collection and experimentation. Looks a good deal to me tbh. I go out of my way to tick all the NHS boxes too, to "use my data as you see fit". It's an uphill struggle. The defaults are always "deny everything". Tick boxes never go away, there is no master checkbox "use any and all of my data and never ask me again" to tick.
I am sure they will have a corporate carve-out, otherwise it makes them unusable for some large corps.
The cynic in me wonders if part of Anthropic's decision process here was that, since nobody believes you when you say you're not using their data for training, you may as well do it anyway!
Giving people an opt-out might even increase trust, since people can now at least see an option that they control.
As we’ve seen LLMs be able to fully regenerate text from their sources (or at least close enough), aren’t you the least bit worried about your personal correspondence magically appearing in the wild?
I upgraded after I hit the equivalent spend in API fees in a month.
This is why I love-hate Anthro, the same way I love-hate Apple. The reason is simple: Great product, shitty MBA-fueled managerial decisions.
Edit: I just logged in to opt out, they presented me with the switch directly. It was two clicks.
"ccusage" is telling me I would have spent $2010.90 in the last month if I was paying via the API, rather than $200.
But also I do feel Claude Code is quite a bit better than other things I've used, when using the same model. I'm not sure why though, it's a fairly simple program with only a few prompts and only a few tools, it seems like others could catch up immediately by learning some lessons from it.
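On the ccusage figure above: tools in that vein basically multiply logged token counts by per-model API prices. A toy sketch of the arithmetic (the prices and usage numbers below are made-up placeholders, not ccusage's actual logic or Anthropic's actual rates):

    # Toy estimate of API-equivalent spend from token counts.
    # Prices and usage numbers are illustrative placeholders only.
    PRICES_PER_MTOK = {          # model -> (input, output) USD per million tokens, assumed
        "opus":   (15.0, 75.0),
        "sonnet": (3.0, 15.0),
    }

    def api_equivalent_cost(usage):
        """usage maps model name -> (input_tokens, output_tokens)."""
        total = 0.0
        for model, (in_tok, out_tok) in usage.items():
            in_price, out_price = PRICES_PER_MTOK[model]
            total += in_tok / 1e6 * in_price + out_tok / 1e6 * out_price
        return total

    # A hypothetical heavy month: mostly Opus with some Sonnet.
    print(f"${api_equivalent_cost({'opus': (80_000_000, 12_000_000), 'sonnet': (60_000_000, 6_000_000)}):,.2f}")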
But including paid accounts and doing 5-year retention is confounding.
It’s the reverse. This was opt-in and is now opt-out. Opt means choose, so when “the default is opt-in” it means the option is “no” by default and you have the option to make it “yes”.
Feels like the complaint is precisely that people don’t want them to make this change.
> this is exactly how I'd want them to do it.
Seems naive to believe it will always be done like this, especially for new users.
Yes, of course, to both of those. Discord is a for-profit business with a limited number of humans who can focus on things, so the less they have to focus on, the better (in the minds of the people running the business, at least). So why do anything when you can do nothing, and everything stays the same? Of course, when someone has a warrant they really have to do something, but unless there is one, there is no incentive for them to do anything about it.
I check when I start using any new service. The cynical assumption that everything's being shared leads to shrugging it off and making no attempt to look for settings.
It only takes a moment to go into settings -> privacy and look.
They’re assuming that Anthropic, which is already receiving and storing your data, is also training their models on that data.
How are you supposed to disprove that as a user?
Also, the whole point is that companies cannot be trusted to follow the settings.
Do you have any reason to think this does anything?
To be clear, I don't use Claude for any of those purposes; it's the principle I am talking about.
They gave me a popup to agree to the ToS change, but I can ignore it for a month and still use the product. In the popup, they clearly explained the opt-out switch, which is available in the popup itself as well as in the settings.
Seems like an excessively draconian interpretation of property rights.
So your silence can be used as a warmish signal that you were satisfied. (...or not. Depends on your usage fingerprint.)
Legally, I don't understand how Anthropic's lawyers would have allowed this. Maybe I am just naively optimistic about these matters? I am a Max customer and I might leave! Talk about a "rug pull" ... and I am considering moving to an inferior provider! Privacy is a fundamental human right. Please do better; we have not learned our lesson in tech or society because no one is facing any consequences.
Probably that people accused it of being sycophantic and they tried to adjust it, but they didn't do it well. It would rather criticize and make assumptions about my behavior than keep it technical. Ha!
I prefer Gemini. It seems a bit stressed, always assuming that I might be frustrated by its answers, which is also a weird thing to assume, but at least it is not outright disrespectful.
So I am back to testing ChatGPT. I keep changing.
> The defaults are always "deny everything".
This is definitely not true for a massive amount of things, I'm unsure how you're even arriving at this conclusion.
It makes no sense to put stuff up on the internet where it can freely be downloaded by anyone at any time, by people who are then free to do whatever they like with it on their own hardware, then complain that people have downloaded that stuff and done what they liked with it on their own hardware.
"Having machines consume large volumes of data posted on the Internet for the purpose of generating value for them without compensating the creators" is equally a description of Google.
I don't disagree regarding Google; I also think they exploited others' IP for their own gain. It was once symbiotic with webmasters, but when that stopped they broke that implied good-faith contract. In a sense, their snippets and widgets using others' IP while no longer providing traffic to the site were the warning shot for where we are now. We should have been modernising IP laws back then.
Personally, I don't mind training, as long as I have a say on the matter - and they have a switch for this. Opt-out is not exactly cool, but I've got the popup in my face, almost a month before the changes, and that's respectful enough for me.
This said, I've just canceled my subscription because this new 5-year mandatory data retention is a deal breaker for me. I don't mind 30 or 60 days or even 90 days - I can understand the need to briefly persist the data. But for anything long-term (and 5 years is effectively permanent) I want to be respected with having a choice, and I'm provided none except for "don't use".
A shame, but fortunately they're not a monopoly.
Which is just to point out that the world wide web is not its own jurisdiction, and I believe AI companies are going to be finding that an ongoing problem. Unlike search, there is no symbiosis here, so there is an incentive to sue. The original IP holders do not benefit in any way. Search was different in that way.
Anthropic PR: "Ma'am, you opted IN to training on your therapy sessions and intellectual property and algorithms and salary and family history!" Don't you remember the modal???
The Modal: https://imgur.com/afqMi0Z
I still expect that our conversations will not leave the premises (ie end up on the internet), because that would be something else, but other than that, I knew what I signed up for.
I trust people until they give me cause to do otherwise.
After seeing the harm done by the expansion of patent law to cover software algorithms, and the relentless abuse done under the DMCA, I am reflexively skeptical of any effort to expand intellectual property concepts.
I asked Claude: "If a company has a privacy policy and says they will not train on your data and then decides to change the policy in order "to make the models better for everyone." What should the terms be?"
The model suggests, in the first paragraph or so, EXPLICIT OPT IN. Not OPT OUT.
That popup was confusing as hell then, because I read and understood it as two separate points: I took it that they're making training opt-out, and that they're changing data retention to 5 years, independent of each other. I got upset over this and didn't really research the nuances - and it turns out I had it all wrong.
Appreciate your comment, it's really helpful!
I hope they change the language to make it clear 5 years only applies to the chats they're allowed to train models on.
(Weirdly, I can't find the word "years" anywhere on their Privacy Policy, and the only instance on the Consumer Terms of Service pages is about being of legal age over 18 years old.)
Within the UK NHS and UK private hospital care, these are my personal experiences.
1) I can't email my GP to pass information back and forth. The GP withholds their email contact, so I can't email them e.g. pictures of scans or lab work reports. In theory they should already have those on their side. In practice they rarely do. The exchange of information goes sms->web link->web form->submit - for one single turn. There will be multiple turns. Most people just give up.
2) For an MRI scan, the private hospital made me jump through 10 hoops before sending me a link so I could download my MRI scan videos and pictures. Most people would have given up. There were several forks in the process which, in retrospect, could have delayed the data download even more.
3) Blood test scheduling can't tell me that scheduling a blood test for a date failed. Apparently it's somewhere between too much effort and impossible for them to keep my email address on record and email me back that the test was scheduled, or that the scheduling failed and I should re-run the process.
4) I would like to volunteer my data to benefit R&D in the NHS. I'm a user of medical services. I'm cognisant that all those services are helping, but the process of establishing them relied on people unknown to me sharing very sensitive personal information. If it weren't for those people unknown to me, I would be way worse off. I'd like to do the same, and be able to tell the UK NHS "here they are: my lab work reports, 100 GB of my DNA paid for by myself, my medical histories - take them all in, use them as you please."
In all cases, vague mutterings of "data protection... GDPR..." have been relayed back as "reasons". I take it it's mostly BS. Yes, there are obstacles, but the staff could work around them if they wanted to. However, there is a kernel of truth: it's easier for them not to try to share; it's less work and less risk, so the laws are used as a fig leaf (in the worst case, an alibi for laziness).
Generally I upvote chats - which gives my chat to Anthropic - when I feel like sharing, and I'll keep doing that like before with this opted out.
Quid pro quo. Those sites also received traffic from the audiences searching using Google. "Without compensation" really only became a thing when Google started adding the inlined cards which distilled the site's content thus obviating the need for a user to visit the aforementioned site.
> where it can freely be downloaded by anyone at any time, by people who are then free to do whatever they like with it on their own hardware
I think you have a strong misunderstanding of the law and the general expectations of others. I'd like to remind you that a lot of celebrities face legal issues for posting photos of themselves. Here's a recent example with Jennifer Lopez[0]. The reason these types of lawsuits are successful is because it is theft of labor. If you hire a professional photographer to take photos of your wedding, then the contract is that the photographer is handing over ownership of the photos in exchange for payment. The only difference here is that the photo was taken before a contract was made. The celebrity owns the right to their body and image, but not to the photograph.
Or think about Open Source Software. Just because it is posted on GitHub does not mean you are legally allowed to use it indiscriminately. GitHub has licenses and not all of them are unrestricted. In fact, a repo without a license does not mean unfettered usage. The default is that the repo owner has the copyright[1].
> You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work.
A big part of what will make a lawsuit successful or not is whether the owner has been deprived of compensation. As in, if you make money off of someone else's work. That's why this has been the key issue in all these AI lawsuits, where the question is whether the work is transformative or not. All of this is new legal territory because the laws were not written with this usage in mind. The transformative stuff is there because you need to allow for parody or referencing. You don't want a situation where, say... someone can't include a video of what the president said in order to discuss what was said[2]. But this situation is much closer to "Joe stole a book, learned from that book, and made a lot of money through the knowledge that he obtained from this book AND would not have been able to obtain without the book's help." It's just usually easier to go after the theft part of that situation. It's definitely a messy space. But basically, just because a piece of art exists on public property does not mean you have the right to do whatever you want with it.
> is equally a description of Google.
Yes and no. The AI summaries? Yeah. The search engine and linking? No. The latter is a mutually beneficial service. It's one thing to own a taxi service, and it is another to offer a taxi service that will walk into a Starbucks, take a random drink off the counter, and deliver it to you. I'm not sure why this is difficult to understand.
[0] https://www.bbc.com/news/articles/cx2qqew643go
[1] https://docs.github.com/en/repositories/managing-your-reposi...
You can also have a checkbox that says "I consent to having my data used for training", which would look like "opting in", and it could be true by default.
Or you can have a checkbox that says "Leave my data out of your training set", which would look like "opting out", and which could be unchecked by default.
Technically, they're both "opt-out", but I've seen enough examples (intentionally confusing and arguably "dark patterns") that I personally don't really consider "it's opt-in" to be a complete statement anymore.
Edit: I'll add that, in the comment I was replying to, it very much looked like you had to go to a settings page in order to opt-out, which I think is entirely reasonably described as having been opted-in by default. Here's what they had written:
> All you have to do is flip a single switch in the options to turn it off
And I actually think "opted-in by default" is valid and calls out cases where it looks like you consent, but that decision was made for you. Although in this case I think I've seen other comments that describe the UX differently, but my comment was more of a general comment than about this particular flow.
Now the AI summaries are a different story. One where there is no quid pro quo either. It's different when that taxi service will also offer the same service as that business. It's VERY different when that taxi service will walk into that business, take their services free of charge[0], and then transfer that to the taxi customer.
[0] Scraping isn't going to offer ad revenues
[Side note] In our analogy, the little text below the link is more like the taxi service offering some advertising or some description of the business. It's a bit more gray here, but I think the quid pro quo phrase applies. The taxi does this to help the customer find the right place to go, providing the business with more customers. But the taxi isn't (usually) replacing the service itself.
> on their own hardware
That doesn't make it technically legal. That only makes it not worth pursuing. You can sue Joe Schmoe for a million dollars, but if he doesn't have that then you're not getting a dime. But if Joe Schmoe is using that thing to make money, well then... yeah, you bet your ass that's a different situation, and the "worth" of pursuing is directly proportional to how much he is making. Doesn't matter if it is his own hardware or not. Like, why do you think who owns the hardware even matters? Do you really think the legality changes if I rent a GPU vs use my own? That doesn't make any sense.
So your assumption is that the reported privacy policy of any company is completely accurate, that there is no means for the company to violate this policy, and that once it is violated you will immediately be notified.
> It only takes a moment to go into settings -> privacy and look.
It only takes a moment to examine history and observe why this is wholly inadequate.
https://www.jdsupra.com/legalnews/healthline-media-agrees-to...
"Healthline.com provided an opt-out mechanism, but it was misconfigured and Healthline failed to test it, resulting in data being shared with third parties even after consumers elected to opt out.”
https://www.bbc.com/news/technology-65772154
"The company agreed to pay the US Federal Trade Commission (FTC) after it was accused of failing to delete Alexa recordings at the request of parents.”
https://www.mediapost.com/publications/article/405635/califo...
"According to the [California Privacy Protection] agency, Todd Snyder told website visitors they could opt out of data sharing, but didn't actually allow them to do so for 40 days in late 2023 because its opt-out mechanism was improperly configured."
When things are free (in this case your input), they will get abused. Put a cost on it.
Some people were upset that Google Maps would just take the data that contributors give it for free. My problem was different. I use Google Maps and I want a way to correct it. I don't want to be paid for this. I want the tool I'm using to be correctable by me. The more I pay for it, the more I want it to be editable by me. I don't want compensation. I want it to be better. And I can make it better.
It's sort of why we picked Kong at a different company. Open source core meant that we could edit stuff we didn't like. In fact, considering that we paid, we wanted them to upstream what we changed.
I'm careful about what data of mine lands on someone else's server, so I'm not a fan of this even without the dark patterns.
The reference to terabytes of stolen data refers to copyrighted material. I think you know this but chose to frame it as "stuff freely posted on the internet" in order to mislead and strawman the other comment.
If they start to feed the next model with LLM-generated crap, the overall performance will drop, and instead of getting a useful answer 1 out of 5 times it will be 1 out of 10(?), and probably a lot of us will cancel the subscription ... so in the end I think it matters.
Yes. It's way easier and cheaper when the data comes to you instead of having to scrape everything elsewhere.
What’s the angle that describes this as fair use?
[0] https://www.businessinsider.com/anthropic-cut-pirated-millio...
https://www.reuters.com/sustainability/boards-policy-regulat...
It’s shocking to me that anyone who works in our industry would trust any company to do as they claim.
If the AI companies were letting people download copies of their training data, copyright law would certainly have something to say about that. But no: once they download the training data, they keep it, and they don't share it.
> using his own copy of the data
Yes? That is a different thing? I guess we can keep moving the topic until we're talking about the same topic if you want. But honestly, I don't want to have that kind of conversation.
Sure, I understand the concerns many of you have.
But in my niche areas of cognitive research, genetics, and neurophilosophy, I need Claude to be much smarter than it is now. I am happy to share what I know with Anthropic so that I eventually have a better companion thinker.
Straighten me out if I am wrong.
I need this opt-in to improve the foundational model that they have trained. It is good, but not good enough.
I agree it's unfortunate that these improvements will accrue within a proprietary, for-profit company. But it's still a net positive for my work.
Give me a FOSS LLM with Claude 4 Sonnet performance and a 1 million token context and I will work even harder toward improvements in my areas of biological NIH-funded research.
The fact that value is being created is irrelevant. The fact that they are making profit is irrelevant. As is non compensation to creators. There isn't any law being broken. Is there?
Bottom line: in real-world terms there is no expectation of privacy with a freely open and unrestricted website. Even if that website said 'you can use this for single use but not mass use', that in itself is not legally or practically enforceable.
Let's take the example of a Christmas light show. The idea might be (in the homeowners mind) that people, families, will drive by in their cars to enjoy the light show (either a single home or the entire street or most of it). They might think 'we don't want buses full of people who paid to ride the bus' coming down the street. Unfortunately there is no way to prevent that (without the city and laws getting involved) and there is nothing wrong with the fact that the people who provide the bus are making money bringing people to see the light show.
...not if you believe in the right of general-purpose computing. If they have the right to read the data, why don't they have a right to program a computer to do it for them?
I think we all agree that they're not the good guys here, but this reasoning in particular is troubling.
What if you ask it for medical advice, or legal things? What if you turn on Gmail integration? Should I now be able to generate your conversations with the right prompt?
On the personal side: given that LLMs have no ground truth and everything is controlled hallucination, if the LLM tells you an imperfect version of my email or chat, you can never be sure whether what the LLM told you is true or not. So maybe you don't gain that much extra knowledge about me. For example, you can reasonably guess I'm typing this on the computer, and having coffee too. So if you ask the LLM "tell me a trivial story", and the LLM comes back with "one morning, LJ was typing HN replies on the computer while having his morning coffee" - did you learn that much new about me that you didn't know or couldn't guess before?
On the "tragedy of the commons" side. We all benefit immensely from other people sharing their data, even very personal data. Any drug discovery, testing, approval - relies on many people allowing their data to be shared. Wider context - living in a group of people, involves radiating data outwards, and using data other people emit towards myself (and others), to have a functioning society. The more advanced the society, the more coordination it needs to achieve the right cooperation-competition balance in the interactions between ever greater numbers of people.
I think it's bad for me personally, and for everyone, that the "data privacy maximalists" had their desires codified in UK law. My personal experience with UK medical systems has been that the laws made my life worse, not better. I wrote about it here: https://news.ycombinator.com/item?id=45066321
Sam Altman and his ilk are exploiting the incredibly slow moving legal system to enrich themselves.
In your case, try saying 'be nicer' or 'be more jovial'.
They literally show you a full-page popup with clear text and an OPT IN toggle. It doesn’t seem really shady to me (or worth 10 separate posts on HN).
That said, if this popup doesn’t appear when you sign up after the 28th, that would be a dark pattern and shady stuff. For now it’s just clickbait.
My entire comment was that the entire issue is about data ownership. Doesn't even matter if you have a copy of the data.
It matters how that copy was obtained.
There's no reason to then discuss if your usage violates the terms of a license if you obtained the data illegally. You're already in the illegal territory lol.
Having data != legally having obtained data
>> To be honest, these companies already stole terabytes of data and don't even disclose their dataset, so you have to assume they'll steal and train at anything you throw at them
> "Reading stuff freely posted on the internet" constitutes stealing now?
Literally everyone was talking about data ownership and you just said "I can download it, so it is fair game on my hardware." Let's say you didn't intend to say that. Well, that doesn't matter; that's what a lot of people heard, and you failed to clarify when pressed on this. So yeah, I think you're doing gymnastics.
xAI trains Grok on both public data (Tweets) and non-public data (Conversations with Grok) by default. [0]
> Grok.com Data Controls for Training Grok: For the Grok.com website, you can go to Settings, Data, and then “Improve the Model” to select whether your content is used for model training.
Meta trains its AI on things posted to Meta's products, which are not as "public" as Tweets on X, because users expect these to be shared only with their networks. They do not use DMs, but they do use posts to Instagram/Facebook/etc. [1]
> We use information that is publicly available online and licensed information. We also use information shared on Meta Products. This information could be things like posts or photos and their captions. We do not use the content of your private messages with friends and family to train our AIs unless you or someone in the chat chooses to share those messages with our AIs.
OpenAI uses conversations for training data by default [2]
> When you use our services for individuals such as ChatGPT, Codex, and Sora, we may use your content to train our models.
> You can opt out of training through our privacy portal by clicking on “do not train on my content.” To turn off training for your ChatGPT conversations and Codex tasks, follow the instructions in our Data Controls FAQ. Once you opt out, new conversations will not be used to train our models.
[1] https://www.facebook.com/privacy/genai/
[2] https://help.openai.com/en/articles/5722486-how-your-data-is...