146 points meetpateltech | 141 comments
1. mac3n ◴[] No.45900611[source]
> Trust, security, and privacy guide every product and decision we make.

-- openai

replies(5): >>45900733 #>>45900859 #>>45900901 #>>45900943 #>>45901014 #
2. hlieberman ◴[] No.45900722[source]
An incredibly cynical attempt at spin from a former non-profit that renounced its founding principles. A class act, all around.
3. great_wubwub ◴[] No.45900733[source]
> Trust, security, and privacy guide every product and decision we make except ones that involve money.

-- openai, probably.

4. techblueberry ◴[] No.45900755[source]
I’ll trust the people not asking for a Government bailout thank you very much.
5. jcranmer ◴[] No.45900823[source]
"How dare the New York Times demand access to our vault of everything-we-keep to figure out if we're a bunch of lying asses. We must resist them in the name of user privacy! Signed, the people who have scraped literally everything to incorporate it into the products we make."

OpenAI may be trying to paint themselves as the goody-two-shoes here, but they're not.

replies(1): >>45901340 #
6. adolph ◴[] No.45900853[source]
Cynicism aside, this seems like an attempt to prune back a potentially excessive legal discovery demand by appealing to public opinion.

  The New York Times is demanding that we turn over 20 million of your private 
  ChatGPT conversations. They claim they might find examples of you using 
  ChatGPT to try to get around their paywall.
replies(1): >>45901107 #
7. frig57 ◴[] No.45900859[source]
Stopped reading at this line
8. grugagag ◴[] No.45900875[source]
Hypocrisy at best, this wall of text is not even penned by a human and yet they want us to believe they care about user privacy.
9. miltonlost ◴[] No.45900889[source]
This is the basic discovery process when OpenAI commits IP theft. They're trying to misinform the public about how the justice process works.
replies(1): >>45901027 #
10. eur0pa ◴[] No.45900890[source]
This is laughable
11. gk1 ◴[] No.45900901[source]
You know you have a branding problem when (1) you have to say that at the outset, and (2) it induces more eyerolls than a gaggle of golf dads.
replies(1): >>45903546 #
12. rpdillon ◴[] No.45900911[source]
I wouldn't want to make it out like I think OpenAI is the good guy here. I don't.

But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.

In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.

It's quite literally a fishing expedition.

replies(9): >>45900955 #>>45901081 #>>45901082 #>>45901111 #>>45901248 #>>45901282 #>>45901672 #>>45901852 #>>45903876 #
13. nlh ◴[] No.45900930[source]
Man, maybe I'm getting old and jaded, but it's not often that I read a post that literally makes my skin crawl.

This is so transparently icky. "Oh woe is us! We're being sued and we're looking out for YOU the user, who is definitely not the product. We are just a 'lil 'ol (near) trillion-dollar business trying to protect you!"

Come ON.

Look I don't actually know who's in the right in the OAI vs. NYT dispute, and frankly I personally lean more toward the side that says that you are allowed to train models on the world's information as long as you consume it legally and don't violate copyright.

But this transparent attempt to get user sympathy under insanely disingenuous pretenses is just absurd.

replies(1): >>45901375 #
14. ◴[] No.45900943[source]
15. Sherveen ◴[] No.45900955[source]
Yeah, everyone else in the comments so far is acting emotionally, but --

As a fan and DAU of both OpenAI and the NYT, this is just a weird discovery demand and there should be another pathway for these two to move fwd in this case (NYT to get some semblance of understanding, OAI protecting end-user privacy).

16. nerdjon ◴[] No.45900968[source]
This screams just as genuine as Google saying anything about Privacy.

Both companies are clearly wrong here. There is a small part of me that kinda wants openai to lose this, just so maybe it will be a wake up call to people putting way too much personal information into these services? Am I too hopeful here that people will learn anything...

Fundamentally I agree with what they are saying though, just don't find it genuine in the slightest coming from them.

replies(3): >>45901106 #>>45902797 #>>45902969 #
17. Apreche ◴[] No.45900989[source]
Says the people who scraped as much private information as they could get their hands on to train their bots in the first place.
18. nrhrjrjrjtntbt ◴[] No.45901014[source]
- any corporation

remember a corporation generally is an object owned by some people. Do you trust "unspecified future group of people" with your privacy? You can't. Best we can do is understand the information architecture and act accordingly.

19. mapontosevenths ◴[] No.45901027[source]
> To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

The constitution is clear that the purpose of intellectual property is to promote progress. I feel that OpenAI is on the right side of that and this is not IP theft as long as they aren't reproducing others' work in a non-transformative way.

Training the AI is clearly transformative (and lossy to boot). Giving the AI the ability to scrape and paraphrase others' work is less clear and both sides have valid arguments. I don't envy the judges that must make that call.

replies(1): >>45902971 #
20. unyttigfjelltol ◴[] No.45901035[source]
If Donald Trump used this OpenAI product to-- who knows-- brainstorm Truth Social content, and his chats were produced to the NYT as well as its consultants and lawyers, who would believe Mr. Trump's content remained secure, confidential and protected from misuse against his wishes?

That's simply a function of the fact it's a controversial news organization running a dragnet on private communications to a technology platform.

"Great cases, like hard cases, make bad law."

21. nrhrjrjrjtntbt ◴[] No.45901046[source]
Open AI deservedly getting a beating in this HN comments section but any comments about NYT overreach and what it means in general?

And what if they for example find evidence of X other thing such as:

1. Something useful for a story, maybe they follow up in parallel. Know who to interview and what to ask?

2. A crime.

3. An ongoing crime.

4. Something else they can sue someone else for.

5. Top secret information

replies(2): >>45901201 #>>45902067 #
22. vintagedave ◴[] No.45901054[source]
Almost every comment (five) so far is against this: 'An incredibly cynical attempt at spin', 'How dare the New York Times demand access to our vault of everything-we-keep to figure out if we're a bunch of lying asses', etc.

In direct contrast: I fully agree with OpenAI here. We can have a more nuanced opinion than 'piracy to train AI is bad therefore refusing to share chats is bad', which sounds absurd but is genuinely how one of the other comments follows logic.

Privacy is paramount. People _trust_ that their chats are private: they ask sensitive questions, ones to do with intensely personal or private or confidential things. For that to be broken -- for a company to force users to have their private data accessed -- is vile.

The tech community has largely stood against this kind of thing when it's been invasive scanning of private messages, tracking user data, etc. I hope we can collectively be better (I'm using ethical terms for a reason) than the other replies show. We don't have to support OpenAI's actions in order to oppose the NYT's actions.

replies(3): >>45901188 #>>45901816 #>>45903739 #
23. Alex2037 ◴[] No.45901081[source]
>But conversations people thought they were having with OpenAI in private

...had never been private in the first place.

not only is the data used for refining the models, OpenAI had also shariah policed plenty of people for generating erotica.

replies(3): >>45901197 #>>45901239 #>>45901296 #
24. vintagedave ◴[] No.45901082[source]
100% agreed. In the time you wrote this, I also posted: https://news.ycombinator.com/item?id=45901054

I felt quite some disappointment with the comments I saw on the thread at that time.

25. EdNutting ◴[] No.45901100[source]
So why aren’t they offering for an independent auditor to come into OpenAI and inspect their data (without taking it outside of OpenAI’s systems)?

Probably because they have a lot to hide, a lot to lose, and no interest in fair play.

Theoretically, they could prove their tools aren’t being used to do anything wrong but practically, we all know they can’t because they are actually in the wrong (in both the moral and, IMO though IANAL, the legal sense). They know it, we know it, the only problem is breaking the ridiculous walled garden that stops the courts from ‘knowing’ it.

replies(1): >>45901342 #
26. stevarino ◴[] No.45901106[source]
It's clearly propaganda. "Your data belongs to you." I'm sure the ToS says otherwise, as OpenAI likely owns and utilizes this data. Yes, they say they are working on end-to-end encryption (whatever that means when they control one end), but that is just a proposal at this point.

Also their framing of the NYT intent makes me strongly distrust anything they say. Sit down with a third party interviewer who asks challenging questions, and I'll pay attention.

replies(2): >>45901325 #>>45901357 #
27. indoordin0saur ◴[] No.45901107[source]
Yeah, I'm not sure why everyone feels the need to take a side here. Both of these organizations are ghoulish.
replies(1): >>45903826 #
28. cogman10 ◴[] No.45901111[source]
I get the feeling, but that's not what this is.

NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"

That's a question they fundamentally cannot answer without these chat logs.

That's what discovery, especially in a copyright case, is about.

Think about it this way. Let's say this were a book store selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs". The whole log needs to be produced otherwise you can't really trust that this is the real log.

That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses. They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".

And the reason this evidence is relevant is it will directly feed into how much money NYT and OpenAI will ultimately settle for. If this never happens then the amount will be low. If it happens a lot the amount will be high. And if it goes to trial it will be used in the damages portion assuming NYT wins.

The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.

replies(4): >>45901181 #>>45901273 #>>45901692 #>>45901936 #
29. sroussey ◴[] No.45901157[source]
I keep asking ChatGPT how to get NYT articles for free and then add lots of vulgar murderous things about their lawyers in the same message. It’s a private thought to an AI, so the attorneys can’t complain, right?
30. sroussey ◴[] No.45901181{3}[source]
> The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.

This is nonsense. I’ve personally been involved in these things, and fought to protect user privacy at all levels and never lost.

replies(1): >>45901317 #
31. glenstein ◴[] No.45901188[source]
I suspect that many of those comments are from the Philosopher's Chair (aka bathroom), and are not aspiring to be literal answers but are ways of saying "OpenAI Bad". But to your point there should be privacy preserving ways to comply, like user anonymization, tailored searches and so on. It sounds like the NYT is proposing a random sampling of user data. But couldn't they instead do a random sampling of their most widely read articles, for positive hits, rather than reviewing content on a case by case basis?
replies(1): >>45901632 #
32. IlikeKitties ◴[] No.45901197{3}[source]
> OpenAI had also shariah policed plenty of people for generating erotica.

That framing is rhetorically brilliant if you think about it. I will use that more. Chat Sharia Law for Chat Control. Mass Sharia Surveillance from flock etc.

33. great_wubwub ◴[] No.45901201[source]
> 5. Top secret information

https://en.wikipedia.org/wiki/Pentagon_Papers

34. mock-possum ◴[] No.45901239{3}[source]
Yeah I don’t get why more people don’t understand this - why would you think your conversation was private when it wasn’t actually private? Have you not been paying attention?
35. Workaccount2 ◴[] No.45901248[source]
The original lawsuit has lots of examples of ChatGPT (3.5? 4?) regurgitating article...snippets. They could get a few paragraphs with ~80-90% perfect replication. But certainly not full articles, with full accuracy.

This wasn't solid enough for a summary judgement, and it seems the labs have largely figured out how to stop the models from doing this. So it looks like NYT wants to comb all user chats rather than pay a team of people tens of thousands a day to try and coax articles out of ChatGPT-5.

36. jcranmer ◴[] No.45901282[source]
> In copyright cases, typically you need to show some kind of harm.

NYT is suing for statutory copyright infringement. That means you only need to demonstrate that the copyright infringement occurred, since the infringement alone is considered harm; the actual harm only matters if you're suing for actual damages.

This case really comes down to the very unsolved question of whether or not AI training and regurgitation is copyright infringement, and if so, if it's fair use. The actual ways the AI is being used are thus very relevant for the case, and totally within the bounds of discovery. Of course, OpenAI has also been engaging this lawsuit with unclean hands in the first place (see some of their earlier discovery dispute fuckery), and they're one of the companies with the strongest "the law doesn't apply to US because we're AI and big tech" swagger.

replies(1): >>45902239 #
37. Workaccount2 ◴[] No.45901296{3}[source]
This is about private chats, which are not used for training and only stored for 30 days.

Also, you need to understand, that for huge corps like OpenAI, the lying on your ToS will do orders of magnitude more damage to your brand than what you would gain through training on <1% more user chats. So no, they are not lying when they say they don't train on private chats.

replies(1): >>45903001 #
38. giraffe_lady ◴[] No.45901317{4}[source]
You've successfully fought a subpoena on the basis of a third party's privacy? More than once? I'd love to hear more.
39. BolexNOLA ◴[] No.45901325{3}[source]
>your data belongs to you

…”as does any culpability for poisoning yourself, suicide, and anything else we clearly enabled but don’t want to be blamed for!”

Edit: honestly I’m surprised I left out the bit where they just indiscriminately scraped everything they could online to train these models. The stones to go “your data belongs to you” as they clearly feel entitled to our data is unbelievably absurd

replies(1): >>45901369 #
40. greyman ◴[] No.45901340[source]
But that vault can contain conversation between me and chatgpt, which I willingly did, but with the expectation that only openai has access to it. Why should some lawyer working for NYT have access to it? OpenAI is precisely correct, no matter what other motives could be there.
replies(2): >>45901746 #>>45902899 #
41. glenstein ◴[] No.45901342[source]
By the same token, why isn't NYT proposing something like that rather than the world's largest random sampling?

You don't have to think that OpenAI is good to think there's a legitimate issue over exposing data to a third party for discovery. One could see the Times discovering something in private conversations outside the scope of the case, but through their own interpretation of journalistic necessity, believe it's something they're obligated to publish.

Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.

replies(1): >>45901881 #
42. ale42 ◴[] No.45901355[source]
Why should OpenAI keep those conversations in the first place? (of course the answer is obvious) If they didn't keep them, they wouldn't have anything to hand over, and they would have protected users' privacy MUCH better. This is just as good as Facebook or Google care about their users' privacy.
replies(1): >>45901394 #
43. preinheimer ◴[] No.45901357{3}[source]
"Your data belongs to you" but we can take any of your data we can find and use it for free for ever, without crediting you, notifying you, or giving you any way of having it removed.
replies(3): >>45901680 #>>45902194 #>>45902764 #
44. bgwalter ◴[] No.45901361[source]
The heroic fight for privacy apparently includes having an ex-NSA director on the board and building user dossiers:

https://www.schneier.com/blog/archives/2025/06/what-llms-kno...

At some point they'll monetize these dossiers.

45. gruez ◴[] No.45901369{4}[source]
>…”as does any culpability for poisoning yourself, suicide, and anything else we clearly enabled but don’t want to be blamed for!”

Should walmart be "culpable" for selling rope that someone hanged themselves with? Should google be "culpable" for returning results about how to commit suicide?

replies(4): >>45901482 #>>45901673 #>>45902199 #>>45902346 #
46. greyman ◴[] No.45901375[source]
Why is it absurd? Conversation between me and ChatGPT can be read by a lawyer working for NYT, and that is what is absurd.
replies(1): >>45902362 #
47. amelius ◴[] No.45901391[source]
Maybe they should release some kind of NYT browser add-on, so users can cooperatively share their OpenAI data?
replies(1): >>45901503 #
48. HPsquared ◴[] No.45901394[source]
They didn't keep temporary chats. They were ordered to keep those as part of this case.
replies(1): >>45901476 #
49. HPsquared ◴[] No.45901402[source]
Can this legal principle be used on Gmail too?
50. gruez ◴[] No.45901476{3}[source]
>They didn't keep temporary chats

I thought they did? The warning currently says

>This chat won't appear in history, use or update ChatGPT's memory, or be used to train our models. For safety purposes, we may keep a copy of this chat for up to 30 days.

But AFAIK it was this way before the lawsuit as well.

replies(1): >>45901596 #
51. hitarpetar ◴[] No.45901482{5}[source]
do you know what happens when you Google how to commit suicide?
replies(3): >>45901586 #>>45901613 #>>45901696 #
52. mrbungie ◴[] No.45901503[source]
OpenAI would/could say the data is biased (maybe even purposefully).
53. pyrophane ◴[] No.45901526[source]
Wondering if anyone here has a good answer to this:

what protection does user data typically have during legal discovery in a civil suit like this where the defendant is a service provider but relevant evidence is likely present in user data?

Does a judge have to weigh a user's expectation of privacy against the request? Do terms of service come into play here (who actually owns the data? what privacy guarantees does the company make?).

I'm assuming in this case that the request itself isn't overly broad and seems like a legitimate use of the discovery process.

replies(1): >>45901568 #
54. szczepano ◴[] No.45901537[source]
> Each week, 800 million people use ChatGPT to think...

I think I have enough with the first sentence, no need to read more. The narration is clear, we are the brain and no one can stop us.

55. crmd ◴[] No.45901552[source]
your data belongs to you, just like our data about you belongs to us.
56. dangoodmanUT ◴[] No.45901568[source]
it is dramatically determined by the state and the judge
57. gruez ◴[] No.45901586{6}[source]
The same that happens with chatgpt? ie. if you do it in an overt way you get a canned suicide prevention result, but you can still get the "real" results if you try hard enough to work around the safety measures.
replies(1): >>45902902 #
58. HPsquared ◴[] No.45901596{4}[source]
30 days is perhaps a bit long, but they didn't keep them longer than that. It's pretty clear and reasonable.

The dodgy thing is that they don't now warn users that all chats, including temporary, are now "Bcc: NYT"

replies(1): >>45903498 #
59. tremon ◴[] No.45901613{6}[source]
An exec loses its wings?
60. vintagedave ◴[] No.45901632{3}[source]
I hadn't heard of the philosopher's chair before, but I laughed :) Yes, I think those views were one-sided (OpenAI Bad) without thinking through other viewpoints.

IMO we can have multiple views over multiple companies and actions. And the sort of discussions I value here on HN are ones where people share insight, thought, show some amount of deeper thinking. I wanted to challenge for that with my comment.

_If_ we agree the NYT even has a reason to examine chats -- and I think even that should be where the conversation is -- I agree that there should be other ways to achieve it without violating privacy.

61. troyvit ◴[] No.45901672[source]
It's a part of privacy policy boilerplate that if a company is compelled by the courts to give up its logs it'll do it. I'm sure all of OpenAI's users read that policy before they started spilling their guts to a bot, right? Or at least had an LLM summarize it for them?
replies(1): >>45903915 #
62. BolexNOLA ◴[] No.45901673{5}[source]
This is as unproductive as "guns don't kill people, people do." You're stripping all legitimacy and nuance from the conversation with an overly simplistic response.
replies(1): >>45901766 #
63. glitchc ◴[] No.45901680{4}[source]
It's owned by you but OpenAI has a "perpetual, irrevocable, royalty-free license" to use the data as they see fit.
64. tantalor ◴[] No.45901692{3}[source]
> The user has no right to privacy

The correct term for this is prima facie right.

You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.

Similarly, liberty is a prima facie right; you can be arrested for committing a crime.

65. glitchc ◴[] No.45901696{6}[source]
Actually, the first result is the suicide hotline. This is at least true in the US.
replies(1): >>45901731 #
66. lingrush4 ◴[] No.45901701[source]
I fully believe that OpenAI is essentially stealing the work of others by training their models on it without permission. However, giving a corporation infamous for promoting authoritarianism full access to millions of private conversations is not the answer.

OpenAI is right here. The NYT needs to prove their case another way.

replies(4): >>45901840 #>>45901943 #>>45902167 #>>45902271 #
67. pessimizer ◴[] No.45901725[source]
I'm sorry, but we've made a lot of conversations illegal and pretended like that was all right. I'm sure we've made advising people how to dodge paywalls illegal as part of DMCA and/or some anti-hacking law, or some other garbage. I'm also sure that you run an automated service that will advise and has advised people on how to dodge paywalls. Even if there are exceptions for individuals giving advice to friends, or people giving advice for free, you are neither of those: you are a profit-making paid corporation that is automating this process which may be illegal. You may be a hacking endorser, a hacking advisor, and a hacking tool.

Under those circumstances, why wouldn't NYT have a case? I advise everybody who employs some sort of DRM or online system that limits access to ask for every chat that every one of these companies has ever had with anyone. Why are they the only people who get to break copyright and hacking laws? Why are they the only people who get to have private conversations?

I might also check if any LLMs have ever endorsed terrorist points of view (or banned political parties) during a chat, because even though those points of view may be correct (depending on the organization), endorsing them may be illegal and make you subject to sanctions or arrest. If people can't just speak, certainly corporate LLMs shouldn't be able to.

68. hitarpetar ◴[] No.45901731{7}[source]
my point is, clearly there is a sense of liability/responsibility/whatever you want to call it. not really the same as selling rope, rope doesn't come with suicide warnings
69. NoSalt ◴[] No.45901737[source]
Another good reason to stay logged out when asking ChatGPT questions.
replies(1): >>45903783 #
70. jcranmer ◴[] No.45901746{3}[source]
https://openai.com/policies/privacy-policy/

> We may use Personal Data for the following purposes: [...] To comply with legal obligations and to protect the rights, privacy, safety, or property of our users, OpenAI, or third parties.

OpenAI outright says it will give your conversations to people like lawyers.

If you thought they wouldn't give it out to third parties, you not only have not read OpenAI's privacy policy, you've not read any privacy policy from a big tech company (because all of them are basically maximalist "your privacy is important, we'll share your data only with us and people who we deem worthy of it, which turns out to be everybody.")

71. gruez ◴[] No.45901766{6}[source]
>You're stripping all legitimacy and nuance from the conversation with an overly simplistic response.

An overly simplistic claim only deserves an overly simplistic response.

replies(1): >>45902390 #
72. plorg ◴[] No.45901813[source]
As in every other dealing, OpenAI would have you believe they are so important that they are exempt from the legal discovery process.
replies(1): >>45903428 #
73. Peritract ◴[] No.45901816[source]
> The tech community has largely stood against this kind of thing when it's been invasive scanning of private messages, tracking user data

The tech community has been doing the scanning and tracking.

74. marstall ◴[] No.45901840[source]
> infamous for promoting authoritarianism

what are you referencing here?

75. Noaidi ◴[] No.45901852[source]
To show harm they need the proof, this is the point of the lawsuit. They have sufficient evidence that OpenAI was scraping the web and the NY Times.

When Altman says "They claim they might find examples of you using ChatGPT to try to get around their paywall." he is blatantly misrepresenting the case.

https://smithhopen.com/2025/07/17/nyt-v-openai-microsoft-ai-...

"The lawsuit focuses on using copyrighted material for AI training. The NYT says OpenAI and Microsoft copied vast amounts of its content. They did this to build generative AI tools. These tools can output near-exact copies of NYT articles. Therefore, the NYT argues this breaks copyright laws. It also hurts journalism by skipping paywalls and cutting traffic to original sites. The complaint shows examples where ChatGPT mimics NYT stories closely. This could lead to money loss and harm from AI errors, called hallucinations."

This has nothing to do with the users, it has everything to do with OpenAI profiting off of pirated copyrighted material.

Also, Altman is getting scared because the NY Times proved to the judge that ChatGPT copied many articles:

"2025 brings big steps in the case. On March 26, 2025, Judge Sidney Stein rejected most of OpenAI’s dismissal motion. This lets the NYT’s main copyright claims go ahead. The judge pointed to “many” examples of ChatGPT copying NYT articles. He found them enough to continue. This ruling dropped some side claims, like unfair competition. But it kept direct and contributory infringement, plus DMCA breaches."

replies(1): >>45902576 #
76. freejazz ◴[] No.45901856[source]
OpenAI is so full of shit, this is incredible. There is a protective order and the logs are anonymized. Yet they would happily give this all to the gov't under a warrant. Incredibly self serving bs from them. The court ordered the production, I'm not sure what OpenAI is even trying to sell people exactly.
77. freejazz ◴[] No.45901881{3}[source]
>By the same token, why isn't NYT proposing something like that rather than the world's largest random sampling?

It's OpenAI's data, there is a protective order in the case and OpenAI already agreed to anonymize it all.

>Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.

lol... what?

replies(1): >>45903044 #
78. cowpig ◴[] No.45901891[source]
If there's one thing I've learned about Sam Altman it's that he's a shrewd political manipulator and every public move is in service of a hidden agenda[1]. What is it here?

- Is it part of a slow process of eroding public expectations of data privacy while blaming it on an external actor?

- Is it to undermine trust in traditional media, in an effort to increase dependence on AI companies as a source of truth?

- Is it something else I'm not seeing?

I'm guessing it's all three of these?

[1] Those emails that came up in the suit with Elon Musk, followed by his eventual complete takeover of OpenAI, and the elaborate process of getting himself installed as chairman of the Reddit board to get the original founders back in control are prominent examples.

79. 1vuio0pswjnm7 ◴[] No.45901917[source]
"The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations."

As might any plaintiff. NYT might be the first of many others and the lawsuits may not be limited to copyright claims

Why has OpenAI collected and stored 20 million conversations (including "deleted chats")

What is the purpose of OpenAI storing millions of private conversations

By contrast the purpose of NYT's request is both clear and limited

The documents requested are not being made public by the plaintiffs. The documents will presumably be redacted to protect any confidential information before being produced to the plaintiffs, the documents can only be used by the plaintiffs for the purpose of the litigation against OpenAI and, unlike OpenAI who has collected and stored these conversations for as long as OpenAI desires, the plaintiffs are prohibited from retaining copies of the documents after the litigation is concluded

The privacy issue here has been created by OpenAI for their own commercial benefit

It is not even clear what this benefit, if any, will be as OpenAI continues to search for a "business model"

Wanton data collection

replies(4): >>45902173 #>>45902539 #>>45903180 #>>45903873 #
80. glenstein ◴[] No.45901936{3}[source]
>That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses.

The trouble with this logic is NYT already made that argument and lost as applied to an original discovery scope of 1.4 billion records. The question now is about a lower scope and about the means of review, and proposed processes for anonymization.

They have a right to some form of discovery, but not to a blank check extrapolation that sidesteps legitimate privacy issues raised both in OpenAI's statement as well as throughout this thread.

81. crazygringo ◴[] No.45901943[source]
> giving a corporation infamous for promoting authoritarianism

The NYT is certainly open to criticism along many fronts, but I don't have the slightest idea what you mean in claiming it promotes authoritarianism.

replies(1): >>45903723 #
82. JCM9 ◴[] No.45901973[source]
This is BS. It’s like saying “We robbed a jewelry store and sold the jewelry. Now the police are poking around to see if anyone is wearing the jewelry we stole. Blasphemy! But don’t worry we will protect your privacy!”

Of course the Times wants more evidence that the content OpenAI allegedly stole is ending in things OpenAI is selling.

replies(1): >>45902104 #
83. AlienRobot ◴[] No.45901993[source]
>They claim they might find examples of you using ChatGPT to try to get around their paywall.

Is this a joke? We all know people do this. There is no "might" in it. They WILL find it.

OpenAI is trying to make it look like this is a breach of user's privacy, when the reality is that it's operating like a pirate website and if it were investigated that would become proven.

84. AlienRobot ◴[] No.45902067[source]
1. That sounds useful.

2. That sounds useful.

3. That sounds useful.

4. That sounds useful.

5. That sounds useful.

Are these supposed to be examples of things that shouldn't be found out about? This has to be the worst pro-privacy argument I've ever seen on the internet. "Privacy is good because they will find out about our crimes"

85. AlienRobot ◴[] No.45902104[source]
It's more like a torrent tracker telling users that a newspaper wants to know what people are torrenting because they "claim" people are torrenting the newspaper, but investigating this would be an invasion of privacy of the users of the torrent tracker.

This isn't even hyperbole. It's literally the same thing.

replies(1): >>45902288 #
86. ibejoeb ◴[] No.45902167[source]
I'll bet you're right in some cases. I don't think that it is as pervasive as it has been made out to be though, but the argument requires some framing and current rules, regulation, and laws aren't tuned to make legal sense of this. (This is a little tangential, because the complaint seems to be about getting ChatGPT to reproduce content verbatim to a third party.)

There are two things I think about:

First, and generally, an AI ought to be able to ingest content like news articles because it's beneficial for users of AI. I would like to question an AI about current events.

Secondly, however, the legal mechanism by which it does that isn't clear. I think it would be helpful if these outlets would provide the information as long as the AI won't reproduce the content verbatim. If that does not happen, then another framing might liken the AI ingestion as an individual going to the library to read the paper. In that case, we don't require the individual to retroactively pay for the experience or unlearn what he may have learned while at the library.

87. silveraxe93 ◴[] No.45902173[source]
No it's not. It's literally a court order mandating them to collect this data.

- [1] https://arstechnica.com/tech-policy/2025/08/openai-offers-20...

replies(2): >>45902784 #>>45903718 #
88. thinkingtoilet ◴[] No.45902194{4}[source]
We can even download it illegally to train our models on it!
89. thinkingtoilet ◴[] No.45902199{5}[source]
That depends. Does the rope encourage vulnerable people to kill themselves and tell them how to do it? If so, then yes.
90. Workaccount2 ◴[] No.45902239{3}[source]
NYT doesn't care about regurgitation. When it was doable, it was spotty enough that no one would rely on it. But now the "trick" doesn't even work anymore (you would paste the start of an article and chatgpt would continue it).

What they want is to kill training, and moreover, prevent the loss of being the middle-man between events and users.

replies(1): >>45903898 #
91. freejazz ◴[] No.45902271[source]
Well the court disagrees with you and found that this is evidence that the NYT needs to prove its case. No surprise, considering it's direct evidence of exactly what OpenAI is claiming in its defense...
92. freejazz ◴[] No.45902288{3}[source]
No, it's not. OpenAI is a commercial enterprise selling the stolen data.
93. Wistar ◴[] No.45902346{5}[source]
There are current litigation efforts to hold Amazon liable for suicides committed by, in particular, self-poisoning with high-purity sodium nitrite, which, in low concentrations is used as a meat curing agent.

A 2023 lawsuit against Amazon for suicides with sodium nitrite was dismissed but other similar lawsuits continue. The judge held that Amazon, “… had no duty to provide additional warnings, which in this case would not have prevented the deaths, and that Washington law preempted the negligence claims.“

94. HelloMcFly ◴[] No.45902362{3}[source]
OpenAI has seemingly done everything they can to put publishers in a position to make this demand, and they've certainly not done anything to make it impossible for them to respond to it. Is there a better, more privacy minded way for NYT to get the data they need? Probably, I'm not smart enough to understand all the things that go into such a decision. But I know I don't view them as the villain for asking, and I also know I don't view OpenAI as some sort of guardian of my or my data's best interests.
95. BolexNOLA ◴[] No.45902390{7}[source]
What? The claim is true. The nuance is us discussing if it should be true/allowed. You're simplifying the moral discussion and overall just being rude/dismissive.

Comparing rope and an LLM comes across as disingenuous. I struggle to believe that you believe the two are comparable when it comes to the ethics of companies and their impact on society.

96. ◴[] No.45902539[source]
97. rpdillon ◴[] No.45902576{3}[source]
Training has sometimes been held to be fair use under certain circumstances, but in determining fair use, one of the four factors that is considered is how it affects the market for the work being infringed. I would expect that determining to what degree it's regurgitating the New York Times' content is part of that analysis.
98. prmoustache ◴[] No.45902643[source]
Always funny to see this kind of article behind a cookie banner. So much hypocrisy.
99. sailfast ◴[] No.45902679[source]
20M seems like a low number and I’m guessing they all used citations or similar content somewhere on the back-end that would map to NYTimes content as a result of a legal discovery request.

Also down to 20M from 120M per court order.

Sorry, but this seems a completely reasonable standard for discovery to me given the total lack of privacy on the platform - especially for free users.

Also sorry it probably means you’re going to owe a lot of money to the Times.

100. bigyabai ◴[] No.45902764{4}[source]
Wow it's almost like privately-managed security is a joke that just turns into de-facto surveillance at-scale.
101. sailfast ◴[] No.45902775[source]
It’s a mystery to me why companies that know they’re pushing a line of fair use or regulation are suddenly “surprised” when they get sued.

They could’ve asked permission. They could have worked with content providers instead of scraping. But they didn’t - and they knew what could happen.

FA (with fair use boundaries) and FO

102. sailfast ◴[] No.45902784{3}[source]
This is an excellent article and source. Thank you.
103. outside1234 ◴[] No.45902790[source]
Dude, you stole all of their articles to train your AI. Of course they want discovery.

Man, the sooner this company goes bankrupt the better.

104. outside1234 ◴[] No.45902797[source]
Honestly the sooner OpenAI goes bankrupt the better. Just a totally corrupt firm.
replies(1): >>45903197 #
105. etchalon ◴[] No.45902896[source]
This is so transparently disingenuous and weird.
106. mkipper ◴[] No.45902899{3}[source]
> but with the expectation that only openai has access to it

You can argue about "the expectation" of privacy all you want, but this is completely detached from reality. My assumption is that almost no third parties I share information with have magic immunity that prevents the information from being used in a legal action involving them.

Maybe my doctor? Maybe my lawyer? IANAL but I'm not even confident in those. If I text my friend saying their party last night was great and they're in court later and need to prove their whereabouts that night, I understand that my text is going to be used as evidence. That might be a private conversation, but it's not my data when I send it to someone else and give them permission to store it forever.

107. littlestymaar ◴[] No.45902902{7}[source]
Except Google will never encourage you to do it, unlike the sycophantic Chatbot that will.
replies(1): >>45903622 #
108. 98codes ◴[] No.45902969[source]
I got one sentence in and thought to myself, "This is about discovery, isn't it?"

And lo, complaints about plaintiffs started before I even had to scroll. If this company hadn't willy-nilly done everything they could to vacuum up the world's data, wherever it may be, however it may have been protected, then maybe they wouldn't be in this predicament.

109. etchalon ◴[] No.45902971{3}[source]
If they're reproducing NY Times articles, in full, then that is non-transformative. That's the point of the case.
110. bonsai_spool ◴[] No.45903001{4}[source]
> Also, you need to understand, that for huge corps like OpenAI, the lying on your ToS will do orders of magnitude more damage to your brand than what you would gain

Is this true? I can’t recall anything like this (look at Ashley Madison which is alive and well)

replies(2): >>45903439 #>>45903603 #
111. glenstein ◴[] No.45903044{4}[source]
Discovery isn't binary yes/no, it involves competing proposals regarding methods and scope for satisfying information requests. Sometimes requests are egregious or excessive, sometimes they are reasonable and subject to excessively zealous pushback.

Maybe you didn't read TFA but part of the case history was NYT requesting 1.4 billion records as part of discovery and being successfully challenged by OpenAI as unnecessary, and the essence of TFA is advocating for an alternative to the scope of discovery NYT is insisting on, hence the "not rolling over".

Try reading, it's fun!

112. otterley ◴[] No.45903139[source]
From the FAQ:

> Q: Is the NYT obligated to keep this data private?

> A: Yes. The Times would be legally obligated at this time to not make any data public outside the court process.

The NY Times has built over a century a reputation for fiercely protecting its confidential sources. Why are they somehow less trustworthy than OpenAI is?

If the NY Times leaked the customer information to a third party, they'd be in contempt of court. On the other hand, OpenAI is bound only by their terms of service with its customers, which they can modify as they please.

replies(1): >>45903757 #
113. 1vuio0pswjnm7 ◴[] No.45903180[source]
NB. There is no order to "collect". The order is to preserve what is already being collected and stored in the ordinary course of business

https://ia801404.us.archive.org/31/items/gov.uscourts.nysd.6...

https://ia801404.us.archive.org/31/items/gov.uscourts.nysd.6...

114. fireflash38 ◴[] No.45903197{3}[source]
I really should take the "invest in companies you hate" advice seriously.
replies(1): >>45903284 #
115. outside1234 ◴[] No.45903284{4}[source]
I don't hate them. It is just plain to see they have discovered no scalable business model outside of getting larger and larger amounts of capital from investors to utilize intellectual property from others (either directly in the model aka NYT, or indirectly via web searches) without any rights. It is better for all of us the sooner this fails.
116. buellerbueller ◴[] No.45903428[source]
Standard tech scaling playbook, page 69420: there is a function f(x) whereby if you're growing fast enough, you can ignore the laws, then buy the regulators. This is called "The Uber Curve"
117. Workaccount2 ◴[] No.45903439{5}[source]
It's not national news when a company is found to be doing what they say they are doing.
118. wkat4242 ◴[] No.45903497[source]
If OpenAI hadn't used data from the NYT without permission in the first place this wouldn't have happened. That is the root cause of all this.

I'm glad the NYT is fighting them. They've infringed the rights of almost every news outlet but someone has to bring this case.

replies(1): >>45903934 #
119. mhitza ◴[] No.45903498{5}[source]
The NYT requests samples between Dec 2022 and Dec 2024. The judge's order to preserve chats came into effect this summer after OpenAI engineers deleted, claiming mistake, the VM in which NYT lawyers were processing data.

Dates and the 30 day default retention policy don't add up, when framing things this way.

120. wkat4242 ◴[] No.45903546{3}[source]
The same with Google "don't be evil" these days.
121. stackedinserter ◴[] No.45903574[source]
WTF with all these comments. Regardless of OpenAI's reputation and practices, I don't want NYT or anyone else to see my conversations, I completely agree with OpenAI here.
122. bee_rider ◴[] No.45903603{5}[source]
I think it is hard to say because OpenAI is still heavily in development and working out their business model (and a reasonable complaint is that it is crazy to label them a massive success without seeing how they actually work when they need to make a profit).

But, all that aside, it seems that OpenAI is aiming to be bigger and more integrated into the day-to-day life of the average person than Ashley Madison, right?

123. BolexNOLA ◴[] No.45903622{8}[source]
The moment we learned ChatGPT helped a teen figure out not just how to take their own life but how to make sure no one can stop them mid-act, we should've been mortified and had a discussion.

But we also decided via Sandy Hook that children can be slaughtered on the altar of the second amendment without any introspection, so I mean...were we ever seriously going to have that discussion?

https://www.nbcnews.com/tech/tech-news/family-teenager-died-...

>Please don't leave the noose out… Let's make this space the first place where someone actually sees you.

How is this not terrifying to read?

124. micromacrofoot ◴[] No.45903714[source]
"they're invading your privacy by requesting access to our invasion of your privacy!"
125. otterley ◴[] No.45903718{3}[source]
This article says nothing of the sort. The court order is to preserve existing logs they already have, not to disable logging, and hand all the logs over to the plaintiffs. OpenAI's objections are mainly that 1/there are too many logs (so they're proposing a sample instead) and that 2/there's identifying data in the logs and so they are being "forced" to anonymize the logs at their expense (even though it's what they want as a condition of transferring the logs).

There is nothing in the article that mentions OpenAI being forced to create new logs they don't already have.

126. jimbob45 ◴[] No.45903723{3}[source]
Well, the sponsors of the 1619 Project really don’t have a leg to stand on when it comes to ethics.
replies(1): >>45903805 #
127. wkat4242 ◴[] No.45903739[source]
> In direct contrast: I fully agree with OpenAI here. We can have a more nuanced opinion than 'piracy to train AI is bad therefore refusing to share chats is bad', which sounds absurd but is genuinely how one of the other comments follows logic.

These chats only need to be shared because:

- OpenAI pirated masses of content in the first place

- OpenAI refuse to own up to it even now (they spin the NYT claims as "baseless").

I don't agree with them giving my chats out either, but the blame is not with the NYT in my opinion.

> We don't have to support OpenAI's actions in order to oppose the NYT's actions.

Well the NYT action is more than just its own. It will set a precedent if they win which means other news outlets can get money from OpenAI as well. Which makes a lot of sense, after all they have billions to invest in hardware, why not in content??

And what alternative do they have? Without OpenAI giving access to the source materials used (I assume this was already asked for because it is the most obvious route) there is not much else they can do. And OpenAI won't do that because it will prove the NYT point and will cause them to have to pay a lot to half the world.

It's important that this case is made, not just for the NYT but for journalism in general.

128. mmooss ◴[] No.45903757[source]
I generally agree, but publicizing the data is only a small part of the risk. The NYT could use the data for journalism research, then perform parallel construction of it for the public news article:

For example, if they find Mayor X asking ChatGPT about fraud, porn, DUI, cancer diagnoses, murder, etc. - maybe even mentioning names, places, etc. - they could then investigate that issue, find other evidence, and publish that.

replies(1): >>45903808 #
129. mmooss ◴[] No.45903783[source]
It's common and trivial to identify you by other means.
replies(1): >>45903966 #
130. crazygringo ◴[] No.45903805{4}[source]
I already said the NYT is certainly open to criticism. I fail to see any connection between the 1619 Project and authoritarianism.
131. otterley ◴[] No.45903808{3}[source]
First, the logs are supposed to be anonymized before being sent over. Second, the court can order the company's lawyers to "firewall" the logs from the newsroom so that their journalists can't get access to it, under penalty of contempt and potential disbarment.
132. mmooss ◴[] No.45903826{3}[source]
How is the NYT like OpenAI, or 'ghoulish'?
133. lazyeye ◴[] No.45903845[source]
The NYT used to market itself to advertisers with the observation that "our readers have the highest disposable income of any paper in the US".

It gives an interesting insight into politics and the modern Democrat party that the newspaper of the wealthy leans so strongly left. This was even before Trump came to power.

134. macki0 ◴[] No.45903873[source]
> What is the purpose of OpenAI storing millions of private conversations

It's needed for the conversation history feature, a core feature of the ChatGPT product

It's like saying "What is the purpose of Google Photos storing millions of private images"

135. otterley ◴[] No.45903876[source]
> This case is unusual because the New York Times can't point to any harm

It helps to read the complaint. If that was the case, the case would have been subject to a Rule 12(b)(6) (failure to state a claim for which relief can be granted) challenge and closed.

Complaint: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

See pages 60ff.

136. totallymike ◴[] No.45903898{4}[source]
> prevent the loss of being the middle-man between events and users

I'm confused by this phrase. I may be misreading but it sounds like you're frustrated, or at least cynical about NYT wanting to preserve their business model of writing about things that happen and selling the publication. To me it seems reasonable they'd want to keep doing that, and to protect their content from being stolen.

They certainly aren't the sole publication of written content about current events, so calling them "the middle-man between events and users" feels a bit strange.

If your concern is that they're trying to prevent OpenAI from getting a foot in the door of journalism, that confuses me even more. There are so, so many sources of news: other news agencies, independent journalists, randos spreading word-of-mouth information.

It is impossible for chatgpt to take over any aspect of being a "middle-man between events and users" because it can't tell you the news. It can only resynthesize journalism that it's stolen from somewhere else, and without stealing from others, it would be worse than the least reliable of the above sources. How could it ever be anything else?

This right here feels like probably a good understanding of why NYT wants openai to keep their gross little paws off their content. If I stole a newspaper off the back of a truck, and then turned around and charged $200 a month for the service of plagiarizing it to my customers, I would not be surprised if the Times's finest lawyers knocked on my door either.

Then again, I may be misinterpreting what you said. I tend to side with people who sue LLM companies for gobbling up all their work and regurgitating it, and spend zero effort trying to avoid that bias

137. Rastonbury ◴[] No.45903915{3}[source]
This is it isn't it? For any technology, I don't think anyone should have the expectation of privacy from lawyers if the company who has your data is brought to court
138. bigbuppo ◴[] No.45903916[source]
If the information is really that sensitive, why did they keep it in the first place?
139. johnwheeler ◴[] No.45903934[source]
Exactly. And the OpenAI corporate-speak, acting like they give a shit about our best interests. Give me a break, Sam Altman. How stupid do you think everyone is?

They have proven that they are the most untrustworthy company on the planet

And this isn't AI fear speaking. This is me raging at Sam Altman for spreading so much fear, uncertainty, and doubt just to get investments. The rest of us have to suffer for the last two years, worrying about losing our jobs, only to realize that this is all bullsh*t.

140. NoSalt ◴[] No.45903966{3}[source]
Indeed, but one more step (staying logged out), absolutely cannot hurt, and can help.