Fighting the New York Times' invasion of user privacy

(openai.com)

237 points meetpateltech | 3 comments | 12 Nov 25 14:08 UTC | HN request time: 0s | source

Show context

rpdillon ◴[12 Nov 25 14:48 UTC] No.45900911[source]▶

I wouldn't want to make it out like I think OpenAI is the good guy here. I don't.

But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.

In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.

It's quite literally a fishing expedition.

replies(10): >>45900955 #>>45901081 #>>45901082 #>>45901111 #>>45901248 #>>45901282 #>>45901672 #>>45901852 #>>45903876 #>>45906668 #

cogman10 ◴[12 Nov 25 15:04 UTC] No.45901111[source]▶

>>45900911 #

I get the feeling, but that's not what this is.

NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"

That's a question they fundamentally cannot answer without these chat logs.

That's what discovery, especially in a copyright case, is about.

Think about it this way. Let's say this were a book store selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs". The whole log needs to be produced otherwise you can't really trust that this is the real log.

That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses. They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".

And the reason this evidence is relevant is it will directly feed into how much money NYT and OpenAI will ultimately settle for. If this never happens then the amount will be low. If it happens a lot the amount will be high. And if it goes to trial it will be used in the damages portion assuming NYT wins.

The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.

replies(8): >>45901181 #>>45901273 #>>45901692 #>>45901936 #>>45904217 #>>45904558 #>>45905078 #>>45907923 #

1. observationist ◴[12 Nov 25 18:49 UTC] No.45904217[source]▶

>>45901111 #

You don't hate the media nearly enough.

"Credible" my ass. They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles. OpenAI has taken measures to limit such methods and prevent arbitrary wholesale reproduction of copyrighted content since that time. That would have been the end of the situation if NYT was engaging in good faith.

The NYT is after what they consider "their" piece of the pie. They want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior. They haven't been injured, they were already dying, and this lawsuit is a hail mary attempt at grifting some life support.

Behavior like that of the NYT is why we can't have nice things. They're not entitled to exist, and by engaging in behavior like this, it makes me want them to stop existing, the faster, the better.

Copyright law is what you get when a bunch of layers figure out how to encode monetization of IP rights into the legal system, having paid legislators off over decades, such that the people that make the most money off of copyrights are effectively hoarding those copyrights and never actually produce anything or add value to the system. They rentseek, gatekeep, and viciously drive off any attempts at reform or competition. Institutions that once produced valuable content instead coast on the efforts of their predecessors, and invest proceeds into lawsuits, lobbying, and purchase of more IP.

They - the NYT - are exploiting a finely tuned and deliberately crafted set of laws meant to screw actual producers out of percentages. I'm not a huge OpenAI fan, but IP laws are a whole different level of corrupt stupidity at the societal scale. It's gotcha games all the way down, and we should absolutely and ruthlessly burn down that system of rules and salt the ground over it. There are trivially better systems that can be explained in a single paragraph, instead of requiring books worth of legal code and complexities.

replies(1): >>45905676 #

2. totallymike ◴[12 Nov 25 20:06 UTC] No.45905676[source]▶

>>45904217 (TP) #

I'm not a fan of NYT either, but this feels like you're stretching for your conclusion:

> They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles....would have been the end of the situation if NYT was engaging in good faith.

I mean, if I was performing a bunch of investigative work and my publication was considered the source of truth in a great deal of journalistic effort and publication of information, and somebody just stole my newspaper off the back of a delivery truck every day and started rewriting my articles, and then suddenly nobody read my paper anymore because they could just ask chatgpt for free, that's a loss for everyone, right?

Even if I disagree with how they editorialize, the Times still does a hell of a lot of journalism, and chatgpt can never, and will never be able to actually do journalism.

> they want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior

I'd love to hear exactly what you mean by this.

Between what and what are they trying to insert themselves as middlemen, and why is chatgpt the victim in their attempts to do it?

What does 'rent seeking' mean in this context?

What does 'second hander' mean?

I'm guessing that 'sleazy lawyer' is added as an intensifier, but I'm curious if it means something more specific than that as well, I suppose.

> Copyright law....the rest of it

Yeah. IP rights and laws are fucked basically everywhere. I'm not smart enough to think of ways to fix it, though. If you've got some viable ideas, let's go fix it. Until then, the Times kinda need to work with what we've got. Otherwise, OpenAI is going to keep taking their lunch money, along with every other journalist's on the internet, until there's no lunch money to be had from anyone.

replies(1): >>45907955 #

3. terminalshort ◴[12 Nov 25 22:45 UTC] No.45907955[source]▶

>>45905676 #

> my publication was considered the source of truth

Their publication is not considered the source of truth, at least not by anyone with a brain.

↑