237 points meetpateltech | 4 comments
rpdillon No.45900911
I wouldn't want to make it out like I think OpenAI is the good guy here. I don't.

But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.

In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.

It's quite literally a fishing expedition.

replies(10): >>45900955 #>>45901081 #>>45901082 #>>45901111 #>>45901248 #>>45901282 #>>45901672 #>>45901852 #>>45903876 #>>45906668 #
cogman10 No.45901111
I get the feeling, but that's not what this is.

NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"

That's a question they fundamentally cannot answer without these chat logs.

That's what discovery, especially in a copyright case, is about.

Think about it this way: suppose this were a book store selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs." The whole log needs to be produced; otherwise you can't really trust that it's the real log.

That's what the NYTimes lawyers are after. They want the chat logs so they can run their own searches for NYTimes text in the responses. They can't know how often that has happened, and OpenAI has an obvious incentive to simply say "Oh, that never happened."

And the reason this evidence is relevant is that it will directly feed into how much money NYT and OpenAI ultimately settle for. If it rarely happened, the amount will be low; if it happened a lot, the amount will be high. And if the case goes to trial, the evidence will be used in the damages phase, assuming NYT wins.

The user has no right to privacy here, the same as how any internet service can be (and has been) compelled to produce private messages.

replies(8): >>45901181 #>>45901273 #>>45901692 #>>45901936 #>>45904217 #>>45904558 #>>45905078 #>>45907923 #
1. realusername No.45905078
> NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"

Credible to whom? In their supposed "investigation", they sent a whole page of text plus complex pre-prompting and still failed to get the exact content back word for word. That's something users would never do anyway.

And that's probably the best they've got, as they didn't publish their other attempts.

replies(2): >>45907081 #>>45907358 #
2. No.45907081 [empty comment]
3. mikkupikku No.45907358
Agreed. They could carefully coax the model into more or less outputting some of their articles, but the premise that users were routinely doing this to bypass the paywall is silly.
replies(1): >>45907959 #
4. terminalshort No.45907959
Especially when you can just paste the URL into the Internet Archive and read it. And yet they aren't suing the Internet Archive.