I_am_tiberius:
In my opinion, training models on user data without their real consent (real consent meaning, for example, that the user must sign a contract, so they are definitely aware) should be considered a serious criminal offense.
jsheard:
Why single out user data specifically? Most of the data Anthropic and co train on was just scooped up from wherever with zero consent, not even the courtesy of a buried TOS clause, and their users were always implicitly fine with that. Forgive me for not having much sympathy when the users end up reaping what they've sown.
__MatrixMan__:
Publishing something is considered by most to be sufficient consent for it to no longer be private.

I realize there's a whole legal quagmire here involving intellectual "property" and what counts as a "derivative work", but that's a separate (and dubiously useful) part of the law.

chamomeal:
That is definitely true in the normal case, but I feel like the scale of LLM usage turns it into a different problem.

If you can use all of the content of Stack Overflow to create a "derivative work" that replaces Stack Overflow, and causes it to lose tons of revenue, is it really a derivative work?

I'm pretty sure solution sites like Chegg don't include the actual questions for that reason. The solutions to the questions are derivative, but the questions aren't.

airstrike:
Replacing Stack Overflow has no bearing on the definition of "derivative".