
728 points by freetonik | 1 comment
jedbrown (No.44980180):
Provenance matters. An LLM cannot certify a Developer Certificate of Origin (https://en.wikipedia.org/wiki/Developer_Certificate_of_Origi...) and a developer of integrity cannot certify the DCO for code emitted by an LLM, certainly not an LLM trained on code of unknown provenance. It is well-known that LLMs sometimes produce verbatim or near-verbatim copies of their training data, most of which cannot be used without attribution (and may have more onerous license requirements). It is also well-known that they don't "understand" semantics: they never make changes for the right reason.
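
(For anyone unfamiliar with the mechanics: a developer certifies the DCO by adding a Signed-off-by trailer to each commit, e.g. with git commit -s, and projects typically gate merges on its presence. Here is a minimal sketch of that kind of check, assuming a local git checkout; the origin/main..HEAD range and the function name are illustrative, not any project's actual tooling:

    import subprocess

    def commits_missing_signoff(rev_range="origin/main..HEAD"):
        """Return commit SHAs in rev_range whose messages lack a DCO trailer."""
        # %H = commit hash, %B = raw message body; %x1f/%x00 are control
        # characters vanishingly unlikely to appear inside a commit message,
        # used here as field and record separators.
        log = subprocess.run(
            ["git", "log", "--format=%H%x1f%B%x00", rev_range],
            capture_output=True, text=True, check=True,
        ).stdout
        missing = []
        for record in log.split("\x00"):
            record = record.lstrip("\n")
            if not record:
                continue
            sha, _, body = record.partition("\x1f")
            if not any(line.startswith("Signed-off-by:")
                       for line in body.splitlines()):
                missing.append(sha)
        return missing

    if __name__ == "__main__":
        bad = commits_missing_signoff()
        if bad:
            raise SystemExit("commits missing Signed-off-by:\n" + "\n".join(bad))
        print("all commits carry a Signed-off-by trailer")

Of course, a Signed-off-by line only records who certified the code; it cannot launder provenance the certifier never had, which is the point.)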

We don't yet know how courts will rule in cases like Doe v. GitHub (https://githubcopilotlitigation.com/case-updates.html). LLM-based systems cannot even practice clean-room design (https://en.wikipedia.org/wiki/Clean_room_design): a clean-room reimplementation requires that the implementers demonstrably never saw the original, and a model trained on the original has no way to wall that knowledge off. For a maintainer to accept code generated by an LLM is to put the entire community at risk, as well as to endorse a power structure that mocks consent.

jojobas (No.44980234):
There are only so many ways to write quite a few pieces of code. My classmate and I once got in trouble in high school for submitting identical code for one of the tasks at a coding competition, down to variable names and indentation. There's no way he could or would have stolen my code, and I sure didn't steal his.
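
To make that concrete: ask two people for, say, a linear search in Python and there's a decent chance you get character-for-character the same function. A made-up illustration, not the actual contest task:

    # "Find the index of target in items, or -1 if absent."
    # Two programmers working independently can easily both produce
    # exactly this, down to the variable names.
    def find_index(items, target):
        for i, item in enumerate(items):
            if item == target:
                return i
        return -1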