Most active commenters
  • macawfish(14)
  • nullc(9)
  • eschaton(9)
  • KritVutGu(7)
  • wahnfrieden(6)
  • otterley(4)
  • oceanplexian(3)
  • mattgreenrocks(3)
  • EarlKing(3)
  • tsimionescu(3)


728 points freetonik | 135 comments
1. Waterluvian ◴[] No.44976790[source]
I’m not a big AI fan but I do see it as just another tool in your toolbox. I wouldn’t really care how someone got to the end result that is a PR.

But I also think that if a maintainer asks you to jump before submitting a PR, you politely ask, “how high?”

replies(16): >>44976860 #>>44976869 #>>44976945 #>>44977015 #>>44977025 #>>44977121 #>>44977142 #>>44977241 #>>44977503 #>>44978050 #>>44978116 #>>44978159 #>>44978240 #>>44978311 #>>44978533 #>>44979437 #
2. quotemstr ◴[] No.44976860[source]
As a project maintainer, you shouldn't make unenforceable rules that you and everyone else know people will flout. Doing so makes you seem impotent and diminishes the respect people have for rules in general.

You might argue that by making rules, even futile ones, you at least establish expectations and take a moral stance. Well, you can make a statement without dressing it up as a rule. But you don't get to be sanctimonious that way I guess.

replies(3): >>44976916 #>>44977208 #>>44977384 #
3. wahnfrieden ◴[] No.44976869[source]
You should care. If someone submits a huge PR, you’re going to waste time asking questions and comprehending their intentions if the answer is that they don’t know either. If you know it’s generated and they haven’t reviewed it themselves, you can decide to shove it back into an LLM for next steps rather than expect the contributor to be able to do anything with your review feedback.

Unreviewed generated PRs can still be helpful starting points for further LLM work if they achieve desired results. But close reading with consideration of authorial intent, giving detailed comments, and asking questions from someone who didn't write or read the code is a waste of your time.

That's why we need to know if a contribution was generated or not.

replies(2): >>44977332 #>>44978112 #
4. voxl ◴[] No.44976916[source]
Except you can enforce this rule some of the time. People discover that AI was used or suspect it all the time, and people admit to it after some pressure all the time.

Not every time, but sometimes. The threat of being caught isn't meaningless. You can decide not to play in someone else's walled garden if you want but the least you can do is respect their rules, bare minimum of human decency.

replies(2): >>44976992 #>>44978014 #
5. cvoss ◴[] No.44976945[source]
It does matter how and where a PR comes from, because reviewers are fallible and finite, so trust enters the equation inevitably. You must ask "Do I trust where this came from?" And to answer that, you need to know where it came from.

If trust didn't matter, there wouldn't have been a need for the Linux Kernel team to ban the University of Minnesota for attempting to intentionally smuggle bugs through the PR process as part of an unauthorized social experiment. As it stands, if you / your PRs can't be trusted, they should not even be admitted to the review process.

replies(4): >>44977169 #>>44977263 #>>44978862 #>>44979553 #
6. quotemstr ◴[] No.44976992{3}[source]
It. doesn't. matter.

The only legitimate reason to make a rule is to produce some outcome. If your rule does not result in that outcome, of what use is the rule?

Will this rule result in people disclosing "AI" (whatever that means) contributions? Will it mitigate some kind of risk to the project? Will it lighten maintainer load?

No. It can't. People are going to use the tools anyway. You can't tell. You can't stop them. The only outcome you'll get out of a rule like this is making people incrementally less honest.

replies(6): >>44977031 #>>44977060 #>>44977120 #>>44977333 #>>44978001 #>>44978544 #
7. renrutal ◴[] No.44977015[source]
I wouldn't call it "just another tool". AI introduces a new kind of tool where the ownership of the resulting code is not straightforward.

If, in some dystopian future, a court with jurisdiction over you decides that Claude was trained on Oracle's code, and that all Claude users are possibly in breach of copyright, it's easier to nuke from orbit all disclosed AI contributions.

8. raincole ◴[] No.44977025[source]
When one side has much more "scalability" than the other, then the other side has very strong motivation to match up.

- People use AI to write cover letters. If the companies don't filter them out automatically, they're screwed.

- Companies use AI to interview candidates. No one wants to spend their personal time talking to a robot. So the candidates start using AI to take interviews for them.

etc.

If you don't at least tell yourself that you don't allow AI PRs (even just as a white lie) you'll one day use AI to review PRs.

replies(1): >>44977211 #
9. recursive ◴[] No.44977031{4}[source]
Sometimes you can tell.
10. blaufuchs ◴[] No.44977060{4}[source]
> Will it lighten maintainer load?

Yes, that is the stated purpose; did you read the linked GitHub comment? The author lays out their points pretty well. You sound unreasonably upset about this. Are you submitting a lot of AI slop PRs or something?

P.S. Talking. Like. This. Is. Really. Ineffective. It. Makes. Me. Just. Want. To. Disregard. Your. Point. Out. Of. Hand.

11. devmor ◴[] No.44977120{4}[source]
There are plenty of argumentative and opinionated reasons to say it matters, but there is one that can't really be denied - reviewers (and project maintainers, even if they aren't reviewers) are people whose time deserves to be respected.

If this rule discourages low quality PRs or allows reviewers to save time by prioritizing some non-AI-generated PRs, then it certainly seems useful in my opinion.

12. nosignono ◴[] No.44977121[source]
> I wouldn’t really care how someone got to the end result that is a PR.

I can generate 1,000 PRs today against an open source project using AI. I think you do care, you are only thinking about the happy path where someone uses a little AI to draft a well constructed PR.

There's a lot of ways AI can be used to quickly overwhelm a project maintainer.

replies(2): >>44977143 #>>44977273 #
13. Razengan ◴[] No.44977142[source]
> if a maintainer asks you to jump before submitting a PR, you politely ask, “how high?”

or say "fork you."

14. Waterluvian ◴[] No.44977143[source]
In that case a more correct rule (and probably one that can be automatically enforced) for that issue is a max number of PRs or opened issues per account.
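A limit like that is easy to check mechanically. A minimal sketch against the public GitHub REST API; the repo name and the per-account limit below are made up for illustration:

  # Hypothetical sketch: count open PRs per author and flag accounts over a limit.
  # OWNER, REPO and MAX_OPEN_PRS are placeholders, not a real policy.
  from collections import Counter
  import requests

  OWNER, REPO = "example-org", "example-repo"
  MAX_OPEN_PRS = 3

  counts, page = Counter(), 1
  while True:
      resp = requests.get(
          f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
          params={"state": "open", "per_page": 100, "page": page},
          timeout=30,
      )
      resp.raise_for_status()
      prs = resp.json()
      if not prs:
          break
      counts.update(pr["user"]["login"] for pr in prs)
      page += 1

  for author, n in counts.items():
      if n > MAX_OPEN_PRS:
          print(f"{author} has {n} open PRs (limit {MAX_OPEN_PRS})")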
replies(1): >>44979575 #
15. koolba ◴[] No.44977169[source]
> You must ask "Do I trust where this came from?" And to answer that, you need to know where it came from.

No you don’t. You can’t outsource trust determinations. Especially to the people you claim not to trust!

You make the judgement call by looking at the code and your known history of the contributor.

Nobody cares if contributors use an LLM or a magnetic needle to generate code. They care if bad code gets introduced or bad patches waste reviewers’ time.

replies(3): >>44977245 #>>44977531 #>>44978479 #
16. natrius ◴[] No.44977208[source]
Unenforceable rules are bad, but if you tweak the rule to always require some sort of authorship statement (e.g. "I wrote this by hand" or "I wrote this with Claude"), then the honor system will mostly achieve the desired goal of calibrating code review effort.
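A statement like that can even be checked in CI rather than left purely to the honor system. A minimal sketch for GitHub Actions, which exposes the triggering event as JSON at the path in GITHUB_EVENT_PATH; the accepted phrases are made up for illustration:

  # Hypothetical CI check: fail the job unless the PR description contains
  # an authorship statement. The ACCEPTED phrases are placeholders.
  import json, os, sys

  ACCEPTED = ("I wrote this by hand", "I wrote this with")

  with open(os.environ["GITHUB_EVENT_PATH"]) as f:
      event = json.load(f)

  body = event.get("pull_request", {}).get("body") or ""
  if not any(phrase in body for phrase in ACCEPTED):
      sys.exit("PR description is missing an authorship statement.")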
17. oceanplexian ◴[] No.44977211[source]
Both sides will use AI and it will ultimately increase economic productivity.

Imagine living before the invention of the printing press, and then lamenting that we should ban them because it makes it "too easy" to distribute information and will enable "low quality" publications to have more reach. Actually, this exact thing happened, but the end result was it massively disrupted the world and economy in extremely positive ways.

replies(4): >>44977285 #>>44977628 #>>44977916 #>>44978127 #
18. dsjoerg ◴[] No.44977241[source]
You haven't addressed the primary stated rationale from the linked content: "I try to assist inexperienced contributors and coach them to the finish line, because getting a PR accepted is an achievement to be proud of. But if it's just an AI on the other side, I don't need to put in this effort, and it's rude to trick me into doing so."
19. falcor84 ◴[] No.44977245{3}[source]
Trust is absolutely a thing. Maintaining an open source project is an unreasonably demanding and thankless job, and it would be even more so if you had to treat every single PR as if it's a high likelihood supply-chain attack.
replies(1): >>44977696 #
20. oceanplexian ◴[] No.44977273[source]
> I can generate 1,000 PRs today against an open source project using AI.

Then perhaps the way you contribute, review, and accept code is fundamentally wrong and needs to change with the times.

It may be that technologies like Github PRs and other VCS patterns are literally obsolete. We've done this before throughout many cycles of technology, and these are the questions we need to ask ourselves as engineers, not stick our heads in the sand and pretend it's 2019.

replies(3): >>44977407 #>>44977443 #>>44978474 #
21. bootsmann ◴[] No.44977285{3}[source]
> Both sides will use AI and it will ultimately increase economic productivity.

Citation needed, I don’t think the printing press and gpt are in any way comparable.

replies(2): >>44977513 #>>44978524 #
22. KritVutGu ◴[] No.44977332[source]
You are absolutely right. AI is just a tool to DDoS maintainers.

Any contributor who was shown to post provably untested patches used to lose credibility. And now we're talking about accommodating people who don't even understand how the patch is supposed to work?

replies(1): >>44977852 #
23. KritVutGu ◴[] No.44977384[source]
> As a project maintainer, you shouldn't make unenforceable rules

Total bullshit. It's totally fine to declare intent.

You are already incapable of verifying / enforcing that a contributor is legally permitted to submit a piece of code as their own creation (Signed-off-by), and do so under the project's license. You won't embark on looking for prior art, for the "actual origin" of the code, whatever. You just make them promise, and then take their word for it.

replies(1): >>44978573 #
24. whatevertrevor ◴[] No.44977407{3}[source]
I don't think throwing out the concept of code reviews and version control is the correct response to a purported rise in low-effort high-volume patches. If anything it's even more required.
replies(1): >>44978087 #
25. kelvinjps10 ◴[] No.44977443{3}[source]
Why is it incorrect? And what would be the new way? AI to review the changes of AI?
replies(1): >>44979772 #
26. ToucanLoucan ◴[] No.44977445{3}[source]
The sheer amount of entitlement on display by very pro-AI people genuinely boggles the mind.
replies(1): >>44977972 #
27. alfalfasprout ◴[] No.44977503[source]
The reality is that, as someone who helps maintain several OSS projects, I can tell you that you vastly underestimate the problem that AI-assisted tooling has created.

On the one hand, it's lowered the barrier to entry for certain types of contributions. But on the other hand getting a vibe-coded 1k LOC diff from someone that has absolutely no idea how the project even works is a serious problem because the iteration cycle of getting feedback + correctly implementing it is far worse in this case.

Also, the types of errors introduced tend to be quite different between humans and AI tools.

It's a small ask but a useful one to disclose how AI was used.

28. alfalfasprout ◴[] No.44977513{4}[source]
The mental gymnastics the parent poster went through to equate an LLM to the printing press in this sense are mind-boggling.
replies(1): >>44978482 #
29. geraneum ◴[] No.44977531{3}[source]
> Nobody cares if contributors use an LLM or a magnetic needle to generate code.

That’s exactly the opposite of what the author is saying. He mentions that [if the code is not good, or you are a beginner] he will help you get to the finish line, but if it’s LLM code, he shouldn’t be putting effort because there’s no human on the other side.

It makes sense to me.

replies(1): >>44978344 #
30. ionelaipatioaei ◴[] No.44977628{3}[source]
> Both sides will use AI and it will ultimately increase economic productivity.

In some cases sure, but it can also create the situation where people just waste time for nothing (think AI interviewing other AIs - this might generate GDP by people purchasing those services, but I think we can all agree that this scenario is just wasting time and resources without improving society).

31. fnimick ◴[] No.44977696{4}[source]
While true, we really should be treating every single piece of external code as though it's malicious.
replies(1): >>44978629 #
32. wahnfrieden ◴[] No.44977852{3}[source]
That’s not what I said though. LLM output, even unreviewed and without understanding, can be a useful artifact. I do it all the time - generate code, try running it, and then if I see it works well, I can decide to review it and follow up with necessary refactoring before integrating it. Parts of that can be contributed too. We’re just learning new etiquettes for doing that productively, and that does include testing the PR btw (even if the code itself is not understood or reviewed).

Example where this kind of contribution was accepted and valuable, inside this ghostty project https://x.com/mitchellh/status/1957930725996654718

replies(1): >>44978129 #
33. jrflowers ◴[] No.44977916{3}[source]
> Imagine living before the invention of the printing press, and then lamenting that we should ban them because it makes it "too easy" to distribute information

Imagine seeing “'rm -rf /' is a function that returns 'Hello World!'” and thinking “this is the same thing as the printing press”

https://bsky.app/profile/lookitup.baby/post/3lu2bpbupqc2f

34. mattgreenrocks ◴[] No.44977972{4}[source]
They genuinely believe their use of chatbots is equivalent to multiple years of production experience in a language. They want to erase that distinction (“democratize”) so they can have the same privileges and status without the work.

Otherwise, what’s the harm in saying AI guides you to the solution if you can attest to it being a good solution?

replies(4): >>44977994 #>>44978056 #>>44978461 #>>45005379 #
35. ToucanLoucan ◴[] No.44977994{5}[source]
I guess it's just different kinds of people. I have used Copilot to generate code I barely understand (stuff for a microcontroller project, nothing important) but I wouldn't in a thousand years say I wrote it. I broadly understand how it works, and like, if someone wanted to see it, I'd show them. But like... how can you take pride in something you didn't make?
replies(1): >>44978053 #
36. nullc ◴[] No.44978001{4}[source]
The utility of the rule is so that you can cheaply nuke non-conforming contributors from orbit when you detect their undisclosed AI use, vs. having to deal with the flood of low-quality contributions on an individually reviewed basis.
37. pixl97 ◴[] No.44978014{3}[source]
Except the other way happens too.

You get someone that didn't use AI getting accused of using AI and eventually telling people to screw off and contributing nothing.

replies(1): >>44981482 #
38. armchairhacker ◴[] No.44978050[source]
Agreed. As someone who uses AI (completion and Claude Code), I'll disclose whenever asked. But I disagree that it's "common courtesy" when not explicitly asked; since many people (including myself) don't mind and probably assume some AI, and it adds distraction (another useless small indicator; vaguely like dependabot, in that it steals my attention but ultimately I don't care).
replies(5): >>44978496 #>>44978872 #>>44979119 #>>44979194 #>>44981592 #
39. mattgreenrocks ◴[] No.44978053{6}[source]
Not making the thing is a point in favor of LLMs for some of these people I suspect. So the pride in work thing is just not high on the list of incentives.

I don’t get it at all. Feels like modernity is often times just inventing pale shadows of things with more addictive hooks to induce needlessly dependent behavior.

replies(1): >>45005303 #
40. macawfish ◴[] No.44978056{5}[source]
That's just not true. I have 20 years of dev experience and also am using these tools. I won't commit slop. I'm open to being transparent about my usage of AI but tbh right now there's so much bias and vitriol coming from people afraid of these new tools that in the haze of their fear I don't trust people will actually take the time to neutrally determine whether or not the code is actually slop. I've had manually written, well thought through, well conceived, rough around the edges code get called "AI slop" by a colleague (who I very much respect and have a good relationship with) who admittedly hadn't had a chance to thoroughly understand the code yet.

If I just vibe-coded something and haven't looked at the code myself, that seems like a necessary thing to disclose. But beyond that, if the code is well understood and solid, I feel that I'd be clouding the conversation by unnecessarily bringing the tools I used into it. If I understand the code and feel confident in it, whether I used AI or not seems irrelevant and distracting.

This policy is just shoving the real problem under the rug. Generative AI is going to require us to come up with better curation/filtering/selection tooling, in general. This heuristic of "whether or not someone self-disclosed using LLMs" just doesn't seem very useful in the long run. Maybe it's a piece of the puzzle but I'm pretty sure there are more useful ways to sift through PRs than that. Line count differences, for example. Whether it was a person with an LLM or a 10x coder without one, a PR that adds 15000 lines is just not likely to be it.
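For what it's worth, the line-count heuristic is trivial to automate. A minimal sketch using git's numstat output; the refs and the threshold are made up for illustration:

  # Hypothetical triage heuristic: flag a branch whose diff adds more lines than
  # a threshold. "git diff --numstat" prints added/deleted counts per file;
  # binary files show "-" instead of a number and are skipped here.
  import subprocess

  BASE, HEAD = "main", "feature-branch"   # placeholder refs
  MAX_ADDED_LINES = 1500                  # placeholder threshold

  out = subprocess.run(
      ["git", "diff", "--numstat", f"{BASE}...{HEAD}"],
      capture_output=True, text=True, check=True,
  ).stdout

  added = 0
  for line in out.splitlines():
      cols = line.split("\t")
      if cols and cols[0].isdigit():
          added += int(cols[0])

  print("needs extra scrutiny" if added > MAX_ADDED_LINES else "normal review")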

replies(3): >>44978392 #>>44978442 #>>44978444 #
41. oblio ◴[] No.44978087{4}[source]
Heck, let's throw out QA, too :-))
42. nullc ◴[] No.44978112[source]
> is that they don’t know either

It would be nice if they did, in fact, say they didn't know. But more often they just waste your time making their chatbot argue with you. And the chatbots are outrageous gaslighters.

All big OSS projects have had the occasional bullshitter/gaslighter show up. But LLMs have increased the incidence level of these sorts of contributors by many orders of magnitude-- I consider it an open question if open-public-contribution opensource is viable in the world post LLM.

replies(1): >>44980839 #
43. bagels ◴[] No.44978116[source]
You have other choices, such as not contributing.
44. rangerelf ◴[] No.44978127{3}[source]
Imagine being so deluded and disconnected that you actually believe that AI has any similarity with the printing press regarding the benefits to The People.
45. nullc ◴[] No.44978129{4}[source]
If the AI slop was that valuable a project regular, who actually knows and understands the project, would be just as capable of asking the AI to produce it.
replies(1): >>44980012 #
46. sheepscreek ◴[] No.44978159[source]
We keep talking about “AI replacing coders,” but the real shift might be that coding itself stops looking like coding. If prompts become the de facto way to create applications and systems in the future, maybe programming languages will just be baggage we’ll need to unlearn.

Programming languages were a nice abstraction to accommodate our inability to comprehend complexity - current day LLMs do not have the same limitations as us.

The uncomfortable part will be what happens to PRs and other human-in-the-loop checks. It’s worthwhile to consider that not too far into the future, we might not be debugging code anymore - we’ll be debugging the AI itself. That’s a whole different problem space that will need an entirely new class of solutions and tools.

replies(2): >>44978213 #>>44978727 #
47. ryoshu ◴[] No.44978213[source]
All we need to do is prompt an LLM with such specificity that it does exactly what we want the machine to do.
replies(1): >>44980878 #
48. EarlKing ◴[] No.44978240[source]
It's not just about how you got there. At least in the United States according to the Copyright Office... materials produced by artificial intelligence are not eligible for copyright. So, yeah, some people want to know for licensing purposes. I don't think that's the case here, but it is yet another reason to require that kind of disclosure... since if you fail to mention that something was made by AI as part of a compound work you could end up losing copyright over the whole thing. For more details, see [2] (which is part of the larger report on Copyright and AI at [1]).

--

[1] https://www.copyright.gov/ai/

[2] https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

replies(2): >>44979135 #>>44979585 #
49. macawfish ◴[] No.44978344{4}[source]
"but if it’s LLM code, he shouldn’t be putting effort because there’s no human on the other side"

That's the false equivalence right there

replies(1): >>44978651 #
50. eschaton ◴[] No.44978392{6}[source]
You should not just be open to being transparent, you need to understand that there will be times you will be required to be transparent about the tools you’ve used and the ultimate origin of your contributions, and that trying to evade or even push back against it is a huge red flag that you cannot be trusted to abide by your commitments.

If you’re unwilling to stop using slop tools, then you don’t get to contribute to some projects, and you need to accept that.

replies(2): >>44978632 #>>44978751 #
51. moron4hire ◴[] No.44978432{3}[source]
I think this is an excellent example of how software is different from everything else that is copyright-able. If you look at how GenAI is being applied to "the arts" and how completely destructive it is to the visual mediums, it is clearly a completely different beast to code. "The artists" of visual art don't want AI trained on their work and competing with their work. I completely understand. The point of art is to be a personal interaction between artist and consumer. But, even though I'm quite skeptical of AI code gen on a practical standpoint, it really doesn't feel like the same existential threat.
replies(1): >>45006528 #
52. gizmo686 ◴[] No.44978442{6}[source]
> I've had manually written, well thought through, well conceived, rough around the edges code get called "AI slop" by a colleague (who I very much respect and have a good relationship with) who admittedly hadn't had a chance to thoroughly understand the code yet.

This is the core problem with AI that makes so many people upset. In the old days, if you get a substantial submission, you know a substantial amount of effort went into it. You know that someone at some point had a mental model of what the submission was. Even if they didn't translate that perfectly, you can still try to figure out what they meant and were thinking. You know the submitter put forth significant effort. That is a real signal that they are both willing and able to address issues you raise going forward.

The existence of AI slop fundamentally breaks these assumptions. That is why we need enforced social norms around disclosure.

replies(2): >>44978592 #>>45005475 #
53. mattgreenrocks ◴[] No.44978444{6}[source]
I get it. But it’s ultimately up to the maintainer here.
replies(1): >>44978556 #
54. throwawaybob420 ◴[] No.44978461{5}[source]
Democratize X via boiling our oceans!!
replies(1): >>44979507 #
55. ivanche ◴[] No.44978474{3}[source]
You're free to invent a better way, popularize it and become a millionaire.
56. eschaton ◴[] No.44978479{3}[source]
You’re completely incorrect. People care a lot about where code came from. They need to be able to trust that code you’re contributing was not copied from a project under AGPLv3, if the project you’re contributing to is under a different license.

Stop trying to equate LLM-generated code with indexing-based autocomplete. They’re not the same thing at all: LLM-generated code is equivalent to code copied off Stack Overflow, which is also something you’d better not be attempting to fraudulently pass off as your own work.

replies(2): >>44979814 #>>44981058 #
57. eks391 ◴[] No.44978482{5}[source]
Ironically, I thought your parent commenter had to go through mental gymnastics to say that their parent's analogy of the printing press isn't applicable to an LLM. Neither you nor your parent gave me any satisfactory reasons why they aren't similar, just your mental superiority as proof that oceanplexian must be wrong.
replies(1): >>44979467 #
58. eschaton ◴[] No.44978496[source]
It’s not just common courtesy to disclose, it’s outright fraud not to disclose.
replies(1): >>44982414 #
59. macawfish ◴[] No.44978524{4}[source]
GPT and compilers are though.
replies(1): >>44982165 #
60. tgsovlerkhgsel ◴[] No.44978533[source]
If a maintainer asks me to jump through too many stupid hoops, I'll just not contribute to the software.

That said, requiring adequate disclosure of AI is just fair. It also suggests that the other side is willing to accept AI-supported contributions (without being willing to review endless AI slop that they could have generated themselves if they had the time to read it).

I would expect such a maintainer to respond fairly to "I first vibecoded it. I then made manual changes, vibecoded a test, cursorily reviewed the code, checked that the tests provide good coverage, ran both existing and new tests, and manually tested the code."

That fair response might be a thorough review, or a request that I do the thorough review before they put in the time, but I'd expect it to be more than a blatant "nope, AI touched this, go away".

61. eschaton ◴[] No.44978544{4}[source]
You’re basically saying “if a rule can be broken, it will be, therefore rules are useless.”

If someone really wants to commit fraud they’re going to commit fraud. (For example, by not disclosing AI use when a repository requires it.) But if their fraud is discovered, they can still be punished for it, and mitigating actions taken. That’s not nothing, and does actually do a lot to prevent people from engaging in such fraud in the first place.

62. macawfish ◴[] No.44978556{7}[source]
Of course. I don't actually think the maintainer guidelines here are entirely unreasonable, even if I'd personally modify them to reduce reactive panic.

My little essay up there is more so a response to the heated "LLM people vs pure people" comments I'm reading all over this discussion. Some of this stuff just seems entirely misguided and fear driven.

63. eschaton ◴[] No.44978573{3}[source]
And if they’re discovered to not be keeping their word, there can be consequences imposed and mitigating actions taken. Rules can’t prevent bad actions 100% of the time, but they can substantially increase the risk of bad actions.
64. macawfish ◴[] No.44978592{7}[source]
We need better social norms about disclosure, but maybe those don't need to be about "whether or not you used LLMs" and might have more to do with "how well you understand the code you are opening a PR for" (or are reviewing, for that matter). Normalize draft PRs and sharing big messes of code you're not quite sure about but want to start a conversation about. Normalize admitting that you don't fully understand the code you've written / are tasked with reviewing and that this is perfectly fine and doesn't reflect poorly on you at all, in fact it reflects humility and a collaborative spirit.

10x engineers create so many bugs without AI, and vibe coding could multiply that to 100x. But let's not distract from the source of that, which is rewarding the false confidence it takes to pretend we understand stuff that we actually don't.

replies(3): >>44978880 #>>44979072 #>>45005522 #
65. tsimionescu ◴[] No.44978629{5}[source]
No, we shouldn't. We live in a society, and that level of distrust is not just unrealistic, it's disastrous. This doesn't mean you should share your house keys with every drive by PR contributor, but neither should you treat every PR as if it's coming from Jia Tan.
66. macawfish ◴[] No.44978632{7}[source]
Your blanket determination that the tools themselves are slop generators is an attitude I'm definitely not interested in engaging with in collaboration.
replies(1): >>44978669 #
67. tsimionescu ◴[] No.44978651{5}[source]
It's not a false equivalence. You can teach a beginner to become an intermediate (and later a master, if they stick to it). You can't teach an LLM to be better. Every piece of feedback you give to an LLM is like screaming into the void - it wastes your time, and doesn't change the LLM one iota.
replies(1): >>44978685 #
68. macawfish ◴[] No.44978685{6}[source]
"Every piece of feedback you give to an LLM is like screaming into the void - it wastes your time, and doesn't change the LLM one iota."

I think you just haven't gotten the hang of it yet, which is fine... the tooling is very immature and hard to get consistent results with. But this isn't a given. Some people do get good, steerable LLM coding setups.

replies(2): >>44979402 #>>44979604 #
69. tsimionescu ◴[] No.44978727[source]
This fundamentally misunderstands why programming languages exist. They're not required because "we can't understand complexity". They were invented because we need a way to be very specific about what we want the machine to do. Whether it's the actual physical hardware we're talking to when writing assembly, or it's an abstract machine that will be translated to the hardware like in C or Java, the key point is that we want to be specific.

Natural language can be specific, but it requires far too many words. `map (+ 1) xs` is far shorter to write than "return a list of elements by applying a function that adds one to its argument to each element of xs and collecting the results in a separate list", or similar.
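The same contrast holds in any language; here is that specification written out in Python, with the natural-language version as a comment:

  # "return a list of elements by applying a function that adds one to its
  # argument to each element of xs and collecting the results in a separate list"
  xs = [1, 2, 3]
  ys = [x + 1 for x in xs]
  assert ys == [2, 3, 4]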

replies(1): >>44991912 #
70. ikiris ◴[] No.44978751{7}[source]
What tools did you use to generate this response? Please include the make and model of the devices, and what os and browser you were running, including all browser plugins and other code which had screen access at that time.
replies(1): >>44978785 #
71. macawfish ◴[] No.44978754{9}[source]
I'd way sooner quit my job / not contribute than deal with someone who projects on me the way you have in this conversation.
replies(1): >>44978790 #
72. eschaton ◴[] No.44978790{10}[source]
Enjoy being found out for fraudulently passing off work you didn’t do as your own then.
replies(1): >>44978855 #
73. macawfish ◴[] No.44978855{11}[source]
Ironically as a practice I'm actually quite transparent about how I use LLMs and believe that destigmatising open conversation about use of these tools is actually really important, just not that it's a useful heuristic for whether or not some code is slop.
74. otterley ◴[] No.44978862[source]
If it comes with good documentation and appropriate tests, does that help?
replies(2): >>44979254 #>>44979456 #
75. mtlmtlmtlmtl ◴[] No.44978872[source]
The reason it's common courtesy is out of respect for the reviewer/maintainer's time. You need to let em know to look for the kind of idiotic mistakes LLMs shit out on a routine basis. It's not a "distraction", it's extremely relevant information. On the maintainer's discretion, they may not want to waste their time reviewing it at all, and politely or impolitely ask the contributor to do it again, and use their own brain this time. It also informs them on how seriously to take this contributor in the future, if the work doesn't hold water, or indeed, even if it does, since the next time the contributor runs the LLM lottery the result may be utter bullshit.

Whether it's prose or code, when informed something is entirely or partially AI generated, it completely changes the way I read it. I have to question every part of it now, no matter how intuitive or "no one could get this wrong"ish it might seem. And when I do, I usually find a multitude of minor or major problems. Doesn't matter how "state of the art" the LLM that shat it out was. They're still there. The only thing that ever changed in my experience is that problems become trickier to spot. Because these things are bullshit generators. All they're getting better at is disguising the bullshit.

I'm sure I'll get lots of responses trying to nitpick my comment apart. "You're holding it wrong", bla bla bla. I really don't care anymore. Don't waste your time. I won't engage with any of it.

I used to think it was undeserved that we programmers called ourselves "engineers" and "architects" even before LLMs. At this point, it's completely farcical.

"Gee, why would I volunteer that my work came from a bullshit generator? How is that relevant to anything?" What a world.

76. inferiorhuman ◴[] No.44978880{8}[source]

  but maybe those don't need to be about "whether or not you used LLMs" and might have more to do
  with "how well you understand the code you are opening a PR for" (or are reviewing, for that matter)
AI is a great proxy for how much understanding someone has. If you're writing a PR you're demonstrating some manner of understanding. If you're submitting AI slop you're not.
replies(1): >>44978961 #
77. macawfish ◴[] No.44978961{9}[source]
I've worked with 10x developers who committed a lot of features and a lot of bugs, and who got lots of accolades for all their green squares. They did not use LLM dev tools because those didn't exist then.

If they had used AI, their PRs might have been more understandable / less buggy, and ultimately I would have preferred that.

replies(1): >>44980594 #
78. risyachka ◴[] No.44979072{8}[source]
>> but maybe those don't need to be about "whether or not you used LLMs"

The only reason one may not want disclosure is if one can’t write anything by themselves, thus they will have to label all code as AI generated and everyone will see their real skill level.

79. risyachka ◴[] No.44979119[source]
It should be. You didn’t write generated code, why should I spend my life reading it?

If you want me to put in the effort- you have to put it in first.

Especially considering in 99% of cases even the one who generated it didn’t fully read/understand it.

replies(1): >>44979876 #
80. ants_everywhere ◴[] No.44979135[source]
> • The use of AI tools to assist rather than stand in for human creativity does not affect the availability of copyright protection for the output.

> • Copyright protects the original expression in a work created by a human author, even if the work also includes AI-generated material

> • Human authors are entitled to copyright in their works of authorship that are perceptible in AI-generated outputs, as well as the creative selection, coordination, or arrangement of material in the outputs, or creative modifications of the outputs.

replies(1): >>44984632 #
81. ants_everywhere ◴[] No.44979194[source]
If you don't disclose the use of

- books

- search engines

- stack overflow

- talking to a coworker

then it's not clear why you would have to disclose talking to an AI.

Generally speaking, when someone uses the word "slop" when talking about AI it's a signal to me that they've been sucked into a culture war and to discount what they say about AI.

It's of course the maintainer's right to take part in a culture war, but it's a useful way to filter out who's paying attention vs who's playing for a team. Like when you meet someone at a party and they bring up some politician you've barely heard of but who their team has vilified.

replies(2): >>44979524 #>>44982869 #
82. mattbee ◴[] No.44979254{3}[source]
The observation that inspired this policy is that if you used AI, it is likely you don't know if the code, the documentation or tests are good or appropriate.
replies(1): >>44979322 #
83. otterley ◴[] No.44979322{4}[source]
What if you started with good documentation that you personally wrote, you gave that to the agent, and you verified the tests were appropriate and passed?
replies(1): >>44979539 #
84. sho_hn ◴[] No.44979402{7}[source]
Steering via prompting isn't the same as fundamentally changing the LLM by teaching, as you can do with humans. I think OP understands this better than you.
replies(1): >>44979589 #
85. bandrami ◴[] No.44979437[source]
Whether the output of AI can be copyrighted remains a legal minefield, so if I were running a project where copyright-based protections are important (say, anything GPL) I would want to know if a PR contained AI-generated code.
86. explorigin ◴[] No.44979456{3}[source]
I suppose it depends on whether AI is writing the tests and documentation.
87. macawfish ◴[] No.44979507{6}[source]
https://youtu.be/klW65MWJ1PY?t=3234

https://youtu.be/klW65MWJ1PY?t=1320

X sucks and should not be allowed to proceed with what they're doing in Memphis. Nor should Meta be allowed to proceed with multiple Manhattan sized data centers.

88. latexr ◴[] No.44979524{3}[source]
> then it's not clear why you would have to disclose talking to an AI.

It’s explained right there in the PR:

> The disclosure is to help maintainers assess how much attention to give a PR. While we aren't obligated to in any way, I try to assist inexperienced contributors and coach them to the finish line, because getting a PR accepted is an achievement to be proud of. But if it's just an AI on the other side, I don't need to put in this effort, and it's rude to trick me into doing so.

That is not true of books, search engines, stack overflow, or talking to a worker, because in all those cases you still had to do the work yourself of comprehending, preparing, and submitting the patch. This is also why they ask for a disclosure of “the extent to which AI assistance was used”. What about that isn’t clear to you?

replies(1): >>44979768 #
89. mattbee ◴[] No.44979539{5}[source]
I'd extrapolate that the OP's view would be: you've still put in less effort, so your PR is less worthy of his attention than someone who'd done the same without using LLMs.

That's a pretty nice offer from one of the most famous and accomplished free software maintainers in the world. He's promising not to take a short-cut reviewing your PR, in exchange for you not taking a short-cut writing it in the first place.

replies(1): >>44979731 #
90. therealpygon ◴[] No.44979540{3}[source]
Being more trusting of people’s code simply because they didn’t use AI seems as naive as distrusting code contributions simply because they were written with the assistance of AI.

It seems a bit like saying you can’t trust a legal document because it was written on a computer with spellcheck, rather than by a $10 an hour temp with a typewriter.

91. RossBencina ◴[] No.44979553[source]
> "Do I trust where this came from?"

In an open source project I think you have to start with a baseline assumption of "trust nobody." Exceptions possibly if you know the contributors personally, or have built up trust over years of collaboration.

I wouldn't reject or decline to review a PR just because I don't trust the contributor.

replies(1): >>44981560 #
92. RossBencina ◴[] No.44979575{3}[source]
I think this is sane, although possibly not sufficient. Asking people to self-disclose AI usage is not going to shield maintainers from a flood of undisclosed AI submissions.
93. smitop ◴[] No.44979585[source]
> if you fail to mention that something was made by AI as part of a compound work you could end up losing copyright over the whole thing

The source you linked says the opposite of that: "the inclusion of elements of AI-generated content in a larger human-authored work does not affect the copyrightability of the larger human-authored work as a whole"

replies(2): >>44982922 #>>44984701 #
94. macawfish ◴[] No.44979589{8}[source]
Can't tell if you're responding in earnest or not here?

LLMs are trained to be steerable at inference time via context/prompting. Fine tuning is also possible and often used. Both count as "feedback" in my book, and my point is that both can be effective at "changing the LLM" in terms of its behavior at inference time.

replies(1): >>44980193 #
95. david_allison ◴[] No.44979604{7}[source]
As a maintainer, if you're dealing with a contributor who's sending in AI slop, you have no opportunity to prompt the LLM.

The PR effectively ends up being an extremely high-latency conversation with an LLM, via another human who doesn't have the full context/understanding of the problem.

replies(1): >>44980772 #
96. otterley ◴[] No.44979731{6}[source]
> in exchange for you not taking a short-cut writing it in the first place.

This “short cut” language suggests that the quality of the submission is going to be objectively worse by way of its provenance.

Yet, can one reliably distinguish working and tested code generated by a person vs a machine? We’re well past passing Turing tests at this point.

replies(1): >>44980065 #
97. oceanplexian ◴[] No.44979772{4}[source]
If machines can iterate faster than humans, we'll need machines to do the reviewing; that means the testing/QA will be done perhaps by machines which will operate on a spec similar to what Amazon is doing with Kilo.

Before PRs existed we passed around code changes via email. Before containers we installed software on bare metal servers. And before search engines we used message boards. It's not unfathomable that the whole idea of how we contribute and collaborate changes as well. Actually that is likely going to be the /least/ shocking thing in the next few years if acceleration happens (e.g. the entire OS is an LLM that renders pixels).

98. koolba ◴[] No.44979814{4}[source]
I’m not equating any type of code generation. I’m saying that as a maintainer you have to evaluate any submission on the merits, not on a series of yes/no questions provided by the submitter. And your own judgement is influenced by what you know about the submitter.
replies(1): >>44981192 #
99. charcircuit ◴[] No.44979876{3}[source]
No one is forcing you to read it. Feel free to have your own AI judge if you should merge it or even just YOLO merge it. The end goal of people trying to get code merged is not to have you read it. It's to improve the software. Whether code improves the software or not is orthogonal to if the code was written by hand.
100. wahnfrieden ◴[] No.44980012{5}[source]
Not according to ghostty maintainer Hashimoto per above.

It takes attempts, verifying the result behaves as desired, and iterative prompting to adjust. And it takes a lot of time to wait on agents in between those steps (this work isn’t a one shot response). You’re being reductive.

replies(1): >>44980154 #
101. mattbee ◴[] No.44980065{7}[source]
LLMs can't count letters, their writing is boring, and you can trick them into talking gibberish. That is a long way off the Turing test, even if we were fooled for a couple of weeks in 2022.

IMO when people declare that LLMs "pass" at a particular skill, it's a sign that they don't have the taste or experience to judge that skill themselves. Or - when it's CEOs - they have an interest in devaluing it.

So yes if you're trying to fool an experienced open source maintainer with unrefined LLM-generated code, good luck (especially one who's said he doesn't want that).

replies(1): >>44984592 #
102. nullc ◴[] No.44980154{6}[source]
We may be talking at cross purposes. I read the grandparent poster as discussing provably untested patches.

I have no clue in ghostty but I've seen plenty of stuff that doesn't compile much less pass tests. And I assert there is nothing but negative value in such "contributions".

If real effort went into it, then maybe there is value-- though it's not clear to me: When a project regular does the same work then at least they know the process. Like if there is some big PR moving things around at least the author knows that it's unlikely to slip in a backdoor. Once the change is reduced to some huge diff, it's much harder to gain this confidence.

In some projects direct PRs for programmatic mass renames and such have been prohibited in favor of requiring submission of the script that produces the change, because it's easier to review the script carefully. The same may be necessary for AI.
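As an illustration of that workflow, the thing under review is the generator rather than the diff it produces. A minimal sketch of such a rename script, with made-up names and assuming Python sources only:

  # Hypothetical mass-rename script submitted in place of its resulting diff.
  # OLD, NEW and the "*.py" glob are placeholders.
  from pathlib import Path
  import re

  OLD, NEW = "old_helper_name", "new_helper_name"
  pattern = re.compile(rf"\b{re.escape(OLD)}\b")

  for path in Path(".").rglob("*.py"):
      text = path.read_text()
      new_text = pattern.sub(NEW, text)
      if new_text != text:
          path.write_text(new_text)
          print(f"rewrote {path}")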

replies(1): >>44980226 #
103. sho_hn ◴[] No.44980193{9}[source]
And also clearly not what the OP means, who was trying to make a point that tuning the prompt to an otherwise stateless LLM inference job is nothing at all like teaching a human being. Mechanically, computationally, morally or emotionally. For example, humans aren't just tools; giving feedback to LLMs does little to further their agency.
replies(1): >>44980767 #
104. wahnfrieden ◴[] No.44980226{7}[source]
This whole original HN post is about ghostty btw

Having the original prompts (in sequence and across potentially multiple models) can be valuable but is not necessarily useful in replicating the results because of the slot machine nature of it

replies(1): >>44980476 #
105. nullc ◴[] No.44980476{8}[source]
> This whole original HN post is about ghostty btw

Sure though I believe few commenters care much about ghostty specifically and are primarily discussing the policy abstractly!

> because of the slot machine nature of it

One could use deterministically sampled LLMs with exact integer arithmetic... There is nothing fundamental preventing it from being completely reproducible.
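For locally run models that is already doable in principle. A minimal sketch of deterministic (greedy) decoding with Hugging Face transformers; the model name is only an example, and bit-for-bit reproducibility across machines additionally needs deterministic kernels, which is where the exact-arithmetic point comes in:

  # Greedy decoding removes sampling randomness; "gpt2" is just an example model.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  MODEL = "gpt2"
  tok = AutoTokenizer.from_pretrained(MODEL)
  model = AutoModelForCausalLM.from_pretrained(MODEL)

  inputs = tok("Rename the helper function to", return_tensors="pt")
  out = model.generate(**inputs, do_sample=False, max_new_tokens=40)
  print(tok.decode(out[0], skip_special_tokens=True))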

replies(1): >>44981041 #
106. inferiorhuman ◴[] No.44980594{10}[source]

  If they had used AI, their PRs might have been more understandable / less buggy, and ultimately I would have preferred that.
Sure, and if they had used AI, pigs could depart my rectum on a Part 121 flight. One has absolutely nothing to do with the other. Submitting AI slop does not demonstrate any knowledge of the code in question even if you do understand the code.

To address your claim about AI slop improving the output of these mythical 10x coders: doubtful. LLMs can only approximate meaningful output if they've already indexed the solution. If your vaunted 10x coders are working on already solved problems you're likely wasting their time. If they're working on something novel LLMs are of little use. For instance: I've had the pleasure of working with a notoriously poorly documented crate that's also got a reputation for frequently making breaking changes. I used DDG and Google to see if I could track down someone with a similar use case. If I forgot to append "-ai" to the query I'd get back absolutely asinine results typically along the lines of "here's an answer with rust and one of the words in your query". At best, the first sentence would explain something entirely unrelated about the crate.

Potentially LLMs could be improved by ingesting more and more data, but that's an arms race they're destined to lose. People are already turning to Cloudflare and Anubis en masse to avoid being billed for training LLMs. If Altman and co. had to pay market rate for their training data nobody could afford to use these AI doodads.

107. macawfish ◴[] No.44980767{10}[source]
The false equivalence I pointed at earlier was "LLM code => no human on the other side".

The person driving the LLM is a teachable human who can learn what's going on and learn to improve the code. It's simply not true that there's no person on the other side of the PR.

The idea that we should be comparing "teaching a human" to "teaching an LLM" is yet another instance of this false equivalence.

It's not inherently pointless to provide feedback on a PR with code written using an LLM, that feedback goes to the person using the LLM tools.

People are swallowing this b.s. marketing mystification of "LLMs as non human entities". But really they're fancy compilers that we have a lot to learn about.

replies(1): >>44981579 #
108. macawfish ◴[] No.44980772{8}[source]
You're totally dismissing this person's agency and their ability to learn. You're all but writing off their existence.
109. kentm ◴[] No.44980839{3}[source]
There was a post a while back that comes to mind as an example of this. Some project had a security issue reported that was not a security issue, and when asking questions it became extremely obvious that someone was just feeding the conversation into an LLM. There was no security issue. I can imagine this is happening more and more as people are trying to slam in LLM generated code everywhere.

Everyone promoting LLMs, especially on HN, claims that they're expertly using them by using artisanal prompts and carefully examining the output, but I'm honestly skeptical. Sure, some people are doing that (I do it from time to time). But I've seen enough slop to think that more people are throwing around code that they barely understand than these advocates care to admit.

Those same people will swear that they did due diligence, but why would they admit otherwise? And do they even know what proper due diligence is? And would they still be getting their mythical 30%-50% productivity boost if they were actually doing what they claimed they were doing?

And that is a problem. I cannot have a productive code review with someone that does not even understand what their code is actually doing, much less the trade-offs that were made in an implementation (because they did not consider any trade-offs at all and just took what the LLM produced). If they can't have a conversation about the code at all because they didn't bother to read or understand anything about it, then there's nothing I can do except close the PR and tell them to actually do the work this time.

replies(1): >>44981053 #
110. kentm ◴[] No.44980878{3}[source]
Good idea! We can have some sort of standard grammar that we use to prompt the LLM such that it deterministically gives us the result we ask for. We then constrain all prompts to match that grammar. Some sort of language describing programs.
111. wahnfrieden ◴[] No.44981041{9}[source]
Can't do that with state of the art LLMs and no sign of that changing (as they like to retain control over model behaviors). I would not want to use or contribute to a project that embraces LLMs yet disallows leading models.

Besides, the output of an LLM is not really any more trustworthy (even if reproducible) than the contribution of an anonymous actor. Both require review of outputs. Reproducibility of output from prompt doesn't mean that the output followed a traceable logic such that you can skip a full manual code review as with your mass renaming example. LLMs produce antagonistic output from innocuous prompting from time to time, too.

112. wahnfrieden ◴[] No.44981053{4}[source]
The ghostty creator disagrees re: the productivity of un-reviewed generated PRs: https://x.com/mitchellh/status/1957930725996654718
113. fluidcruft ◴[] No.44981058{4}[source]
How does an "I didn't use AI" pledge provide any assurance/provenance that submitted code wasn't copied from an AGPLv3 reference?
replies(1): >>44981229 #
114. eschaton ◴[] No.44981192{5}[source]
And I’m saying, as a maintainer, you have to and are doing both, even if you don’t think you are.

For example, you either make your contributors attest that their changes are original or that they have the right to contribute their changes—or you assume this of them and consider it implicit in their submission.

What you (probably) don’t do is welcome contributions that the contributors do not have the right to make.

115. eschaton ◴[] No.44981229{5}[source]
It doesn’t, it provides an assurance (but not provenance) you didn’t use AI.

Assuring you didn’t include any AGPLv3 code in your contribution is exactly the same kind of assurance. It also doesn’t provide any provenance.

Conflating assurance with provenance is bogus because the former is about making a representation that, if false, exposes the person making it to liability. For most situations that’s sufficient that provenance isn’t needed.

116. nullc ◴[] No.44981482{4}[source]
If their work was difficult to distinguish from AI then that sounds like a win too.
117. nullc ◴[] No.44981560{3}[source]
Better to think in terms of distrust rather than trust.

Presumably if a contributor repeatedly made bad PRs that didn't do what they said, introduced bugs, scribbled pointlessly on the codebase, and when you tried to coach or clarify at best they later forgot everything you said and at worst outright gaslit and lied to you about their PRs... you would reject or decline to review their PRs, right? You'd presumably ban them outright.

Well that's exactly what commercial LLM products, with the aid of less sophisticated users, have already done to the maintainers of many large open source projects. It's not that they're not trusted-- they should be distrusted with ample cause.

So what if the above banned contributor kept getting other people to mindlessly submit their work and even proxy communication through -- evading your well earned distrust and bans? Asking people to at least disclose that they were acting on behalf of the distrusted contributor would be the least you would do, I hope? Or even asking them to disclose if and to what extent their work was a collaboration with a distrusted contributor?

118. nullc ◴[] No.44981579{11}[source]
The person operating the LLM is not a meaningfully teachable human when they're not disclosing that they're using an LLM.

IF they disclose what they've done, provided the prompts, etc. then other contributors can help them get better results from the tools. But the feedback is very different than the feedback you'd give a human that actually wrote the code in question; that latter feedback is unlikely to be of much value (and even less likely to persist).

replies(1): >>44982704 #
119. nullc ◴[] No.44981592[source]
FWIW, I can say from direct experience that people are watching and noting when someone submits AI slop as their own work, and taking note never to hire them. Beyond the general professional ethics, it makes you harder to distinguish from malicious parties and other incompetent people LARPing as having knowledge that they don't.

So fail to disclose at your own peril.

120. bootsmann ◴[] No.44982165{5}[source]
Compilers don’t randomly fail to compile code that is too difficult for them to understand. LLVM makes sure that I never have to learn assembly; GPT doesn’t guarantee at all that I don’t have to learn to code.
121. Aeolun ◴[] No.44982414{3}[source]
That's nonsense. It's like feeling you need to disclose that your IDE has autocomplete. Nobody discloses that, since it's ridiculous. You only disclose that you used Claude Code if you are not certain of the result (e.g. you think it is correct, but the maintainer might be a better judge).

If it's exactly the same as what you'd have written manually, and you are confident it works, then what's the point of disclosure?

replies(1): >>44987971 #
122. sho_hn ◴[] No.44982704{12}[source]
Yep, true.

I've done things like share a ChatGPT account with a junior dev to steer them toward better prompts, actually, and that had some merit.

123. computerfriend ◴[] No.44982869{3}[source]
You should add citations to books, stack overflow posts and colleagues you consult with, yes.
124. simoncion ◴[] No.44982922{3}[source]
The quote you pulled suggests that if the work is majority machine-generated, then it loses copyright protection.

That is, it suggests that even if there are elements of human-generated content in a larger machine-generated work, the combined work as a whole is not eligible for copyright protection. Printed page iii of that PDF talks a bit more about that:

  * Copyright does not extend to purely AI-generated material, or material where there is insufficient human control over the expressive elements.
  * Whether human contributions to AI-generated outputs are sufficient to constitute authorship must be analyzed on a case-by-case basis.
125. otterley ◴[] No.44984592{8}[source]
We’re talking about code here, not prose.

Would you like to take the Pepsi challenge? Happy to put random code snippets in front of you and see whether you can accurately determine whether it was written by a human or an LLM.

126. EarlKing ◴[] No.44984632{3}[source]
Original expression, yes; however, you should've kept reading:

"In the Office’s view, it is well-established that copyright can protect only material that is the product of human creativity. Most fundamentally, the term “author,” which is used in both the Constitution and the Copyright Act, excludes non-humans." "In the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of “mechanical reproduction” or instead of an author’s “own original mental conception, to which [the author] gave visible form.” 24 The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work.25 This is necessarily a case-by-case inquiry." "If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it."

The office has been quite consistent that works containing both human-made and AI-made elements will be registerable only to the extent that they contain human-made elements.

127. EarlKing ◴[] No.44984701{3}[source]
This is what you get for skimming. :D

Just to be sure that I wasn't misremembering, I went through part 2 of the report and back to the original memorandum[1] that was sent out before the full report was issued. I've included a few choice quotes to illustrate my point:

"These are no longer hypothetical questions, as the Office is already receiving and examining applications for registration that claim copyright in AI-generated material. For example, in 2018 the Office received an application for a visual work that the applicant described as “autonomously created by a computer algorithm running on a machine.” 7 The application was denied because, based on the applicant’s representations in the application, the examiner found that the work contained no human authorship. After a series of administrative appeals, the Office’s Review Board issued a final determination affirming that the work could not be registered because it was made “without any creative contribution from a human actor.”"

"More recently, the Office reviewed a registration for a work containing human-authored elements combined with AI-generated images. In February 2023, the Office concluded that a graphic novel comprised of human-authored text combined with images generated by the AI service Midjourney constituted a copyrightable work, but that the individual images themselves could not be protected by copyright. "

"In the Office’s view, it is well-established that copyright can protect only material that is the product of human creativity. Most fundamentally, the term “author,” which is used in both the Constitution and the Copyright Act, excludes non-humans."

"In the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of “mechanical reproduction” or instead of an author’s “own original mental conception, to which [the author] gave visible form.” The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work. This is necessarily a case-by-case inquiry."

"If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it."[1], pgs 2-4

---

On the odd chance that somehow the Copyright Office had reversed itself I then went back to part 2 of the report:

"As the Office affirmed in the Guidance, copyright protection in the United States requires human authorship. This foundational principle is based on the Copyright Clause in the Constitution and the language of the Copyright Act as interpreted by the courts. The Copyright Clause grants Congress the authority to “secur[e] for limited times to authors . . . the exclusive right to their . . . writings.” As the Supreme Court has explained, “the author [of a copyrighted work] is . . . the person who translates an idea into a fixed, tangible expression entitled to copyright protection.”

"No court has recognized copyright in material created by non-humans, and those that have spoken on this issue have rejected the possibility. "

"In most cases, however, humans will be involved in the creation process, and the work will be copyrightable to the extent that their contributions qualify as authorship." -- [2], pgs 15-16

---

TL;DR: If you make something with the assistance of AI, you still have to be personally involved and contribute more than just a prompt in order to receive copyright, and even then you will receive protection only over the elements of originality and authorship that you are responsible for, not the elements the AI is responsible for.

---

[1] https://copyright.gov/ai/ai_policy_guidance.pdf

[2] https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

128. eschaton ◴[] No.44987971{4}[source]
It’s completely different from an IDE’s autocomplete because autocomplete in an IDE is only helping you type identifiers that already exist in your codebase or in any SDKs you’re using.

An LLM is regurgitating things from outside that space, where you have no idea of the provenance of what it’s putting into your code.

It doesn’t just matter that the code you’re contributing to a project is correct; it matters quite a lot whether it’s actually something you’re allowed to contribute.

- You can’t contribute code that your employer owns to a project if they don’t want you to.

- You can’t contribute code under a license that the project doesn’t want you to use.

- And you can’t contribute code written by someone else and claim it’s your intellectual property without some sort of contract in place to grant that.

If you use an LLM to generate code that you’re contributing, you have both of the latter two problems. And all of those apply *even if* the code you’re contributing is identical to what you’d have written by hand off the top of your head.

When you contribute to a project, you’re not just sending that project a set of bits, you’re making attestations about how those bits were created.

Why does this seem so difficult for some supposed tech professionals to understand? The entire industry is intellectual property, and this is basic “IP 101” stuff.

replies(1): >>44994040 #
129. sheepscreek ◴[] No.44991912{3}[source]
Fair enough - I’m just the messenger, though, observing the current trends and extrapolating from there. Take "AGENTS.md" files: we’re specifying what I consider "rules" in plain language - even lint rules, instead of creating a lint config. It could be a matter of convenience, and if it gets us 80% of the way there, why not?
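
To make that concrete: a plain-language rule in an AGENTS.md might read "always use strict equality (===)" or "keep functions under 40 lines", where a conventional setup would encode the same thing in a lint config. A minimal illustrative sketch of the config side, using ESLint’s flat-config format (the file name and rule choices here are hypothetical, not taken from any particular project):

  // eslint.config.js - machine-enforced equivalents of the two
  // plain-language rules above (hypothetical example)
  export default [
    {
      rules: {
        eqeqeq: "error",                                    // "always use ==="
        "max-lines-per-function": ["error", { "max": 40 }]  // "keep functions under 40 lines"
      }
    }
  ];

The convenience trade-off: the prose version costs nothing to write and can express rules no linter implements, while the config version is the one that is actually guaranteed to be enforced.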

I believe it won’t be long before we have exceptional "programmers" who have mastered the art of vibe coding. If that does become the de facto standard for 80% of the programming that gets done, then it’s not a long stretch from there to skipping programming languages altogether. I’m simply suggesting that if you’re not going to examine the code, perhaps someone will eliminate that additional layer or step altogether, and we might be pleasantly surprised by the final result.

130. Aeolun ◴[] No.44994040{5}[source]
> Why does this seem so difficult for some supposed tech professionals to understand?

Maybe because 99% of the people who complain about this are complaining about problems that never occur in 99% of the cases they cite. My employer isn’t going to give a shit that code I’ve written for their internal CRUD app gets more or less directly copied into my own projects. There’s only one sensible way to write it; it was already in my head before I wrote it for them, and it’ll still be there afterwards. As long as I’m not directly competing with their interests, what the hell do they care?

> When you contribute to a project, you’re not just sending that project a set of bits, you’re making attestations about how those bits were created.

You are really not. You are only doing that if the project requires some attestation of provenance. I can tell you that none of mine do.

131. KritVutGu ◴[] No.45005303{7}[source]
> the pride in work thing is just not high on the list of incentives

Thanks for putting it so well.

That is what hurts. A lot. Taking pride out of work, especially creative work, makes the world a worse place; it makes life less worth living.

> inventing pale shadows of things

Yes.

132. KritVutGu ◴[] No.45005379{5}[source]
> Otherwise, what’s the harm in saying AI guides you to the solution if you can attest to it being a good solution?

For one: it threatens to make an entire generation of programmers lazy and stupid. They stop exercising their creative muscle. Writing and reviewing are different activities; both should be done continuously.

This is perfectly observable with a foreign language: if you stop actively using one after learning it really well, your ability to speak it fades pretty quickly, while your ability to understand it fades too, just more slowly.

133. KritVutGu ◴[] No.45005475{7}[source]
> The existence of AI slop fundamentally breaks these assumptions. That is why we need enforced social norms around disclosure.

Exactly! The code used to double as "proof of work". Well-formed language used to double as "proof of thinking". And that's what AI breaks: it speaks, but doesn't think. My core point is that language that does not originate from well-reasoned human effort (i.e., either from writing the language directly, or from manually writing code that generates the language deterministically, for known reasons/intents) does not deserve human attention - even if the "observable behavior" of such language (when executed as code) looks "alright".

And because I further think that no code should be accepted without human review (which excludes both not reviewing AI-generated code at all and having some other AI review the AI-generated code), I conclude that AI-generated code can never be accepted.

134. KritVutGu ◴[] No.45005522{8}[source]
> Normalize draft PRs and sharing big messes of code you're not quite sure about but want to start a conversation about. Normalize admitting that you don't fully understand the code you've written / are tasked with reviewing and that this is perfectly fine and doesn't reflect poorly on you at all, in fact it reflects humility and a collaborative spirit.

Such behaviors can only be normalized in a classroom / ramp-up / mentorship-like setting. Which is very valid, BUT:

- Your reviewers are always overloaded, so they need some official mandate / approval to mentor newcomers. This is super important, and should be done everywhere.

- Even with the above in place: because you're being mentored with great attention to detail, you owe it to your reviewer not to drown them in AI slop. You must honor them by yourself writing every single line that you ask them to spend their attention on. Ultimately, their educative efforts are invested IN YOU, not (only) in the code that may finally be merged. I absolutely refuse to review or otherwise correct AI slop, while at the same time I'm 100% committed to transferring whatever knowledge I may have to another human.

Fuck AI.

135. KritVutGu ◴[] No.45006528{4}[source]
It is precisely the same existential threat, to code and to software developers, in my eyes. I take the exact same pride in my code (which I want to be free software, BTW) as artists do in their art. Writing code is a form of self-expression and self-realization for me, and as such, it is completely personal, between myself and those (humans) who read my code.