728 points freetonik | 27 comments
neilv ◴[] No.44976959[source]
There is also IP taint when using "AI". We're just pretending that there's not.

If someone came to you and said "good news: I memorized the code of all the open source projects in this space, and can regurgitate it on command", you would be smart to ban them from working on code at your company.

But with "AI", we make up a bunch of rationalizations. ("I'm doing AI agentic generative AI workflow boilerplate 10x gettin it done AI did I say AI yet!")

And we pretend the person never said that they're just loosely laundering GPL and other code in a way that rightly would be existentially toxic to an IP-based company.

replies(6): >>44976975 #>>44977217 #>>44977317 #>>44980292 #>>44980599 #>>44980775 #
ineedasername ◴[] No.44977317[source]
Courts (at least in the US) have already ruled that use of ingested data for training is transformative. There are lots of details to figure out, but the genie is out of the bottle.

Sure, it's a big hill to climb to rethink IP laws so that they align with the societal desire that generating IP remain a viable economic work product, but that is what's necessary.

replies(9): >>44977525 #>>44978041 #>>44978412 #>>44978589 #>>44979766 #>>44979930 #>>44979934 #>>44980167 #>>44980236 #
1. bsder ◴[] No.44978412[source]
> Courts (at least in the US) have already ruled that use of ingested data for training is transformative.

If you have code that happens to be identical to someone else's code, or implements someone's proprietary algorithm, you're going to lose in court even if you claim an "AI" gave it to you.

AI is training on private GitHub repos and coughing them up. I've had it regurgitate a very well-written piece of code implementing a particular computational geometry algorithm. It presented perfect, idiomatic Python with perfect tests that caught all the degenerate cases. That was obviously proprietary code--no amount of searching came up with anything even remotely close (which is why I asked the AI in the first place).
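
(To give a flavor of what I mean: in this kind of code the degenerate cases are the hard part. A hypothetical sketch of the sort of routine in question, my own illustration and not the code the AI produced:)

    def orientation(p, q, r):
        """Sign of the cross product (q-p) x (r-p):
        1 = counter-clockwise, -1 = clockwise, 0 = collinear."""
        val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
        return (val > 0) - (val < 0)

    def on_segment(p, q, r):
        """Assuming p, q, r are collinear: does q lie on segment pr?"""
        return (min(p[0], r[0]) <= q[0] <= max(p[0], r[0]) and
                min(p[1], r[1]) <= q[1] <= max(p[1], r[1]))

    def segments_intersect(p1, q1, p2, q2):
        """True if segments p1q1 and p2q2 intersect, including the
        degenerate cases: collinear overlap and endpoint touching."""
        o1, o2 = orientation(p1, q1, p2), orientation(p1, q1, q2)
        o3, o4 = orientation(p2, q2, p1), orientation(p2, q2, q1)
        if o1 != o2 and o3 != o4:
            return True  # proper crossing, the easy case
        # degenerate cases: a collinear endpoint lying on the other segment
        return ((o1 == 0 and on_segment(p1, p2, q1)) or
                (o2 == 0 and on_segment(p1, q2, q1)) or
                (o3 == 0 and on_segment(p2, p1, q2)) or
                (o4 == 0 and on_segment(p2, q1, q2)))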

replies(5): >>44979018 #>>44979022 #>>44979146 #>>44979821 #>>44979900 #
2. ineedasername ◴[] No.44979018[source]
>If you have code that happens to be identical to someone else's code, or implements someone's proprietary algorithm, you're going to lose in court even if you claim an "AI" gave it to you.

Not for a dozen lines here or there, even if they could be found and identified in a massive code base. That’s like quoting a paragraph of one book in another: non-infringing.

For the second half of your comment, it sounds like you’re saying the results were too good to be AI. That’s a bit “no true Scotsman”, at least without more detail. But implementing an algorithm, even a complex one, is very much something an LLM can do. Algorithms are far better defined and scoped than ordinary natural language, and LLMs already do a reasonable job of translating natural language into programming languages; an algorithm is a narrow subset of that task type, with better-defined context and syntax.

replies(2): >>44979801 #>>44980120 #
3. Filligree ◴[] No.44979022[source]
How is that obviously proprietary? Aren't you implicitly assuming that the AI couldn't have written it on its own?
replies(2): >>44980688 #>>44983160 #
4. rowanG077 ◴[] No.44979146[source]
That seems like a real stretch. GPT-5 just invented new math, for reference. What you are saying would be equivalent to claiming that this math was obviously already in some paper the mathematicians didn't know about. Maybe true, but it's a far reach.
replies(4): >>44979742 #>>44983482 #>>44984987 #>>44985260 #
5. jakelazaroff ◴[] No.44979742[source]
This would be the first time ever that an LLM has discovered new knowledge, but the far reach is that the information does appear in the training data?
replies(2): >>44979915 #>>44985283 #
6. ozfive ◴[] No.44979801[source]
What will happen when company A implements algorithm X based on AI output, company B does the same, and company A claims the code is proprietary and takes company B to court?
replies(2): >>44979970 #>>44980026 #
7. fzzzy ◴[] No.44979821[source]
This makes no sense. Computational geometry algorithms are computable.
8. ants_everywhere ◴[] No.44979900[source]
LLMs aren't good at rote memorization. They can't even get quotations of humans right.

It's easier for the LLM to rewrite an idiomatic computational geometry algorithm from scratch in a language it understands well, like Python. Entire computational geometry textbooks and research papers are in its knowledge base. It doesn't have to copy some proprietary implementation.

replies(1): >>44979978 #
9. ants_everywhere ◴[] No.44979915{3}[source]
They've been doing it for a while. Gemini has also discovered new math and new algorithms.

There is an entire research field of scientific discovery using LLMs, together with sub-disciplines for the various specializations. LLMs routinely discover new things.

replies(3): >>44980007 #>>44981387 #>>44985250 #
10. andreasmetsala ◴[] No.44979970{3}[source]
What has happened when the same thing happens without AI involved?
replies(1): >>44981006 #
11. gugagore ◴[] No.44979978[source]
A search for "LLM Harry Potter" would suggest that LLMs are widely understood to be proficient at rote memorization.

(In any case, I don't find the computational geometry example very compelling as a clear case of direct memorization.)

12. jakelazaroff ◴[] No.44980007{4}[source]
I hadn't heard of that, so I did some searching, and the only source for the claim I can find is a Google white paper. That doesn't automatically mean it's false, of course, but it is curious that the only people ostensibly showing LLMs discovering new things are the companies offering the LLMs.
13. ◴[] No.44980026{3}[source]
14. neilv ◴[] No.44980120[source]
> Not for a dozen lines here or there, even if it could be found and identified in a massive code base. That’s like quoting a paragraph of a book in another book, non infringing.

It's potentially non-infringing in a book if you quote it in a plausible way and attribute it properly.

If you copy&paste a paragraph from another book into yours, it's infringing, and a career-ending scandal. There's plenty of precedent on that.

Just like if you manually copied a function out of some GPL code and pasted it into your own.

Or if you had an LLM do it for you.

15. inferiorhuman ◴[] No.44980688[source]
The idea that something that can't handle simple algorithms (e.g. counting the number of times a letter occurs in a word) could magically churn out far more advanced algorithms complete with tests is… well, it's a bit of a stretch.
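
(Not that the counting itself is hard in code; the point is that the model flubs a task this trivial. A throwaway sketch of what's being asked, using the oft-cited example:)

    def count_letter(word: str, letter: str) -> int:
        """Count occurrences of a letter in a word, case-insensitively."""
        return word.lower().count(letter.lower())

    # The example LLMs famously get wrong: "strawberry" has three r's.
    assert count_letter("strawberry", "r") == 3
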
replies(1): >>44985117 #
16. ozfive ◴[] No.44981006{4}[source]
Yep, it’s not a brand-new problem. I just wonder if AI is going to turbocharge the odds of these disputes popping up.
17. tovej ◴[] No.44981387{4}[source]
Citation needed, and I call bullshit. Unless you mean that they hallucinate useless algorithms that do not work, which they do.

LLMs do not have an internal model for manipulating mathematical objects. They cannot, by design, come up with new algorithms unless they are very nearly the same as some existing algorithm. I'm a computer science researcher and have not heard of a single algorithm created by an LLM.

18. martin-t ◴[] No.44983160[source]
It cannot do anything on its own; it's just a (very complex, probabilistic) mechanical transformation (including interpolation) of training data and a prompt.

Advertising autocomplete as AI was a genius move because people start humanizing it and look for human-centric patterns.

Thinking A"I" can do anything on its own is like seeing faces in rocks on Mars.

19. postexitus ◴[] No.44983482[source]
It invented "new math" about as much as I invented "new food" when I was cooking yesterday. It did a series of quite complicated calculations that would take a well-trained human several hours or even days to do. Still impressive, but no, it's not new maths.
20. const_cast ◴[] No.44984987[source]
New math? As in it just fucking Isaac Newton'd its way to inventing calculus? Or do you just mean it solved a math problem?
21. Filligree ◴[] No.44985117{3}[source]
It's terrible at executing algorithms. This, it turns out, is completely disjoint from writing algorithms.
22. ◴[] No.44985250{4}[source]
23. rerdavies ◴[] No.44985260[source]
An example: https://medium.com/@deshmukhpratik931/the-matrix-multiplicat...

Obviously not ChatGPT. But ChatGPT isn't the sharpest stick on the block by a significant margin. It is a mistake to judge what AIs can do based on what ChatGPT does.

24. rerdavies ◴[] No.44985283{3}[source]
https://medium.com/@deshmukhpratik931/the-matrix-multiplicat...

And it's not an accident that a significant percentage (40%?) of all papers being published in top journals involve the application of AI.

replies(1): >>44986752 #
25. jakelazaroff ◴[] No.44986752{4}[source]
This article is about the same thing I mentioned in a sibling comment. I personally don't find an unreplicated Google white paper to be compelling evidence.
replies(1): >>44989064 #
26. rerdavies ◴[] No.44989064{5}[source]
It's a fast matrix multiply! (A decades-old human problem). What exactly do you need to replicate??! Just count the multiplies, fer goodness sake.
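
To be concrete about what "count the multiplies" means, here is the same check at toy scale using Strassen's classic 2x2 scheme (7 scalar multiplies instead of the naive 8). Verifying a claimed larger scheme is the same exercise, just bigger. A sketch of the check, not the paper's construction:

    import random

    def strassen_2x2(A, B):
        """Strassen's 2x2 scheme: 7 scalar multiplies instead of the naive 8."""
        (a, b), (c, d) = A
        (e, f), (g, h) = B
        m1 = (a + d) * (e + h)  # multiply 1
        m2 = (c + d) * e        # multiply 2
        m3 = a * (f - h)        # multiply 3
        m4 = d * (g - e)        # multiply 4
        m5 = (a + b) * h        # multiply 5
        m6 = (c - a) * (e + f)  # multiply 6
        m7 = (b - d) * (g + h)  # multiply 7
        return [[m1 + m4 - m5 + m7, m3 + m5],
                [m2 + m4, m1 - m2 + m3 + m6]]

    def naive_2x2(A, B):
        """Textbook product: 8 scalar multiplies."""
        return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    # "Replicating" the result is just checking the scheme against the definition.
    for _ in range(1000):
        A = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        B = [[random.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        assert strassen_2x2(A, B) == naive_2x2(A, B)
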
replies(1): >>44991283 #
27. jakelazaroff ◴[] No.44991283{6}[source]
> What exactly do you need to replicate??!

The AI coming up with it? When Google claimed their Wizard of Oz show at the Las Vegas Sphere was AI-generated, a ton of VFX artists spoke up to say they'd spent months of human labor working on it. Forgive me for not giving the benefit of the doubt to a company that has a vested interest in making their AI seem more powerful, and a track record of lying to do so.