
728 points by freetonik | 2 comments
neilv ◴[] No.44976959[source]
There is also IP taint when using "AI". We're just pretending that there's not.

If someone came to you and said "good news: I memorized the code of all the open source projects in this space, and can regurgitate it on command", you would be smart to ban them from working on code at your company.

But with "AI", we make up a bunch of rationalizations. ("I'm doing AI agentic generative AI workflow boilerplate 10x gettin it done AI did I say AI yet!")

And we pretend the person never said that they're just loosely laundering GPL and other code in a way that rightly would be existentially toxic to an IP-based company.

replies(6): >>44976975 #>>44977217 #>>44977317 #>>44980292 #>>44980599 #>>44980775 #
ineedasername ◴[] No.44977317[source]
Courts (at least in the US) have already ruled that use of ingested data for training is transformative. There are lots of details to figure out, but the genie is out of the bottle.

Sure, it's a big hill to climb: rethinking IP law so that generating IP remains a viable economic work product, if that's what society wants. But that is what's necessary.

replies(9): >>44977525 #>>44978041 #>>44978412 #>>44978589 #>>44979766 #>>44979930 #>>44979934 #>>44980167 #>>44980236 #
bsder ◴[] No.44978412[source]
> Courts (at least in the US) have already ruled that use of ingested data for training is transformative.

If you have code that happens to be identical to someone else's code, or that implements someone's proprietary algorithm, you're going to lose in court even if you claim an "AI" gave it to you.

AI is training on private Github repos and coughing them up. I've had it regurgitate a very well written piece of code to do a particular computational geometry algorithm. It presented perfect, idiomatic Python with perfect tests that caught all the degenerate cases. That was obviously proprietary code--no amount of searching came up with anything even remotely close (it's why I asked the AI, after all).

replies(5): >>44979018 #>>44979022 #>>44979146 #>>44979821 #>>44979900 #
rowanG077 ◴[] No.44979146[source]
That seems a real stretch. GPT-5 just invented new math, for reference. What you are saying would be equivalent to claiming that this math was obviously in some paper that the mathematician did not know about. Maybe true, but it's a far reach.
replies(4): >>44979742 #>>44983482 #>>44984987 #>>44985260 #
jakelazaroff ◴[] No.44979742[source]
This would be the first time ever that an LLM has discovered new knowledge, but the far reach is that the information does appear in the training data?
replies(2): >>44979915 #>>44985283 #
rerdavies ◴[] No.44985283[source]
https://medium.com/@deshmukhpratik931/the-matrix-multiplicat...

And it's not an accident that a significant percentage (40%?) of all papers being published in top journals involve the application of AI.

replies(1): >>44986752 #
jakelazaroff ◴[] No.44986752[source]
This article is about the same thing I mentioned in a sibling comment. I personally don't find an unreplicated Google white paper to be compelling evidence.
replies(1): >>44989064 #
rerdavies ◴[] No.44989064[source]
It's a fast matrix multiply! (A decades-old human problem). What exactly do you need to replicate??! Just count the multiplies, fer goodness sake.
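An aside on what "count the multiplies" means (an illustrative sketch, not from the thread or the linked article): the classic instance of a fast matrix multiply is Strassen's 2x2 scheme, which anyone can verify mechanically by checking the result against the naive product and counting scalar multiplications. The AlphaEvolve-style result the article discusses concerns larger shapes, but the verification idea is the same.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplies (naive needs 8)."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    # Strassen's seven products:
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine into the four entries of the product:
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    """Textbook product: 8 scalar multiplies."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert strassen_2x2(A, B) == naive_2x2(A, B)
# "Counting the multiplies": Strassen uses 7 (m1..m7) vs. 8 naive.
```

Checking *correctness* of such a scheme really is this easy; the dispute upthread is over whether the AI, rather than humans in the loop, actually discovered it.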
replies(1): >>44991283 #
jakelazaroff ◴[] No.44991283[source]
> What exactly do you need to replicate??!

The AI coming up with it? When Google claimed their Wizard of Oz show at the Las Vegas Sphere was AI-generated, a ton of VFX artists spoke up to say they'd spent months of human labor working on it. Forgive me for not giving the benefit of the doubt to a company that has a vested interest in making their AI seem more powerful, and a track record of lying to do so.