549 points thecr0w | 1 comments
thuttinger ◴[] No.46184466[source]
Claude/LLMs in general are still pretty bad at the intricate details of layouts and visual things. There are a lot of problems that are easy to get right for a junior web dev but impossible for an LLM. On the other hand, I was able to write a C program that added gamma color profile support to Linux compositors that don't support it (in my case Hyprland) within a few minutes! A - for me - seemingly hard task, which would have taken me at least a day if I hadn't let Claude write the code. With one prompt Claude generated C code that compiled on the first try and that:

- read an .icc file from disk

- parsed the file and extracted the VCGT (video card gamma table)

- wrote the VCGT to the video card for a specified display via amdgpu driver APIs

The only thing I had to fix was the ICC parsing, where it would parse header strings in the wrong byte order (they are big-endian).
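
To illustrate the byte-order point, here is a minimal sketch of the parsing step. It is not the code Claude generated. It assumes the standard ICC layout (a 128-byte header followed by a big-endian tag count and 12-byte tag entries), and the function names read_be32 and icc_find_vcgt are hypothetical. It only locates the 'vcgt' tag in a buffer that has already been read from disk; decoding the table and writing it to the display are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian unsigned integer from a byte buffer,
 * independent of the host's native byte order. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Locate the 'vcgt' tag in an in-memory ICC profile.
 * Returns 0 on success and stores the tag's offset and size; returns -1
 * if the buffer is too small, malformed, or has no 'vcgt' tag.
 */
static int icc_find_vcgt(const uint8_t *buf, size_t len,
                         uint32_t *offset, uint32_t *size)
{
    const size_t header_size = 128; /* fixed ICC header length */
    const size_t entry_size = 12;   /* signature + offset + size */

    if (len < header_size + 4)
        return -1;

    uint32_t count = read_be32(buf + header_size);
    const uint8_t *entry = buf + header_size + 4;

    /* Make sure the whole tag table fits inside the buffer. */
    if (count > (len - header_size - 4) / entry_size)
        return -1;

    for (uint32_t i = 0; i < count; i++, entry += entry_size) {
        if (memcmp(entry, "vcgt", 4) != 0)
            continue;
        uint32_t off = read_be32(entry + 4);
        uint32_t sz = read_be32(entry + 8);
        /* Reject tags that point outside the buffer. */
        if (off > len || sz > len - off)
            return -1;
        *offset = off;
        *size = sz;
        return 0;
    }
    return -1;
}
```

Assembling each field byte by byte, as read_be32 does, avoids casting the buffer to a uint32_t pointer, which would silently read the fields in little-endian order on x86.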

replies(3): >>46184840 #>>46185379 #>>46185476 #
jacquesm ◴[] No.46185379[source]
Claude didn't write that code. Someone else did and Claude took that code without credit to the original author(s), adapted it to your use case and then presented it as its own creation to you and you accepted this. If a human did this we probably would have a word for them.
replies(16): >>46185404 #>>46185408 #>>46185442 #>>46185473 #>>46185478 #>>46185791 #>>46185885 #>>46185911 #>>46186086 #>>46186326 #>>46186420 #>>46186759 #>>46187004 #>>46187058 #>>46187235 #>>46188771 #
mlinsey ◴[] No.46185791[source]
Certainly if a human wrote code that solved this problem, and a second human copied and tweaked it slightly for their use case, we would have a word for them.

Would we use the same word if two different humans wrote code that solved two different problems, but one part of each problem was somewhat analogous to a different aspect of a third human's problem, and the third human took inspiration from those parts of both solutions to create code that solved a third problem?

What if it were ten different humans writing ten different-but-related pieces of code, and an eleventh human piecing them together? What if it were 1,000 different humans?

I think "plagiarism", "inspiration", and just "learning from" fall on some continuous spectrum. There are clear differences when you zoom out, but they are in degree, and it's hard to set a hard boundary. The key is just to make sure we have laws and norms that provide sufficient incentive for new ideas to continue to be created.

replies(6): >>46186125 #>>46186199 #>>46187063 #>>46188272 #>>46189797 #>>46194087 #
nextos ◴[] No.46186199{3}[source]
In case of LLMs, due to RAG, very often it's not just learning but almost direct real-time plagiarism from concrete sources.
replies(2): >>46186877 #>>46186903 #
sholain ◴[] No.46186877{4}[source]
RAG and LLMs are not the same thing, but 'Agents' incorporate both.

Maybe we could resolve the conundrum raised by the OP by requiring 'agents' to give credit for things if they did RAG them or pull them off the web?

It still doesn't resolve the 'inherent learning' problem.

It's reasonable to suggest that if 'one person did it, we should give credit' - at least in some cases, and also reasonable that if 1K people have done similar things and the AI learns from that, well, I don't think credit is something that should apply.

But a couple of considerations:

- It may not be that common for an LLM to 'see one thing one time' and then have such an accurate assessment of the solution. It helps, but LLMs tend not to 'learn' things that way.

- Some people might consider this the OSS dream - any code that's public is public and it's in the public domain. We don't need to 'give credit' to someone because they solved something relatively arbitrary - or - if they are concerned with that, then we can have a separate mechanism for that, aka they can put it on Github or Wikipedia even, and then we can worry about 'who thought of it first' as a separate consideration. But in terms of Engineering application, that would be a bit of a detractor.

replies(1): >>46187380 #
martin-t ◴[] No.46187380{5}[source]
> if 1K people have done similar things and the AI learns from that, well, I don't think credit is something that should apply.

I think it should.

Sure, if you make a small amount of money and divide it among the 1000 people who deserve credit due to their work being used to create ("train") the model, it might be too small to bother.

But if actual AGI is achieved, then it has nearly infinite value. If said AGI is built on top of the work of the 1000 people, then almost infinity divided by 1000 is still a lot of money.

Of course, the real numbers are way larger: LLMs were trained on the work of at least 100M, but perhaps over a billion, people. But the value they provide over a long enough timespan is also claimed to be astronomical (evidenced by the valuations of those companies). It's not just their employees who deserve a cut but everyone whose work was used to train them.

> Some people might consider this the OSS dream

I see the opposite. Code that was public but protected by copyleft can now be reused in private/proprietary software. All you need to do is push it through enough matmuls and some nonlinearities.

replies(1): >>46191218 #
sholain ◴[] No.46191218{6}[source]
- I don't think it's even reasonable to suggest that 1000 people all coming up with variations of some arbitrary bit of code either deserve credit - or certainly 'financial remuneration' because they wrote some arbitrary piece of code.

That scenario is already today very well accepted legally and morally etc as public domain.

- Copyleft is not OSS, it's a tiny variation of it, which is both highly ideological and impractical. Less than 2% of OSS projects are copyleft. It's a legit perspective obviously, but it hasn't been representative for 20 years.

Whatever we do with AI, we already have a basic understanding of public domain, at least we can start from there.

replies(1): >>46213108 #
martin-t ◴[] No.46213108{7}[source]
> I don't think it's even reasonable to suggest that 1000 people all coming up with variations of some arbitrary bit of code either deserve credit

There's 8B people on the planet, probably ~100M can code to some degree[0]. Something only 1k people write is actually pretty rare.

Where would you draw the line? How many out of how many?

If I take a leaked bit of Google or MS or, god forbid, Oracle code and manage to find a variation of each small block in a few other projects, does it mean I can legally take the leaked code and use it for free?

Do you even realize to what lengths the tech companies went just a few years ago to protect their IP? People who ever even glanced at leaked code were prohibited from working on open source reimplementations.

> That scenario is already today very well accepted legally and morally etc as public domain.

1) Public domain is a legal concept, it has 0 relevance to morality.

2) Can you explain how you think this works? Can a person's work just automatically become public domain somehow by being too common?

> Copyleft is not OSS, it's a tiny variation of it, which is both highly ideological and impractical.

This sentence seems highly ideological. Linux is GPL, in fact, probably most SW on my non-work computer is GPL. It is very practical and works much better than commercial alternatives for me.

> Less than 2% of OSS projects are copyleft.

Where did you get this number? Using search engines, I get 20-30%.

[0]: It's the number of GitHub users. There are reportedly only ~25M professional SW devs, but many more people can code without doing so professionally.

replies(1): >>46214987 #
1. sholain ◴[] No.46214987{8}[source]
+ Once again: 1K people coming up with some arbitrary bit of content is already understood in basically every legal regime in the world as 'public domain'.

"Can you explain how you think this works? Can a person's work just automatically become public domain somehow by being too common?"

Please ask ChatGPT for the breakdown but start with this: if someone writes something and does not copyright it, it's already in the 'public domain' and what the other 999 people do does not matter. Moreover, a lot of things are not copyrightable in the first place.

FYI I've worked at Fortune 50 Tech Companies, with 'Legal' and I know how sensitive they are - this is not a concern for them.

It's not a concern for anyone.

'One Person' reproduction -> now that is definitely a concern. That's what this is all about.

+ For OSS, I think the 20% number may come from repos that are explicitly licensed. Out of 'all repos' copyleft is a very tiny amount; of those that have specific licensing details it's closer to 20%. You can verify this yourself just by cruising repos. The breakdown could be different for popular projects, but in the context of AI and IP rights we're more concerned about 'small entities' being overstepped, as the more institutional entities may have recourse and protections.

I think the way this will play out is if LLMs are producing material that could be considered infringing, then they'll get sued. If they don't - they won't.

And that's it.

It's why they don't release the training data - it's full of stuff that is in a legal grey area.