
Using LLMs at Oxide

(rfd.shared.oxide.computer)
694 points steveklabnik | 9 comments
csb6 ◴[] No.46179547[source]
Strange to see no mention of potential copyright violations found in LLM-generated code (e.g. LLMs reproducing code from Github verbatim without respecting the license). I would think that would be a pretty important consideration for any software development company, especially one that produces so much free software.
replies(4): >>46179678 #>>46179797 #>>46179941 #>>46188231 #
dboreham ◴[] No.46179678[source]
Do current generation LLMs do this? I suppose I mean "do this any more than human developers do".
replies(1): >>46180015 #
1. theresistor ◴[] No.46180015[source]
A very recent example: https://github.com/ocaml/ocaml/pull/14369
replies(2): >>46180102 #>>46180413 #
2. phyzome ◴[] No.46180102[source]
...what a remarkable thread.
replies(1): >>46180509 #
3. yard2010 ◴[] No.46180413[source]
>> Here's my question: why did the files that you submitted name Mark Shinwell as the author?

> Beats me. AI decided to do so and I didn't question it. I did ask AI to look at the OxCaml implementation in the beginning.

This shows that the problem with AI is philosophical, not practical.

4. menaerus ◴[] No.46180509[source]
Right? If this is really true, that some random person without compiler engineering experience implemented a completely new feature in the OCaml compiler by prompting the LLM to produce the code for him, then I think it really is remarkable.
replies(2): >>46181474 #>>46182364 #
5. ccortes ◴[] No.46181474{3}[source]
Oh wow, is that what you got from this?

It seems more like an inexperienced guy asked the LLM to implement something and the LLM just output what an experienced guy did before, and it even gave him the credit

replies(2): >>46181940 #>>46182770 #
6. rcxdude ◴[] No.46181940{4}[source]
Copyright notices and signatures in generative AI output are generally a result of the expectation created by the training data that such things exist, and are generally unrelated to how much the output corresponds to any particular piece of training data, and especially to who exactly produced that work.

(It is, of course, exceptionally lazy to leave such things in if you are using the LLM to assist you with a task, and can cause problems of false attribution. Especially in this case where it seems to have just picked a name of one of the maintainers of the project)

7. kfajdsl ◴[] No.46182364{3}[source]
It’s one thing for you (yes, you, the user using the tool) to generate code you don’t understand for a side project or one off tool. It’s another thing to expect your code to be upstreamed into a large project and let others take on the maintenance burden, not to mention review code you haven’t even reviewed yourself!

Note: I, myself, am guilty of forking projects, adding some simple feature I need with an LLM quickly because I don’t want to take the time to understand the codebase, and using it personally. I don’t attempt to upstream changes like this and waste maintainers’ time until I actually take the time myself to understand the project, the issue, and the solution.

replies(1): >>46182814 #
8. menaerus ◴[] No.46182770{4}[source]
Did you take a look at the code? Given your response, I figure you did not, because if you had you would see that the code was _not_ cloned but genuinely composed by the LLM.
9. menaerus ◴[] No.46182814{4}[source]
What are you talking about? It was a ridiculously useful debugging feature that nobody in their right mind would block because of "added maintenance". The MR was rejected purely for political/social reasons.