494 points todsacerdoti | 7 comments
1. BurningFrog ◴[] No.44383219[source]
Would it make sense to include the complete prompt that generated the code with the code?
replies(3): >>44383344 #>>44384164 #>>44386462 #
2. astrobiased ◴[] No.44383344[source]
It would need to be more than that. The same prompt can produce different results across models, and even across inference configurations of the same model: with different treatment at inference time, e.g. quantization, the unquantized and quantized versions of a model can give different outputs for an identical prompt.
replies(1): >>44383438 #
3. verdverm ◴[] No.44383438[source]
Even more so: when you come back to the code in a few years to understand it, the model will likely no longer be available.
replies(1): >>44383820 #
4. galangalalgol ◴[] No.44383820{3}[source]
One of several reasons to use an open model even if it isn't quite as good. Version control the models and commit the prompts with the model name and a hash of the parameters. I'm not really sure what value that reproducibility adds though.
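The practice described above (committing the prompt alongside the model name and a hash of the parameters) could be sketched roughly like this. This is a minimal illustration, not a real tool: `weights_hash` and `provenance_record` are hypothetical helper names, and a real weights file would be hashed the same way, just much larger.

```python
import hashlib
import json
from pathlib import Path


def weights_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a model weights file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def provenance_record(model_name: str, weights: Path, prompt: str) -> str:
    """JSON blob to commit next to the generated code."""
    return json.dumps(
        {
            "model": model_name,
            "weights_sha256": weights_hash(weights),
            "prompt": prompt,
        },
        indent=2,
    )
```

The record is just data; whether anyone can replay it later is a separate question, as the rest of the thread points out.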
5. catlifeonmars ◴[] No.44384164[source]
You’d need to hash the model weights and save the seed for the sampling PRNG (plus the temperature) as well, in order to verify the provenance. Ideally it would be reproducible, right?
replies(1): >>44385459 #
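The seed point above can be illustrated with a toy decoding loop: with temperature sampling, output only reproduces if the PRNG seed is fixed. This is a sketch with a made-up `logits_fn` standing in for a real model; actual inference stacks seed their samplers differently.

```python
import math
import random


def sample_tokens(logits_fn, steps: int, temperature: float, seed: int) -> list[int]:
    """Toy temperature-sampling decode loop; logits_fn maps the previous
    token id to a list of logits for the next token."""
    rng = random.Random(seed)  # fixed seed => reproducible sampling
    out, token = [], 0
    for _ in range(steps):
        scaled = [l / temperature for l in logits_fn(token)]
        m = max(scaled)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        token = rng.choices(range(len(probs)), weights=probs)[0]
        out.append(token)
    return out
```

Running it twice with the same seed yields the same token sequence; changing the seed generally does not.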
6. danielbln ◴[] No.44385459[source]
Maybe 2 years ago. Nowadays LLMs call functions and use tools; good luck capturing all of that in a way that's reproducible.
7. ethan_smith ◴[] No.44386462[source]
Including prompts would create transparency but still wouldn't resolve the underlying copyright uncertainty of the output or guarantee the code wasn't trained on incompatibly-licensed material.