Paper2Code: Automating Code Generation from Scientific Papers

(arxiv.org)

133 points Jerry2 | 2 comments | 25 Apr 25 17:36 UTC


somethingsome ◴[25 Apr 25 23:11 UTC] No.43799345[source]▶

I like the idea of having automatic code creation from papers, but I’m scared of it.

Suppose you get a paper, automatically generate the code, modify it a bit with a novel idea, and publish your own paper. Then somebody else does the same with your paper, and so on. At some point we will have a huge quantity of vibe-coded code on GitHub, and two similar papers will have very different underlying implementations, which makes them hard to reason about and hard to change.

From a learning perspective, you try to understand the code, it's all spaghetti, and you lose more time understanding it than it would take to just reimplement it. You also learn a lot not only by reading the paper but by reading the authors' code, where most of the small details reside.

And I'm not even talking about the reliability of the code, or the tests needed to know that it's a correct implementation. Authors try to keep papers as close as possible to the implementation, but sometimes subtle steps are left out, sometimes by inadvertence, sometimes because the number of pages is limited.

A paper and its implementation do not map one-to-one.

replies(3): >>43799790 #>>43801693 #>>43801768 #

Narew ◴[26 Apr 25 07:49 UTC] No.43801693[source]▶

>>43799345 #

Honestly, the code from my interns has greatly improved since they started using AI. And there is a lot of really ugly, hard-to-read code from papers. So I don't think having code completely generated by AI will be an obvious loss of readability :)

replies(1): >>43801873 #

1. somethingsome ◴[26 Apr 25 08:32 UTC] No.43801873[source]▶

>>43801693 #

Very interesting. Do you have a specific approach for teaching them how to use LLMs, or do they freewheel it? Do you give them advice? If so, what kind?

I would love to have a structured approach to help students learn to use LLMs better. What I have observed with uni students, from first to last year, is that they produce better code overall but have no idea how it works and would not be able to reproduce it without LLMs.

replies(1): >>43802306 #

2. Narew ◴[26 Apr 25 10:08 UTC] No.43802306[source]▶

>>43801873 (TP) #

For the moment it's a bit freewheel. And I agree the code is better, but they probably could not reproduce it themselves. I honestly don't know how to "force" them to understand the code the LLM writes if the code is clean. That does happen, though, when the code produced by the LLM is overcomplicated or bad, and we catch that in code review. I have the impression it will create even more disparity between students: those who just use LLM code and those who try to understand it.
