Paper2Code: Automating Code Generation from Scientific Papers

(arxiv.org)

1. colkassad [25 Apr 25 19:14 UTC] No.43797522

>>43796419 (OP) #

It would be neat to run their pdf through their implementation[1] and compare results.

https://github.com/going-doer/Paper2Code

replies(3): >>43798706 #>>43800933 #>>43801410 #

2. bjourne [25 Apr 25 21:27 UTC] No.43798697

>>43796419 (OP) #

It relies on OpenAI's o3-mini model which (I think) you have to pay for.

3. endofreach [25 Apr 25 21:29 UTC] No.43798706

>>43797522 #

Damn, I was hoping the link was your result of that. Please do that. I can't start another project currently, but I'd love the short result as an anecdote. If you don't do it, I might have to. Please let me know. Great idea, really.

replies(1): >>43798735 #

4. omneity [25 Apr 25 21:32 UTC] No.43798735{3}

>>43798706 #

If I were the paper's author, I would have done it and included the results as an appendix or in a repo.

replies(1): >>43798803 #

5. JackYoustra [25 Apr 25 21:40 UTC] No.43798803{4}

>>43798735 #

haha would that itself be a product of the paper then?

replies(1): >>43798816 #

6. sitkack [25 Apr 25 21:40 UTC] No.43798807

>>43796419 (OP) #

I have had good results doing bidirectional programming in TeX <=> Python.

replies(1): >>43801445 #

7. omneity [25 Apr 25 21:41 UTC] No.43798816{5}

>>43798803 #

Maybe by doing it enough times o3-mini will end up reimplementing itself?

replies(2): >>43798979 #>>43808787 #

8. endofreach [25 Apr 25 22:09 UTC] No.43798979{6}

>>43798816 #

Imagine, this was actually the consequence— against all odds. The true power of AI is about to be discovered... through this silly experiment... and you are the one... all you gotta do— is do it. And imagine you don't do it, because you think it can't lead to such a serious result... and if we miss this great leap forward... go, throw your life away and do this. now. the universe is waiting on you, my friend.

replies(1): >>43803290 #

9. somethingsome [25 Apr 25 23:11 UTC] No.43799345

>>43796419 (OP) #

I like the idea of having automatic code creation from papers, but I’m scared of it.

Suppose you get a paper, you automatically implement the code, modify it a bit with a novel idea, and publish your paper. Then somebody else does the same with your paper. At some point we will have a huge quantity of vibe-coded code on GitHub, and two similar papers will have very different underlying implementations that are hard to reason about and hard to change.

From a learning perspective, you try to understand the code, it's all spaghetti, and you lose more time understanding the code than it would take to just reimplement it. You also learn a lot not only by reading the paper but by reading the authors' code, where most of the small details reside.

And I'm not even talking about the reliability of the code, or tests to confirm that it's the correct implementation. Authors try to keep papers as close as possible to the implementation, but sometimes subtle steps are left out, sometimes through inadvertence, sometimes because the number of pages is limited.

A paper and an implementation are not one-to-one mappings.

replies(3): >>43799790 #>>43801693 #>>43801768 #

10. tomrod [26 Apr 25 00:25 UTC] No.43799790

>>43799345 #

> we will have a huge quantity of vibe coded code on github

That may actually be an improvement over much of the code that is generated for papers.

11. ks2048 [26 Apr 25 02:26 UTC] No.43800389

>>43796419 (OP) #

So who has a code2paper model that we can hook up in a loop?

12. protolyticmind [26 Apr 25 04:25 UTC] No.43800933

>>43797522 #

I thought it would be humorously ironic :D

13. brundolf [26 Apr 25 05:33 UTC] No.43801186

>>43796419 (OP) #

Not what OP is about, but an idea I just had:

We should have the technology now to hand-write pseudocode on a piece of paper (or a whiteboard or chalkboard) and have it translated and executed. Maybe you even hook up a projector and project the output back onto the board.

14. polygot [26 Apr 25 06:34 UTC] No.43801410

>>43797522 #

I decided to do that, and made Paper2Code2Code: https://github.com/alexyorke/Paper2Code2Code/tree/main

replies(1): >>43802581 #

15. somethingsome [26 Apr 25 06:44 UTC] No.43801445

>>43798807 #

Can you give more details? I'm curious

replies(1): >>43804516 #

16. Narew [26 Apr 25 07:49 UTC] No.43801693

>>43799345 #

Honestly, the code from my interns has greatly improved since they started using AI. And there is a lot of really ugly, hard-to-read code from papers. So I don't think having code completely generated by AI will be an obvious loss of readability :)

replies(1): >>43801873 #

17. exe34 [26 Apr 25 08:05 UTC] No.43801768

>>43799345 #

> you try to understand the code, and it's all spaghetti, and you loose more time understanding the code than it would take to just reimplement it.

I agree with you in general, but maybe the jump would be similar to the one from hand-written punchcards/assembly to higher level compilers. Very few people worry about the asm generated from GHC for example. So maybe a lot of code would be like that. I also imagine at some point a better intermediate language for LLMs to generate will be discovered and suddenly that's how most programs will be written.

replies(2): >>43801935 #>>43804215 #

18. somethingsome [26 Apr 25 08:32 UTC] No.43801873{3}

>>43801693 #

Very interesting. Do you have a specific approach to teaching them how to use LLMs, or do they freewheel it? Do you give advice? If so, what kind?

I would love to have a structured approach to help students learn to use LLMs better. What I have observed (with university students, from first year to final year) is that they produce better code overall, but have no idea how it works and would not be able to reproduce it without LLMs.

replies(1): >>43802306 #

19. wzdd [26 Apr 25 08:33 UTC] No.43801878

>>43796419 (OP) #

I did this recently with a forward-mode AD paper, by just pasting the PDF into Claude. Like everyone, I've had mixed results with Claude coding, so I wouldn't bet my life on the output, but Claude was able to produce something for PyTorch that worked first go and had appropriate performance characteristics, and it was able to convincingly explain the connection between parts of the generated code and the paper. I was impressed.
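For readers who haven't met forward-mode AD, here is a minimal sketch of the core idea using the textbook dual-number technique in plain Python. It is not the method from the paper wzdd used, nor the PyTorch code Claude produced; the `Dual` class and `derivative` helper are hypothetical names chosen for illustration, and only `+` and `*` are supported.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    val: float  # primal value f(x)
    der: float  # tangent f'(x), carried forward alongside the value

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants (zero derivative).
        return other if isinstance(other, Dual) else Dual(float(other), 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def derivative(f, x):
    # Seed the tangent with 1.0 so the output tangent is df/dx at x.
    return f(Dual(float(x), 1.0)).der


f = lambda x: 3 * x * x + 2 * x + 1  # f'(x) = 6x + 2
print(derivative(f, 2.0))  # prints 14.0
```

The derivative is computed in the same single pass as the value, which is what makes forward mode cheap for functions with few inputs.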

20. somethingsome [26 Apr 25 08:43 UTC] No.43801935{3}

>>43801768 #

I would love that. I mostly work with ideas, and the code is an implementation detail for me, so yes, in some ways automated code generation would allow me to be far more productive. I'm not against it; I'm just worried about how effective an LLM-based approach is (at the moment, at least).

The examples they give are about 'implementing deep learning papers'. I find those papers the easiest to implement, compared to, for example, some obscure algorithm that can't rely on frameworks such as PyTorch and where speed is critical.

I can't find the essay, but I think it was Wolfram who wrote that we should let students use Mathematica and teach them to use it from a young age. The rationale: students used to have to use logarithm tables, which took a lot of time during their education. Then, with the advent of the calculator, students could compute logarithms instantly, so they could focus on more advanced ideas that use them. With Mathematica they could execute matrix operations automatically, so they would spend most of their time thinking about what matrix operations do instead of just learning how to manipulate a matrix by hand.

So with more powerful tools, you can expand the capabilities faster.

But the main difference I see here is that maths is precise and well defined. Here you get a piece of software that is one sample from the space of possible programs that solve the problem (if you are lucky).

To get to the metaphorical punchcards->GHC point, you need an LLM tool that always gives the same answer, hopefully the optimal one, and that, given small changes to the paper, moves the software only slightly within the space of viable programs. Maybe we will get there, but this is not yet what this paper proposes.

replies(1): >>43802136 #

21. UltraSane [26 Apr 25 09:30 UTC] No.43802136{4}

>>43801935 #

It would be very interesting to teach kids math using Mathematica starting from kindergarten.

22. Narew [26 Apr 25 10:08 UTC] No.43802306{4}

>>43801873 #

For the moment it's a bit freewheeling. And I agree: the code is better, but they probably could not reproduce it themselves. I honestly don't know how to "force" them to understand the code the LLM writes if that code is clean. It only comes up when the code produced by the LLM is overcomplicated or bad, and we catch that in code review. I have the impression it will create even more disparity between students: those who just use LLM code and those who try to understand it.

23. IshKebab [26 Apr 25 11:08 UTC] No.43802581{3}

>>43801410 #

And... does it work?

24. barotalomey [26 Apr 25 13:10 UTC] No.43803290{7}

>>43798979 #

That's how nothing ever works outside fiction.

25. dadoomer [26 Apr 25 15:00 UTC] No.43804215{3}

>>43801768 #

> the jump would be similar to the one from hand-written punchcards/assembly to higher level compilers

I wouldn't say so. Compilers are not stochastic text models, and they can be verified and reasoned about to a great extent.

26. sitkack [26 Apr 25 15:35 UTC] No.43804516{3}

>>43801445 #

Give it a try: have it teach you TeX for summation notation, have it write the code, modify the code, then have it translate the code back to TeX. Repeat.

You can do a quick test to see which models have been trained on TeX.

Keep a TeX visualizer handy.
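As a concrete, hypothetical instance of that loop: start from a TeX summation, translate it into Python, rewrite the code (here into a closed form), check that both versions agree, and then translate the rewritten version back to TeX. The sum-of-squares example and the function names are illustrative assumptions, not something from sitkack's comment.

```python
# Step 1, TeX source:   S(n) = \sum_{i=1}^{n} i^2
# Step 2, a literal Python translation of that summation.
def sum_of_squares(n: int) -> int:
    return sum(i * i for i in range(1, n + 1))


# Step 3, a modified version of the code (closed form) that you would
# then ask the model to translate back:
#   S(n) = \frac{n(n+1)(2n+1)}{6}
def sum_of_squares_closed(n: int) -> int:
    # n(n+1)(2n+1) is always divisible by 6, so integer division is exact.
    return n * (n + 1) * (2 * n + 1) // 6


# Step 4, check that the two versions agree before trusting the new TeX.
assert all(sum_of_squares(n) == sum_of_squares_closed(n) for n in range(50))
```

Checking the code versions against each other numerically catches translation slips that are easy to miss when reading TeX alone.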

27. colkassad [27 Apr 25 01:37 UTC] No.43808787{6}

>>43798816 #

It'll be something like this that creates the Singularity, just you see.