But that's the point! If we take out the effort to understand, to really understand something on a deeper level, from even research, then how can there be anything useful built on top of it? Is everything going to lose any depth and become shallow?
Edit: spelling.
[1]: https://arstechnica.com/ai/2025/09/science-journalists-find-...
FYI - this is actually happening right now. And most young profs are writing their grants using AI. The biggest issue with the latter? It's hard to tell the difference, given how many grants are just rehashing the same stuff over and over.
Isn't the point to put the time into those things? At some point aren't those the things one should choose to put time into?
The academic style of writing is almost purposefully as obtuse, dense, and devoid of context as possible. Academia is trapped in all kinds of stupid norms.
If requiring more effort to reach the same understanding were a good thing, we should be making papers much harder to read than they currently are.
Why are the specific things they are doing a problem? Automatically building the pipelines and code described in a paper, checking that they match the reported results, and then being able to execute them for the user's queries - is that a bad thing for understanding?
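For what it's worth, the "checking that they match the reported results" step can be as simple as comparing reproduced metrics against the paper's numbers within a tolerance. A minimal sketch, assuming a hypothetical run_reproduction() that executes whatever pipeline the agent built (none of these names or numbers come from the thread):

    # Hedged sketch: verify a reproduction against reported metrics.
    # REPORTED, TOLERANCE, and run_reproduction() are all placeholders.

    REPORTED = {"accuracy": 0.912, "f1": 0.887}  # numbers claimed in the paper
    TOLERANCE = 0.01                             # allowed absolute deviation

    def run_reproduction() -> dict:
        """Placeholder for executing the pipeline the agent built from the paper."""
        return {"accuracy": 0.909, "f1": 0.884}

    def matches_reported(measured: dict, reported: dict, tol: float) -> bool:
        # Every reported metric must be reproduced within the tolerance.
        return all(abs(measured[k] - v) <= tol for k, v in reported.items())

    if __name__ == "__main__":
        measured = run_reproduction()
        if matches_reported(measured, REPORTED, TOLERANCE):
            print("Reproduction matches reported results; OK to run user queries.")
        else:
            print("Mismatch: flag for human review before trusting outputs.")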
Much more damage is done if the understanding that you get is wrong.
I'd actually like a change from the other end. Instead of "make agents so good they can implement complex papers", how about "write papers so plainly that current agents can implement a reproduction"?
Not a bad line of thinking, especially if you're microdosing, but I find myself turning off reasoning more frequently than I'd expected, considering it's supposed to be objectively better.
This may change as our RL methods get better at properly rewarding correct partial traces and penalizing overthinking, but for the moment there's often a stark difference between when a multi-step process improves the model's ability to reason through the context and when it doesn't.
This is made more complicated (for human prompters and evaluators) by the fact that (as Anthropic has demonstrated) the text of the reasoning trace means something very different for the model versus how a human interprets it. The reasoning the model claims it is doing can sometimes be worlds away from the actual calculations (e.g., how it uses helical structures to do addition [1]).
That's why we began building agents to source ideas from the arXiv and implement the core methods from the papers in YOUR target repo months before this publication.
We shared the demo video of it in our production system a while back: https://news.ycombinator.com/item?id=45132898
And we're offering a technical deep-dive into how we built it tomorrow at 9am PST with the AG2 team: https://calendar.app.google/3soCpuHupRr96UaF8
We've built up to 1K Docker images over the past couple months which we make public on DockerHub: https://hub.docker.com/u/remyxai
And we're close to an integration with arXiv that will have these pre-built images linked to the papers: https://github.com/arXiv/arxiv-browse/pull/908
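For anyone who wants to poke at those pre-built images, here's a minimal usage sketch with the Docker Python SDK. The image tag below is a placeholder, not a real repository; browse the DockerHub namespace above for actual ones, and note that the command being run is also made up for illustration:

    # Hedged sketch: pull and run one of the public paper images.
    # Requires the `docker` Python SDK and a running Docker daemon.
    import docker

    client = docker.from_env()

    IMAGE = "remyxai/example-paper:latest"  # placeholder tag, not a real image

    # Pull the image, then run a hypothetical entrypoint and print its logs.
    client.images.pull(IMAGE)
    logs = client.containers.run(IMAGE, command="python run_experiment.py", remove=True)
    print(logs.decode("utf-8"))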