Paper2Agent: Stanford Reimagining Research Papers as Interactive AI Agents

(arxiv.org)

152 points Gaishan | 1 comments | 22 Sep 25 22:02 UTC | HN request time: 0.256s | source

Show context

simonw ◴[23 Sep 25 01:32 UTC] No.45341827[source]▶

I went looking for how they define "agent" in the paper:

> AI agents are autonomous systems that can reason about tasks and act to achieve goals by leveraging external tools and resources [4]. Modern AI agents are typically powered by large language models (LLMs) connected to external tools or APIs. They can perform reasoning, invoke specialized models, and adapt based on feedback [5]. Agents differ from static models in that they are interactive and adaptive. Rather than returning fixed outputs, they can take multi-step actions, integrate context, and support iterative human–AI collaboration. Importantly, because agents are built on top of LLMs, users can interact with agents through human language, substantially reducing usage barriers for scientists.

So more-or-less an LLM running tools in a loop. I'm guessing "invoke specialized models" is achieved here by running a tool call against some other model.

replies(3): >>45342175 #>>45343430 #>>45366145 #

datadrivenangel ◴[23 Sep 25 02:28 UTC] No.45342175[source]▶

>>45341827 #

With your definitions of agents as running tools in a loop, do you have high hopes for multi-tool agents being feasible from a security perspective? Seems like they'll need to be locked down

replies(3): >>45342815 #>>45343410 #>>45366112 #

1. backflippinbozo ◴[24 Sep 25 21:21 UTC] No.45366112[source]▶

>>45342175 #

No doubt, this toy demo will break your system if the research repo code runs unsecured code.

We thought about this out as we built a system that goes beyond running the quickstart to implement the core-methods of arXiv papers as draft PRs for YOUR target repo.

Running quickstart in sandbox is practically useless.

To limit the attack surface we added PR#1929 to AG2 so we could pass API keys to the DockerCommandLineCodeExecutor and use egress whitelisting to limit the ability of an agent to reach out to a compromised server: https://github.com/ag2ai/ag2/pull/1929

Been talking publicly about this for at least a month before this publication, and along the way we've built up nearly 1K Docker images for arXiv paper code: https://hub.docker.com/u/remyxai

We're close to seeing these images linked to the arXiv papers after PR#908 is merged: https://github.com/arXiv/arxiv-browse/pull/908

And we're actually doing a technical deep-dive with the AG2 team on our work tomorrow at 9am PST: https://calendar.app.google/3soCpuHupRr96UaF8

↑