But that's the point! If we take out the effort to understand, to really understand something on a deeper level, from even research, then how can there be anything useful built on top of it? Is everything going to lose any depth and become shallow?
Edit: spelling.
[1]: https://arstechnica.com/ai/2025/09/science-journalists-find-...
FYI - this is actually happening right now. And most young profs are writing their grants using AI. The biggest issue with the latter? It's hard to tell the difference, given how many grants are just rehashing the same stuff over and over.
Isn't the point to put the time into those things? At some point aren't those the things one should choose to put time into?
The academic style of writing is almost purposefully as obtuse, dense, and devoid of context as possible. Academia is trapped in all kinds of stupid norms.
If requiring more effort to reach the same understanding were a good thing, we should be making papers much harder to read than they currently are.
Why are the specific things they are doing a problem? Automatically building the pipelines and code described in a paper, checking that they match the reported results, and then being able to execute them for the user's queries - is that a bad thing for understanding?
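For what it's worth, the "checking that they match the reported results" step can be as simple as comparing reproduced metrics against the paper's numbers within a tolerance. A minimal sketch, assuming a hypothetical run_reproduction() that executes whatever pipeline the agent built (none of these names or numbers come from the thread):

    # Hedged sketch: verify a reproduction against reported metrics.
    # REPORTED, TOLERANCE, and run_reproduction() are all placeholders.

    REPORTED = {"accuracy": 0.912, "f1": 0.887}  # numbers claimed in the paper
    TOLERANCE = 0.01                             # allowed absolute deviation

    def run_reproduction() -> dict:
        """Placeholder for executing the pipeline the agent built from the paper."""
        return {"accuracy": 0.909, "f1": 0.884}

    def matches_reported(measured: dict, reported: dict, tol: float) -> bool:
        # Every reported metric must be reproduced within the tolerance.
        return all(abs(measured[k] - v) <= tol for k, v in reported.items())

    if __name__ == "__main__":
        measured = run_reproduction()
        if matches_reported(measured, REPORTED, TOLERANCE):
            print("Reproduction matches reported results; OK to run user queries.")
        else:
            print("Mismatch: flag for human review before trusting outputs.")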
Much more damage is done if the understanding that you get is wrong.
I'd actually like a change from the other end. Instead of "make agents so good they can implement complex papers", how about "write papers so plainly that current agents can implement a reproduction"?
Not a bad line of thinking, especially if you're microdosing, but I find myself turning off reasoning more frequently than I'd expected, considering it's supposed to be objectively better.
This may change as our RL methods get better at properly rewarding correct partial traces and penalizing overthinking, but for the moment there's often a stark difference between when a multi-step process improves the model's ability to reason through the context and when it doesn't.
This is made more complicated (for human prompters and evaluators) by the fact that (as Anthropic has demonstrated) the text of the reasoning trace means something very different for the model versus how a human interprets it. The reasoning the model claims it is doing can sometimes be worlds away from the actual calculations (e.g., how it uses helical structures to do addition [1]).
That's why we began building agents to source ideas from the arXiv and implement the core methods from the papers in YOUR target repo months before this publication.
We shared the demo video of it in our production system a while back: https://news.ycombinator.com/item?id=45132898
And we're offering a technical deep-dive into how we built it tomorrow at 9am PST with the AG2 team: https://calendar.app.google/3soCpuHupRr96UaF8
We've built up to 1K Docker images over the past couple months which we make public on DockerHub: https://hub.docker.com/u/remyxai
And we're close to an integration with arXiv that will have these pre-built images linked to the papers: https://github.com/arXiv/arxiv-browse/pull/908
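For anyone who wants to poke at those pre-built images, here's a minimal usage sketch with the Docker Python SDK. The image tag below is a placeholder, not a real repository; browse the DockerHub namespace above for actual ones, and note that the command being run is also made up for illustration:

    # Hedged sketch: pull and run one of the public paper images.
    # Requires the `docker` Python SDK and a running Docker daemon.
    import docker

    client = docker.from_env()

    IMAGE = "remyxai/example-paper:latest"  # placeholder tag, not a real image

    # Pull the image, then run a hypothetical entrypoint and print its logs.
    client.images.pull(IMAGE)
    logs = client.containers.run(IMAGE, command="python run_experiment.py", remove=True)
    print(logs.decode("utf-8"))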