←back to thread

469 points ghuntley | 1 comments | | HN request time: 0.215s | source
Show context
ofirpress ◴[] No.45001234[source]
We (the Princeton SWE-bench team) built an agent in ~100 lines of code that does pretty well on SWE-bench, you might enjoy it too: https://github.com/SWE-agent/mini-swe-agent
replies(7): >>45001287 #>>45001548 #>>45001716 #>>45001737 #>>45002061 #>>45002110 #>>45009789 #
BenderV ◴[] No.45002110[source]
Nice but sad to see lack of tools. Most your code is about the agent framework instead of specific to SWE.

I've built a SWE agent too (for fun), check it out => https://github.com/myriade-ai/autocode

replies(1): >>45002134 #
diminish ◴[] No.45002134[source]
> sad to see lack of tools.

Lack of tools in mini-swe-agent is a feature. You can run it with any LLM no matter how big or small.

replies(1): >>45002821 #
BenderV ◴[] No.45002821[source]
I'm trying to understand what does it got to do with LLM size? Imho, right tools allow small models to perform better than undirected tool like bash to do everything. But I understand that this code is to show people how function calling is just a template for LLM.
replies(1): >>45003155 #
diminish ◴[] No.45003155[source]
Mini swe agent, as an academic tool, can be easily tested aimed to show the power of a simple idea against any LLM. You can go and test it with different LLMs. Tool calls didn't work fine with smaller LLM sizes usually. I don't see many viable alternatives less than 7GB, beyond Qwen3 4B for tool calling.

> right tools allow small models to perform better than undirected tool like bash to do everything.

Interesting enough the newer mini swe agent was refutation of this hypothesis for very large LLMs from the original swe agent paper (https://arxiv.org/pdf/2405.15793) assuming that specialized tools work better.

replies(1): >>45011950 #
1. BenderV ◴[] No.45011950[source]
Thanks for your answer.

I guess that it's only a matter of finetuning.

LLM have lots of experience with bash so I get they figure out how to work with it. They don't have experience with custom tools you provide it.

And also, LLM "tools" as we know it need better design (to show states, dynamic actions).

Given both, AI with the right tools will outperform AI with generic and uncontrolled tool.