The whole thing runs on these prompts: https://github.com/SWE-agent/mini-swe-agent/blob/7e125e5dd49...
Your task: {{task}}. Please reply
with a single shell command in
triple backticks.
To finish, the first line of the
output of the shell command must be
'COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT'.
that's not the case with a codebase, where things are littered around in tune with specific model of organisation the developer had in mind.
You wish
This prompt snippet from your instance template is quite useful. I use something like this for getting out of debug loops:
> Analyse the codebase and brainstorm a list of potential root causes for the issue, and rank them from most likely to least likely.
Then create scripts or add debug logging to confirm whether your hypothesis is correct. Rule out root causes from most likely to least by executing your scripts and observing the output in order of likelihood.
I've built a SWE agent too (for fun), check it out => https://github.com/myriade-ai/autocode
https://github.com/SWE-agent/mini-swe-agent/blob/7e125e5dd49...
> right tools allow small models to perform better than undirected tool like bash to do everything.
Interesting enough the newer mini swe agent was refutation of this hypothesis for very large LLMs from the original swe agent paper (https://arxiv.org/pdf/2405.15793) assuming that specialized tools work better.
There are theoretically impossible things to do, if you buy into only the basics. If you open your mind, anything is achievable; you just need to break out of the box you’re in.
If enough people keep feeding in that we need a time machine, the revolution will play out in all the timelines. Without it, Sarah Connor is lost.
I guess that it's only a matter of finetuning.
LLM have lots of experience with bash so I get they figure out how to work with it. They don't have experience with custom tools you provide it.
And also, LLM "tools" as we know it need better design (to show states, dynamic actions).
Given both, AI with the right tools will outperform AI with generic and uncontrolled tool.