How to build a coding agent

1. ofirpress ◴[24 Aug 25 03:55 UTC] No.45001234[source]▶

We (the Princeton SWE-bench team) built an agent in ~100 lines of code that does pretty well on SWE-bench, you might enjoy it too: https://github.com/SWE-agent/mini-swe-agent

replies(7): >>45001287 #>>45001548 #>>45001716 #>>45001737 #>>45002061 #>>45002110 #>>45009789 #

2. ghuntley ◴[24 Aug 25 04:05 UTC] No.45001287[source]▶

>>45001234 (TP) #

cheers i'll add it in.

3. simonw ◴[24 Aug 25 05:06 UTC] No.45001548[source]▶

>>45001234 (TP) #

OK that really is pretty simple, thanks for sharing.

The whole thing runs on these prompts: https://github.com/SWE-agent/mini-swe-agent/blob/7e125e5dd49...

  Your task: {{task}}. Please reply
  with a single shell command in
  triple backticks.
  
  To finish, the first line of the
  output of the shell command must be
  'COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT'.

replies(3): >>45002285 #>>45002729 #>>45003054 #

4. faangguyindia ◴[24 Aug 25 05:46 UTC] No.45001716[source]▶

>>45001234 (TP) #

when a problem is entirely self contained in a file, it's very easy to edit it with LLM.

that's not the case with a codebase, where things are littered around in tune with specific model of organisation the developer had in mind.

replies(2): >>45001723 #>>45002076 #

5. koakuma-chan ◴[24 Aug 25 05:48 UTC] No.45001723[source]▶

>>45001716 #

> in tune with specific model of organisation

You wish

6. meander_water ◴[24 Aug 25 05:51 UTC] No.45001737[source]▶

>>45001234 (TP) #

> 1. Analyze the codebase by finding and reading relevant files 2. Create a script to reproduce the issue 3. Edit the source code to resolve the issue 4. Verify your fix works by running your script again 5. Test edge cases to ensure your fix is robust

This prompt snippet from your instance template is quite useful. I use something like this for getting out of debug loops:

> Analyse the codebase and brainstorm a list of potential root causes for the issue, and rank them from most likely to least likely.

Then create scripts or add debug logging to confirm whether your hypothesis is correct. Rule out root causes from most likely to least by executing your scripts and observing the output in order of likelihood.

replies(1): >>45006960 #

7. Teever ◴[24 Aug 25 07:13 UTC] No.45002061[source]▶

>>45001234 (TP) #

What sort of results have you had from running it on its own codebase?

8. fmbb ◴[24 Aug 25 07:16 UTC] No.45002076[source]▶

>>45001716 #

Lumpers win again!

https://en.wikipedia.org/wiki/Lumpers_and_splitters

9. BenderV ◴[24 Aug 25 07:27 UTC] No.45002110[source]▶

>>45001234 (TP) #

Nice but sad to see lack of tools. Most your code is about the agent framework instead of specific to SWE.

I've built a SWE agent too (for fun), check it out => https://github.com/myriade-ai/autocode

replies(1): >>45002134 #

10. diminish ◴[24 Aug 25 07:32 UTC] No.45002134[source]▶

>>45002110 #

> sad to see lack of tools.

Lack of tools in mini-swe-agent is a feature. You can run it with any LLM no matter how big or small.

replies(1): >>45002821 #

11. nivertech ◴[24 Aug 25 07:58 UTC] No.45002285[source]▶

>>45001548 #

  system_template: str = "You are a helpful assistant that can do anything."

anything? Sounds like an AI Safety issue ;)

replies(1): >>45004257 #

12. sireat ◴[24 Aug 25 09:20 UTC] No.45002729[source]▶

>>45001548 #

Pretty sure you also need about 120 lines of prompting from default.yaml

https://github.com/SWE-agent/mini-swe-agent/blob/7e125e5dd49...

13. BenderV ◴[24 Aug 25 09:38 UTC] No.45002821{3}[source]▶

>>45002134 #

I'm trying to understand what does it got to do with LLM size? Imho, right tools allow small models to perform better than undirected tool like bash to do everything. But I understand that this code is to show people how function calling is just a template for LLM.

replies(1): >>45003155 #

14. diminish ◴[24 Aug 25 10:52 UTC] No.45003155{4}[source]▶

>>45002821 #

Mini swe agent, as an academic tool, can be easily tested aimed to show the power of a simple idea against any LLM. You can go and test it with different LLMs. Tool calls didn't work fine with smaller LLM sizes usually. I don't see many viable alternatives less than 7GB, beyond Qwen3 4B for tool calling.

> right tools allow small models to perform better than undirected tool like bash to do everything.

Interesting enough the newer mini swe agent was refutation of this hypothesis for very large LLMs from the original swe agent paper (https://arxiv.org/pdf/2405.15793) assuming that specialized tools work better.

replies(1): >>45011950 #

15. greleic ◴[24 Aug 25 13:52 UTC] No.45004257{3}[source]▶

>>45002285 #

You’d be surprised at the amount of time wasted because LLMs “think” they can’t do something. You’d be less surprised that they often “think” they can’t do something, but choose some straight ignorant path that cannot work.

There are theoretically impossible things to do, if you buy into only the basics. If you open your mind, anything is achievable; you just need to break out of the box you’re in.

If enough people keep feeding in that we need a time machine, the revolution will play out in all the timelines. Without it, Sarah Connor is lost.

replies(1): >>45008927 #

16. afro88 ◴[24 Aug 25 19:30 UTC] No.45006960[source]▶

>>45001737 #

Does this mean it's only useful for issue fixes?

replies(1): >>45008077 #

17. regularfry ◴[24 Aug 25 21:43 UTC] No.45008077{3}[source]▶

>>45006960 #

A feature is just an issue. The issue is that the feature isn't complete yet.

replies(1): >>45012539 #

18. curvaturearth ◴[24 Aug 25 23:59 UTC] No.45008927{4}[source]▶

>>45004257 #