469 points | ghuntley | 6 comments
ofirpress ◴[] No.45001234[source]
We (the Princeton SWE-bench team) built an agent in ~100 lines of code that does pretty well on SWE-bench, you might enjoy it too: https://github.com/SWE-agent/mini-swe-agent
replies(7): >>45001287 #>>45001548 #>>45001716 #>>45001737 #>>45002061 #>>45002110 #>>45009789 #
1. meander_water ◴[] No.45001737[source]
> 1. Analyze the codebase by finding and reading relevant files
> 2. Create a script to reproduce the issue
> 3. Edit the source code to resolve the issue
> 4. Verify your fix works by running your script again
> 5. Test edge cases to ensure your fix is robust

This prompt snippet from your instance template is quite useful. I use something like this for getting out of debug loops:

> Analyse the codebase and brainstorm a list of potential root causes for the issue, and rank them from most likely to least likely.
>
> Then create scripts or add debug logging to confirm whether each hypothesis is correct. Rule out root causes in order of likelihood, from most to least likely, by executing your scripts and observing the output.
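
The rule-out loop in that prompt can be sketched in a few lines of Python. This is only an illustration of the idea, not anything from mini-swe-agent: the `Hypothesis` class, `find_root_cause`, the likelihood numbers, and the candidate causes are all made up here, and each `check` stands in for running a repro script or reading debug logs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    description: str
    likelihood: float           # subjective estimate, higher = more likely
    check: Callable[[], bool]   # runs a script or inspects logs; True if confirmed


def find_root_cause(hypotheses: list[Hypothesis]) -> Optional[Hypothesis]:
    """Test hypotheses from most to least likely; return the first confirmed one."""
    for h in sorted(hypotheses, key=lambda h: h.likelihood, reverse=True):
        if h.check():
            return h
    return None  # every candidate was ruled out


# Made-up candidates; real checks would run a repro script or grep debug logs.
candidates = [
    Hypothesis("stale cache entry", 0.2, lambda: True),
    Hypothesis("race in worker pool", 0.5, lambda: False),
    Hypothesis("off-by-one in pagination", 0.3, lambda: False),
]
cause = find_root_cause(candidates)
print(cause.description if cause else "no hypothesis confirmed")
```

Ranking first means the cheapest path to an answer is usually tried first, and a `None` result is a signal to go back and brainstorm more causes rather than keep patching.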

replies(1): >>45006960 #
2. afro88 ◴[] No.45006960[source]
Does this mean it's only useful for issue fixes?
replies(1): >>45008077 #
3. regularfry ◴[] No.45008077[source]
A feature is just an issue. The issue is that the feature isn't complete yet.
replies(1): >>45012539 #
4. afro88 ◴[] No.45012539{3}[source]
> 2. Create a script to reproduce the issue

Surely that would send it a bit off the rails to implement a feature?

replies(1): >>45014461 #
5. regularfry ◴[] No.45014461{4}[source]
Sounds like an acceptance test to me!
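
As a rough sketch of what I mean: for a feature, the "script to reproduce the issue" is a check that fails until the feature exists. The `slugify` function and the expected output below are hypothetical, picked only to make the example concrete.

```python
def slugify(title: str) -> str:
    """Hypothetical feature that has not been implemented yet."""
    raise NotImplementedError("slugify is the feature being requested")


def acceptance_check() -> bool:
    """Return True once the feature behaves as specified, False until then."""
    try:
        return slugify("Hello, World!") == "hello-world"
    except NotImplementedError:
        return False


if __name__ == "__main__":
    # Before the feature lands this reports the "issue"; afterwards it should pass.
    print("feature complete" if acceptance_check() else "issue reproduced: feature missing")
```

Step 4 of the template ("verify your fix works by running your script again") then doubles as the acceptance run.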
replies(1): >>45018294 #
6. afro88 ◴[] No.45018294{5}[source]
True. I guess I should actually try it out :)