(ghuntley.com)

469 points ghuntley | 1 comments | 24 Aug 25 03:21 UTC

faangguyindia ◴[24 Aug 25 04:33 UTC] No.45001426[source]▶

>>45001051 (OP) #

Anyone can build a coding agent that works (a) on a fresh codebase and (b) when you have an unlimited token budget.

Now build it for an old codebase, and let's see how precisely it edits or removes features without breaking the whole codebase.

Let's see how many tokens it consumes per bug fix or feature addition.

replies(4): >>45001529 #>>45001567 #>>45001784 #>>45001830 #

1. NitpickLawyer ◴[24 Aug 25 06:18 UTC] No.45001830[source]▶

>>45001426 #

There's "SWE-rebench", a benchmark that tracks model release dates, so you can see how a model did on "real-world" bugs that were submitted on GitHub after the model was released (obviously this works best for open models).

There are a few models that solve 30-50% of (new) tasks pulled from real-world repos. So ... yeah.
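
To make the "only tasks submitted after the model's release" idea concrete, here is a rough sketch. It is not the benchmark's actual code or data format: the `Task` fields, the `post_release_resolve_rate` helper, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Task:
    """One benchmark task; all fields are hypothetical, for illustration."""
    repo: str
    created: date    # when the bug/issue was submitted
    resolved: bool   # whether the model's patch passed the task's tests


def post_release_resolve_rate(tasks, model_release):
    """Resolve rate over tasks created strictly after the model's release.

    Returns None when no task is new enough to count.
    """
    fresh = [t for t in tasks if t.created > model_release]
    if not fresh:
        return None
    return sum(t.resolved for t in fresh) / len(fresh)


# Made-up data, only to exercise the function.
tasks = [
    Task("org/a", date(2025, 5, 1), True),    # predates the release: excluded
    Task("org/b", date(2025, 7, 10), True),
    Task("org/c", date(2025, 7, 20), False),
    Task("org/d", date(2025, 8, 2), False),
    Task("org/e", date(2025, 8, 9), True),
]
rate = post_release_resolve_rate(tasks, model_release=date(2025, 6, 15))
print(rate)
```

The date filter is the whole point: a task the model could have seen during training says little about how it handles an unfamiliar codebase, so anything created on or before the release date is dropped before scoring.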

How to build a coding agent