The author makes the excellent point that LLM-assisted coding still has a human bottleneck at code review, regardless of whether the issue at hand actually gets fixed.
Leaving aside the fact that this isn't really an LLM problem (we've always had tech debt due to cowboy devs and weak management or "commercial imperatives"):
I'd be interested to know whether any of the existing LLM Elo-style leaderboards mark for code quality in addition to issue fixing.
Code quality seems a particularly useful benchmark as models become more powerful in their surface abilities.
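For illustration only, here's a minimal sketch of what "marking for code quality" could look like alongside a pass/fail issue-fix signal. The metric (branching density via the stdlib `ast` module) and the weighting are my own assumptions, not how any actual leaderboard scores submissions:

```python
import ast

# Hypothetical scorer: blends a pass/fail issue-fix signal with a crude
# code-quality penalty based on branching density in the generated patch.
# This illustrates the idea only; it is not any real leaderboard's method.

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.BoolOp)

def branching_density(source: str) -> float:
    """Rough quality proxy: branch points per function definition."""
    tree = ast.parse(source)
    branches = sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
    funcs = sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                for node in ast.walk(tree))
    return branches / max(funcs, 1)

def leaderboard_score(tests_passed: bool, patch_source: str,
                      quality_weight: float = 0.3) -> float:
    """Blend issue fixing (did the tests pass?) with a quality penalty."""
    fix_score = 1.0 if tests_passed else 0.0
    quality_penalty = min(branching_density(patch_source) / 10.0, 1.0)
    return (1 - quality_weight) * fix_score - quality_weight * quality_penalty

if __name__ == "__main__":
    patch = "def fix(x):\n    if x is None:\n        return 0\n    return x + 1\n"
    print(leaderboard_score(tests_passed=True, patch_source=patch))
```

In practice you'd want review-calibrated signals (maintainability, duplication, test coverage of the patch) rather than a single complexity proxy, but even a crude second axis would distinguish "fixed the issue" from "fixed the issue cleanly".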