
192 points by imasl42 | 3 comments
its-kostya No.45311805
Code review is part of the job, but one of the least enjoyable parts. Developers like _writing_ code, and that's what gives the most job satisfaction. AI tools are helpful, but they inherently increase the amount of code we have to review, and with more scrutiny than code from my colleagues gets, because of how unpredictable - yet convincing - they can be. Why did we create tools that do the fun part and increase the non-fun part? Where are the "code-review" agents at?
simonw No.45311926
> Where are the "code-review" agents at?

OpenAI's Codex Cloud just added a new feature for code review, and their new GPT-5-Codex model has been specifically trained for code review: https://openai.com/index/introducing-upgrades-to-codex/

Gemini and Claude both have code review features that work via GitHub Actions: https://developers.google.com/gemini-code-assist/docs/review... and https://docs.claude.com/en/docs/claude-code/github-actions

GitHub have their own version of this pattern too: https://github.blog/changelog/2025-04-04-copilot-code-review...

There are also a whole lot of dedicated code review startups like https://coderabbit.ai/ and https://www.greptile.com/ and https://www.qodo.ai/products/qodo-merge/
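
The underlying pattern is simple enough to sketch: collect a diff, hand it to a model with review instructions, and surface the comments. Here's a minimal local version of that loop, assuming the openai Python SDK and an illustrative model name (the hosted products run this as a GitHub Action on each pull request instead):

    import subprocess
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # The diff of the current branch against main is the review input.
    diff = subprocess.run(
        ["git", "diff", "main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

    response = client.chat.completions.create(
        model="gpt-5-codex",  # illustrative; substitute whichever model you use
        messages=[
            {"role": "system", "content": (
                "You are a code reviewer. Flag bugs, risky changes, and unclear "
                "code in this diff, citing file and hunk. If you find nothing, "
                "say 'looks fine'; do not invent issues."
            )},
            {"role": "user", "content": diff},
        ],
    )
    print(response.choices[0].message.content)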

vrighter No.45311984
You can't use a system with the exact same hallucination problem to check the work of another one just like it. Snake oil.
ben_w No.45313235
Weirdly, not only can you do this, it actually does catch some of its own mistakes.

It doesn't catch all of them; these models generally still have a performance ceiling below human experts (though even that disclaimer is a simplification). But this kind of self-critique is basically what gave the early "reasoning" models their edge over plain chat models: for the first n times the model emits its end-of-thinking token, replace that token with "wait", and watch it attempt other solutions and usually settle on something better.
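
A minimal sketch of that trick, sometimes called budget forcing. A scripted FakeModel stands in for the real thing; with an actual LLM, generate_until_stop would be a decoding call that streams tokens until the end-of-thought marker. The marker and the canned "reasoning" are illustrative:

    END_OF_THOUGHT = "</think>"  # illustrative; the real marker varies by model

    class FakeModel:
        """Scripted stand-in: each chunk ends where the model would emit END_OF_THOUGHT."""

        def __init__(self):
            self.chunks = iter([
                "x = 7 looks right... done.",
                ", hold on: 7 fails the second check. Try x = 5.",
                ", verified: x = 5 passes both checks. Final answer: 5.",
            ])

        def generate_until_stop(self, transcript: str) -> str:
            return next(self.chunks)

    def reason_with_budget_forcing(model, prompt: str, forced_continuations: int = 2) -> str:
        transcript = prompt
        for _ in range(forced_continuations):
            transcript += model.generate_until_stop(transcript)
            # The model tried to stop thinking here; splice in "Wait" instead
            # of END_OF_THOUGHT so it re-reads and re-checks its own work.
            transcript += " Wait"
        return transcript + model.generate_until_stop(transcript) + END_OF_THOUGHT

    print(reason_with_budget_forcing(FakeModel(), "<think>"))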

vrighter No.45315068
the "pick something usually better" sounds a lot like "and then draw the rest of the f*** owl"
ben_w No.45315737
It turned out that for a lot of things (not all things; Transformers have plenty of weaknesses), using a neural network to score an output is, if not "fine", then at least "okay".

Generating ten candidates whose quality has a mediocre mean but some spread, then evaluating which one is best, is much easier than reasoning deliberately enough to get a single answer right the first time.
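
A toy numeric sketch of why that works (the distributions and noise levels are assumptions for illustration): draw n candidate qualities from a distribution with a mediocre mean, let an imperfect scorer pick one, and the expected quality of the pick still lands well above the mean.

    import random

    def candidate_quality() -> float:
        # One generation: true quality drawn from a mediocre-mean distribution.
        return random.gauss(0.0, 1.0)

    def noisy_score(true_quality: float) -> float:
        # An imperfect evaluator: correlated with true quality, but noisy.
        return true_quality + random.gauss(0.0, 0.5)

    def best_of_n_quality(n: int = 10) -> float:
        qualities = [candidate_quality() for _ in range(n)]
        # Select with the NOISY scorer, but report the TRUE quality we got.
        return max(qualities, key=noisy_score)

    trials = 10_000
    single = sum(candidate_quality() for _ in range(trials)) / trials
    best10 = sum(best_of_n_quality(10) for _ in range(trials)) / trials
    print(f"single sample: {single:+.2f}")   # about 0.0
    print(f"best of 10:    {best10:+.2f}")   # roughly +1.4 at this noise level

The scorer only has to rank candidates better than chance for the selection to beat a single draw; it never has to produce a correct answer itself, which is the asymmetry ben_w is pointing at.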