
399 points nomdep | 2 comments
tptacek ◴[] No.44295712[source]
I'm fine with anybody saying AI agents don't work for their work-style and am not looking to rebut this piece, but I'm going to take this opportunity to call something out.

The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.

But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.

Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining whether that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.
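That conservative stance can be sketched as a toy checker (this is an illustrative sketch, not the real eBPF verifier; the instruction encoding and the `Bound` field are invented for the example). The point is the shape of the decision: accept only backward branches that match a pattern with a statically known, small trip count, and reject everything else --- including programs that might actually be fine.

```go
package main

import "fmt"

// Insn is a toy instruction. For a backward branch (Op == "jmpback"),
// Bound is the trip count the checker can see statically; 0 means the
// bound is not statically known.
type Insn struct {
	Op    string
	Bound int
}

const maxTrips = 1 << 16 // reject loops the checker can't prove small

// verify mimics the verifier's stance: every backward branch must fit
// a known bounded pattern; anything the checker isn't sure about is
// rejected, even if the program would in fact terminate.
func verify(prog []Insn) error {
	for i, in := range prog {
		if in.Op != "jmpback" {
			continue
		}
		if in.Bound <= 0 {
			return fmt.Errorf("insn %d: backward branch with no provable bound", i)
		}
		if in.Bound > maxTrips {
			return fmt.Errorf("insn %d: bound %d too large", i, in.Bound)
		}
	}
	return nil
}

func main() {
	ok := []Insn{{Op: "op"}, {Op: "jmpback", Bound: 64}}
	bad := []Insn{{Op: "op"}, {Op: "jmpback"}} // bound unknown: rejected
	fmt.Println(verify(ok) == nil)
	fmt.Println(verify(bad) == nil)
}
```

Note that `verify` rejects the second program without ever reasoning about what it computes --- the same cheap, pattern-level judgment the comment is recommending for agent-generated PRs.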

That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.

If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy for me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".
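For a concrete (hypothetical) example of that kind of rote, idiomatic Go --- the sort an agent emits and a reviewer can wave through at a glance:

```go
package main

import "fmt"

// keys returns the keys of m in no particular order. Boilerplate like
// this has essentially one idiomatic shape, so "is this right?" takes
// a glance, not an audit.
func keys(m map[string]int) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	return out
}

func main() {
	fmt.Println(len(keys(map[string]int{"a": 1, "b": 2}))) // 2
}
```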

replies(7): >>44295745 #>>44295773 #>>44295785 #>>44295795 #>>44296065 #>>44296839 #>>44296921 #
112233 ◴[] No.44295795[source]
This is a radical and healthy way to do it. Obviously wrong — reject. Obviously right — accept. In any other case — also reject, as non-obvious.

I guess it is far removed from the advertised use case. Also, I feel one would be better off having auto-complete powered by an LLM in this case.

replies(3): >>44295800 #>>44295849 #>>44296573 #
1. bluefirebrand ◴[] No.44295849[source]
> Obviously right — accept.

I don't think code is ever "obviously right" unless it is trivially simple

replies(1): >>44301863 #
2. saulpw ◴[] No.44301863[source]
Seriously. I've taken to thinking of most submitters as adversarial agents--even the ones I know to be well-meaning humans. I've seen enough code that looks obviously right and yet has some subtle bug (that I then have to tease apart and fix), or worse, a security flaw that lies in wait like a sleeper cell for the right moment to unleash havoc and ruin your day.

So with this "obviously right" rubric I would wind up rejecting 95% of submissions, which is a waste of my time and energy. How about instead I just write it myself? At least then I know who's responsible for cleaning up after it.