JonChesterfield No.44382974
Interesting. Harder line than the LLVM one found at https://llvm.org/docs/DeveloperPolicy.html#ai-generated-cont...

I'm very much the old man shouting at clouds about this stuff. I don't want to review code the author doesn't understand, and I don't want to merge code neither of us understands.

1. linsomniac No.44384226
>I don't want to review code the author doesn't understand

I get that. But AI tooling, when guided by a competent human, can generate some pretty competent code, and a lot of it can be driven entirely through natural-language instructions. And every few months the tooling gets significantly more capable.

I'm contemplating what exactly it means to "understand" the code, though. In the case of one project I'm working on, it's an (almost) entirely vibe-coded new storage backend for an existing VM orchestration system. I don't know the existing code base, and I don't really have the time to implement it by hand (or I would have done it a couple of years ago).

But I've set up a test cluster and am running a variety of testing scenarios against the new storage backend. So I understand it from the high-level design, and from the testing of it.
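To give a rough idea of what one of those scenarios looks like, here's a minimal sketch in Python. The StorageClient module and its methods are placeholders for whatever API the orchestrator actually exposes; only the shape of the test is the point:

    import os
    import pytest

    from mycluster.storage import StorageClient  # hypothetical client library


    @pytest.fixture
    def client():
        # Points at the test cluster, never at production.
        return StorageClient(endpoint=os.environ["TEST_CLUSTER_ENDPOINT"])


    def test_snapshot_roundtrip(client):
        vol = client.create_volume(name="scratch", size_gib=10)
        client.write(vol, offset=0, data=b"x" * 4096)

        snap = client.snapshot(vol, name="before-overwrite")
        client.write(vol, offset=0, data=b"y" * 4096)

        restored = client.restore(snap)
        # The restored volume must reflect the pre-snapshot contents.
        assert client.read(restored, offset=0, length=4096) == b"x" * 4096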

As an open source maintainer myself, I can imagine (thankfully I haven't been hit with it yet) how frustrating getting all sorts of low-quality LLM "slop" submissions could be. I also understand that I'm going to have to review the code coming in whether or not the author of the submission understands it.

So how, as developers, do we leverage these tools appropriately and signal to other developers the level of quality of the code? As someone who spent months tracking down subtle bugs in early Linux ZFS ports, I deeply understand that significant testing can trump human authorship and review of every line of code. ;-)

2. imiric No.44385741
> I'm contemplating what exactly it means to "understand" the code though.

You can't seriously be questioning the meaning of "understand"... That's straight from Jordan B. Peterson's debate playbook, which does nothing but devolve the conversation into absurdism while making the speaker sound smart.

> I've set up a test cluster and am running a variety of testing scenarios on the new storage backend. So I understand it from a high level design, and from the testing of it.

You understand the system as well as any user could. Your tests only prove that the system works in specific scenarios, which may very well satisfy your requirements. But they absolutely do not prove that you understand how the system works internally, that it's implemented with a reliable degree of accuracy, that it isn't misbehaving in subtle ways, or that it doesn't have security issues that will only become apparent once it's exposed to the public. All of this might be acceptable for a tool you built quickly that is only used by yourself or a few others, but it's far from acceptable for any type of production system.

> As someone who spent months tracking down subtle bugs in early Linux ZFS ports, I deeply understand that significant testing can trump human authorship and review of every line of code.

This doesn't match my (~20y) experience at all. Testing is important, particularly more advanced forms like fuzzing, but it's not a failproof method of surfacing bugs. Tests, like any code, can themselves have bugs; they can test the wrong things, set up or mock the environment in ways not representative of real-world usage, and, most importantly, can only cover a limited number of real-world scenarios. Even in teams that take testing seriously, achieving 100% coverage, even for just statements, is seen as counterproductive and a fool's errand. Deeply thorough testing as seen in projects like SQLite is practically unheard of. Most programmers I've worked with will only write happy-path tests, if they bother writing any at all.
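To make the happy-path point concrete, here's a toy Python sketch. parse_size is a made-up helper, and the second test uses Hypothesis purely as an example of going beyond a single well-formed input:

    from hypothesis import given, strategies as st


    def parse_size(text: str) -> int:
        """Toy parser for strings like '10G' or '512M' into bytes."""
        units = {"K": 1024, "M": 1024**2, "G": 1024**3}
        if text and text[-1] in units:
            return int(text[:-1]) * units[text[-1]]
        return int(text)


    def test_happy_path():
        # Exercises exactly one well-formed input -- the kind of test
        # that gets written, if any gets written at all.
        assert parse_size("10G") == 10 * 1024**3


    @given(st.text())
    def test_rejects_garbage_predictably(s):
        # Property: arbitrary input either parses or raises ValueError,
        # never some unexpected exception type.
        try:
            parse_size(s)
        except ValueError:
            pass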

Which isn't to say that code review is the solution. But a human reviewing the code, building a mental model of how it works and how it's not supposed to work, can often catch issues before the code is even deployed. It's at that point that writing a test is valuable, so that the specific scenario is cemented in the checks for the software and regressions can be avoided.

So I wouldn't say that testing "trumps" reviews, but rather that it's not a reliable way of detecting bugs on its own, and that both methods should ideally be used together.
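As a sketch of that combination (all the names below are made up): say a reviewer spots that a resize path isn't locked, the fix goes in, and a regression test is added alongside it so the scenario stays cemented:

    import threading


    class Volume:
        """Toy stand-in for the real object under review."""

        def __init__(self, size: int):
            self.size = size
            self._lock = threading.Lock()  # the fix the reviewer asked for

        def grow(self, delta: int) -> None:
            with self._lock:
                self.size += delta


    def test_concurrent_grow_loses_no_updates():
        vol = Volume(size=0)
        threads = [threading.Thread(target=vol.grow, args=(1,)) for _ in range(100)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Without the lock this would only fail intermittently, if at all --
        # which is exactly why a human reading the code is more likely to
        # catch the race than a test run is.
        assert vol.size == 100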

3. linsomniac No.44390941
You're right, "trumps" isn't the right word there. But, as you say, testing is an often-neglected part of the process. There are absolutely issues that code review is going to be better at finding, particularly security-related ones. But try fixing a subtle bug without a reproducible test case...