I would have stated this a bit differently: No amount of running or testing can prove the code correct. You actually have to reason through it. Running/testing is merely a sanity/spot check of your reasoning.
If LLM-generated code has been "reasoned through," tested, and does the job, I think that's a net benefit compared to code written only by humans.
Net benefit in what terms, though? More productive WRT raw code output? Lower error rate?
Because something about the idea of generating tons of code via LLMs, which humans then have to verify, seems to me less productive and more error-prone.
I mean, when verifying code that you didn't write, you generally have to fully reason through it, just as you would to write it (if you really want to verify it). But reasoning through someone else's code requires an extra step: latching on to the author's line of reasoning.
OTOH, if you just breeze through it because it looks correct, you're likely to miss errors.
The latter reminds me of the whole "Full self-driving, but keep your hands on the steering wheel, just in case" setup. It's going to lull you into overconfidence and passivity.
And, in my experience, it's a lot easier to latch on to a real person's real line of reasoning than to a chatbot's "line of reasoning."
And you can discuss that reasoning with the author, with both of you hopefully having experience in the domain.