    371 points ulrischa | 14 comments
    layer8 ◴[] No.43235766[source]
    > Just because code looks good and runs without errors doesn’t mean it’s actually doing the right thing. No amount of meticulous code review—or even comprehensive automated tests—will demonstrably prove that code actually does the right thing. You have to run it yourself!

    I would have stated this a bit differently: No amount of running or testing can prove the code correct. You actually have to reason through it. Running/testing is merely a sanity/spot check of your reasoning.

    replies(4): >>43235828 #>>43235856 #>>43236195 #>>43236756 #
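To make the point above concrete, here is a minimal sketch of code that passes a plausible-looking set of spot checks while still being wrong. The function names and the choice of test years are assumptions made for illustration; nothing here comes from the thread itself.

```python
def is_leap_year_buggy(year: int) -> bool:
    """Hypothetical buggy implementation: it ignores the century rule."""
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A hypothetical spot-check suite. The buggy version passes every entry.
spot_checks = {2024: True, 2023: False, 2000: True, 2019: False}
passes_spot_checks = all(
    is_leap_year_buggy(year) == want for year, want in spot_checks.items()
)

# An input the spot checks never exercised, where the two versions disagree.
counterexample = 1900
```

Running the checks gives a green result, yet reasoning through the Gregorian rule shows that 1900 is mishandled. The tests only sampled the behaviour; they did not establish it.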
    1. dmos62 ◴[] No.43235828[source]
    Well, what if you run a complete test suite?
    replies(5): >>43236019 #>>43236118 #>>43236949 #>>43237000 #>>43237496 #
    2. layer8 ◴[] No.43236019[source]
    There is no complete test suite, unless your code is purely functional and has a small-ish finite input domain.
    replies(2): >>43236080 #>>43236539 #
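A minimal sketch of the one case described above where a test suite can be complete: a pure function over a small finite domain. The 8-bit popcount and both function names are assumptions chosen for illustration. The idea is simply to compare the code under test against an obviously correct reference for every possible input.

```python
def popcount8(x: int) -> int:
    """Code under test: count the set bits of an 8-bit value with bit tricks."""
    x = x - ((x >> 1) & 0x55)            # bit counts per 2-bit pair
    x = (x & 0x33) + ((x >> 2) & 0x33)   # bit counts per 4-bit nibble
    return (x + (x >> 4)) & 0x0F         # sum of the two nibble counts


def popcount_reference(x: int) -> int:
    """Obviously correct reference: count '1' characters in the binary form."""
    return bin(x).count("1")


def mismatches() -> list[int]:
    """Return every 8-bit input where the two implementations disagree."""
    return [x for x in range(256) if popcount8(x) != popcount_reference(x)]
```

An empty result from `mismatches()` covers all 256 inputs, so for this function the suite really is exhaustive. The same approach stops being practical as soon as the domain grows (a 64-bit argument already has 2^64 inputs) or the function depends on hidden state.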
    3. suzzer99 ◴[] No.43236080[source]
    And even then, your code could pass all tests but be a spaghetti mess that will be impossible to maintain and add features to.
    4. e12e ◴[] No.43236118[source]
    You mean, for example, test that your sieve finds all primes, and only primes, that fit in 4096 bits?
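A rough sketch of what such a test can and cannot cover. The sieve below is cross-checked against trial division up to a small, arbitrarily chosen `LIMIT`; the names and the limit are assumptions for illustration. The same check for every value that fits in 4096 bits would have to enumerate 2^4096 candidates, which is far out of reach.

```python
def sieve(limit: int) -> list[int]:
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p * p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


def is_prime_trial(n: int) -> bool:
    """Independent reference check by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


LIMIT = 10_000  # hypothetical bound, small enough to check exhaustively
expected = [n for n in range(LIMIT + 1) if is_prime_trial(n)]
```

Comparing `sieve(LIMIT)` with `expected` confirms "all primes, and only primes" up to the bound. Beyond it, confidence has to come from reasoning about the algorithm rather than from running it.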
    5. MattSayar ◴[] No.43236539[source]
    Seems to be a bit of a catch-22. No LLM can write perfect code, and no test suite can catch all bugs. Obviously, no human can write perfect code either.

    If LLM-generated code has been "reasoned-through," tested, and it does the job, I think that's a net-benefit compared to human-only generated code.

    replies(1): >>43236792 #
    6. unclebucknasty ◴[] No.43236792{3}[source]
    >I think that's a net-benefit compared to human-only generated code.

    Net-benefit in what terms though? More productive WRT raw code output? Lower error rate?

    Because, something about the idea of generating tons of code via LLMs, which humans have to then verify, seems less productive to me and more error-prone.

    I mean, when verifying code that you didn't write, you generally have to fully reason through it, just as you would to write it (if you really want to verify it). But, reasoning through someone else's code requires an extra step to latch on to the author's line of reasoning.

    OTOH, if you just breeze through it because it looks correct, you're likely to miss errors.

    The latter reminds me of the whole "Full self-driving, but keep your hands on the steering wheel, just in case" setup. It's going to lull you into overconfidence and passivity.

    replies(2): >>43236832 #>>43238760 #
    7. jmb99 ◴[] No.43236832{4}[source]
    > reasoning through someone else's code requires an extra step to latch on to the author's line of reasoning.

    And, in my experience, it’s a lot easier to latch on to a real person’s real line of reasoning than to a chatbot’s “line of reasoning”.

    replies(2): >>43237380 #>>43239750 #
    8. ◴[] No.43236949[source]
    9. shakna ◴[] No.43237000[source]
    If a complete test suite were enough, then SQLite, which famously has one of the largest and most comprehensive test suites, would not encounter bugs. However, it still does.

    If you employ AI, you're adding a remarkable amount of speed to a processing domain that is undecidable because most inputs are not finite. Eventually, you will end up reconsidering the Gambler's Fallacy, because of the chances of things going wrong.

    10. unclebucknasty ◴[] No.43237380{5}[source]
    Exactly. And, if correction is required, then you either re-write it or you're stuck maintaining whatever odd way the LLM approached the problem, whether it's as optimal (or readable) as a human's or not.
    11. bandrami ◴[] No.43237496[source]
    Paging Dr. Turing. Dr. Turing, please report to the HN comment section.
    replies(1): >>43257383 #
    12. rapind ◴[] No.43238760{4}[source]
    > "Full self-driving, but keep your hands on the steering wheel, just in case" setup

    This is actually a trick though. No one working on self-driving expects people to babysit it for long at all. Babysitting feels worse than driving. I just saw a video on self-driving trucks and how the human driver had his hands hovering on the wheel. The goal of the video is to make you think about how amazing self-driving rigs will be, but all I could think about was what an absolutely horrible job it will be to babysit these things.

    Working full-time on AI code reviews sounds even worse. Maybe if it's more of a conversation and you're collaboratively iterating on small chunks of code then it wouldn't be so bad. In reality though, we'll just end up trusting the AI because it'll save us a ton of money and we'll find a way to externalize the screw-ups.

    13. Ekaros ◴[] No.43239750{5}[source]
    Also, after a reasonable period, if you are stuck, you can actually ask them what they were thinking, why it was written that way, and what constraints they had in mind.

    And you can discuss these, with both of you hopefully having experience in the domain.

    14. dmos62 ◴[] No.43257383[source]
    Gave me a chuckle!