
371 points ulrischa | 60 comments
1. notepad0x90 ◴[] No.43236385[source]
My fear is that LLM generated code will look great to me, I won't understand it fully but it will work. But since I didn't author it, I wouldn't be great at finding bugs in it or logical flaws. Especially if you consider coding as piecing together things instead of implementing a well designed plan. Lots of pieces making up the whole picture but a lot of those pieces are now put there by an algorithm making educated guesses.

Perhaps I'm just not that great of a coder, but I do have lots of code where if someone took a look at it, it might look crazy but it really is the best solution I could find. I'm concerned LLMs won't do that; they won't take risks a human would, or understand the implications of a block of code beyond its application in that specific context.

Other times, I feel like I'm pretty good at figuring out things and struggling in a time-efficient manner before arriving at a solution. LLM generated code is neat but I still have to spend similar amounts of time, except now I'm doing more QA and clean up work instead of debugging and figuring out new solutions, which isn't fun at all.

replies(13): >>43236847 #>>43237043 #>>43237101 #>>43237162 #>>43237387 #>>43237808 #>>43237956 #>>43238722 #>>43238763 #>>43238978 #>>43239372 #>>43239665 #>>43241112 #
2. unclebucknasty ◴[] No.43236847[source]
All of this. Could have saved me a comment [0] if I'd seen this earlier.

When people talk about 30% or 50% coding productivity gains with LLMs, I really want to know exactly what they're measuring.

[0] https://news.ycombinator.com/item?id=43236792

3. noisy_boy ◴[] No.43237043[source]
I do these things for this:

- keep the outline in my head: I don't give up the architect's seat. I decide which module does what and how it fits in the whole system, its contract with other modules etc.

- review the code: this can be construed as negating the point of LLMs as this is time consuming but I think it is important to go through line by line and understand every line. You will absorb some of the LLM generated code in the process which will form an imperfect map in your head. That's essential for beginning troubleshooting next time things go wrong.

- last mile connectivity: several times the LLM takes you there but can't complete the last mile connectivity; instead of wasting time chasing it, do the final wiring yourself. This is a great shortcut to achieve the previous point.

replies(3): >>43238729 #>>43238883 #>>43240520 #
4. tokioyoyo ◴[] No.43237101[source]
The big argument against it is, at some point, there’s a chance that you won’t really need to understand what the code does. LLMs write code, LLMs write tests, you find bugs, the LLM fixes the code, the LLM adds test cases for the found bug. Rinse and repeat.
replies(2): >>43237342 #>>43240215 #
5. fuzztester ◴[] No.43237162[source]
>My fear is that LLM generated code will look great to me, I won't understand it fully but it will work.

puzzled. if you don't understand it fully, how can you say that it will look great to you, and that it will work?

replies(4): >>43237362 #>>43238789 #>>43241331 #>>43247810 #
6. SamPatt ◴[] No.43237342[source]
For fairly simple projects built from scratch, we're already there.

Claude Code has been doing all of this for me on my latest project. It's remarkable.

It seems inevitable it'll get there for larger and more complex code bases, but who knows how far away that is.

7. Nevermark ◴[] No.43237362[source]
> if you don't understand it fully, how can you say that it will look great to you, and that it will work?

Presumably, that simply reflects that a primary developer always has an advantage of having a more reliable understanding of a large code base - and the insights into the problem that come about during development challenges - than a reviewer of such code.

A lot of important but subtle insights into a problem, many sub-verbal, come from going through the large and small challenges of creating something that solves it. Reviewers just don't get those insights as reliably.

Reviewers can't see all the subtle or non-obvious alternate paths or choices. They are less likely to independently identify subtle traps.

8. hakaneskici ◴[] No.43237387[source]
When it comes to relying on code that you didn't write yourself, like an npm package, do you care if it's AI code or human code? Do you think your trust toward AI code may change over time?
replies(2): >>43237605 #>>43237702 #
9. PessimalDecimal ◴[] No.43237605[source]
Publicly available code with lots of prior usage seems less likely to be buggy than LLM-generated code produced on-demand and for use only by me.
10. sfink ◴[] No.43237702[source]
Of course I care. Human-written code was written for a purpose, with a set of constraints in mind, and other related code will have been written for the same or a complementary purpose and set of constraints. There is intention in the code. It is predictable in a certain way, and divergences from the expected are either because I don't fully understand something about the context or requirements, or because there's a damn good reason. It is worthwhile to dig further until I do understand, since it will very probably have repercussions elsewhere and elsewhen.

For AI code, that's a waste of time. The generated code will be based on an arbitrary patchwork of purposes and constraints, glued together well enough to function. I'm not saying it lacks purpose or constraints, it's just that those are inherited from random sources. The parts flow together with robotic but not human concern for consistency. It may incorporate brilliant solutions, but trying to infer intent or style or design philosophy is about as useful as doing handwriting analysis on a ransom note made from pasted-together newspaper clippings.

Both sorts of code have value. AI code may be well-commented. It may use features effectively that a human might have missed. Just don't try to anthropomorphize an AI coder or a lawnmower, you'll end up inventing an intent that doesn't exist.

replies(2): >>43238390 #>>43246765 #
11. sunami-ai ◴[] No.43237808[source]
Worst part is that the patterns of implementation won't be consistent across the pieces. So debugging a whole codebase that was authored with LLM-generated code is like having to debug a codebase where every function was written by a different developer and no one followed any standards. I guess you can specify the coding standards in the prompt and ask it to use FP-style programming only, but I'm not sure how well it can follow them.
replies(1): >>43238272 #
12. kadushka ◴[] No.43237956[source]
I wouldn't be great at finding bugs in it or logical flaws

This is what tests are for.

replies(2): >>43238611 #>>43238682 #
13. QuiDortDine ◴[] No.43238272[source]
Not well, at least for ChatGPT. It can't follow my custom instructions which can be summed up as "follow PEP-8 and don't leave trailing whitespace".
replies(1): >>43239996 #
14. gunian ◴[] No.43238390{3}[source]
what if you

- generate - lint - format - fuzz - test - update

infinitely?
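
Roughly, as a sketch (assume regenerate stands in for however the model gets re-prompted; the other steps are ordinary tools):

    import subprocess

    def run(cmd: list[str]) -> bool:
        """Return True if the command exits cleanly."""
        return subprocess.run(cmd).returncode == 0

    def regenerate(feedback: str) -> None:
        """Stand-in: ask the model for a new patch, feeding back tool output."""
        ...

    MAX_ROUNDS = 20  # "infinitely" still wants a budget

    for _ in range(MAX_ROUNDS):
        checks = [
            ["ruff", "check", "."],              # lint
            ["ruff", "format", "--check", "."],  # format
            ["pytest", "-q"],                    # tests (a fuzz target could slot in here too)
        ]
        if all(run(cmd) for cmd in checks):
            break  # everything green
        regenerate(feedback="captured stderr/stdout from the failing step")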

replies(2): >>43238455 #>>43238776 #
15. sfink ◴[] No.43238455{4}[source]
Then you'll get code that passes the tests you generate, where "tests" includes whatever you feed the fuzzer to detect problems. (Just crashes? Timeouts? Comparison with a gold standard?)

Sorry, I'm failing to see your point.

Are you implying that the above is good enough, for a useful definition of good enough? I'm not disagreeing, and in fact that was my starting assumption in the message you're replying to.

Crap code can pass tests. Slow code can pass tests. Weird code can pass tests. Sometimes it's fine for code to be crap, slow, and/or weird. If that's your situation, then go ahead and use the code.

To expand on why someone might not want such code, think of your overall codebase as having a time budget, a complexity budget, a debuggability budget, an incoherence budget, and a maintenance budget. Yes, those overlap a bunch. A pile of AI-written code has a higher chance of exceeding some of those budgets than a human-written codebase would. Yes, there will be counterexamples. But humans will at least attempt to optimize for such things. AIs mostly won't. The AI-and-AI-using-human system will optimize for making it through your lint-fuzz-test cycle successfully and little else.

Different constraints, different outputs. Only you can decide whether the difference matters to you.
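
For the "comparison with a gold standard" case, one concrete shape is a property-based test; a minimal sketch with Hypothesis, where fast_sort is a stand-in for whatever the generated code is:

    from hypothesis import given, strategies as st

    def fast_sort(xs: list[int]) -> list[int]:
        """Placeholder for the generated code under test."""
        return sorted(xs)

    @given(st.lists(st.integers()))
    def test_matches_reference(xs: list[int]) -> None:
        # The trusted-but-slow implementation acts as the oracle.
        assert fast_sort(xs) == sorted(xs)

Passing that only pins down input/output behaviour on sampled inputs; it says nothing about the time, complexity, or incoherence budgets above.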

replies(2): >>43239125 #>>43251223 #
16. notepad0x90 ◴[] No.43238611[source]
The tests are probably LLM generated as well lol
17. nradov ◴[] No.43238682[source]
You can't test quality into a product.
18. ajmurmann ◴[] No.43238722[source]
To fight this I mostly do ping-pong pairing with LLMs. After we discuss the general goal and approach, I usually write the first test. The LLM then makes it pass and writes the next test, which I'll make pass, and so on. It forces me to stay 100% in the loop and understand everything. Maybe it's not as fast as having the LLM write as much as possible, but I think it's a worthwhile tradeoff.
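
A tiny, hypothetical illustration of one round:

    # Round 1: I write the failing test first...
    def test_slugify_replaces_spaces():
        assert slugify("Hello World") == "hello-world"

    # ...and the LLM writes just enough to make it pass.
    def slugify(title: str) -> str:
        return title.lower().replace(" ", "-")

    # Round 2: the LLM writes the next test (say, punctuation handling)
    # and I make it pass; rinse and repeat.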
19. zahlman ◴[] No.43238729[source]
The way you've written this comes across like the AI is influencing your writing style....
replies(2): >>43238745 #>>43240778 #
20. plxxyzs ◴[] No.43238745{3}[source]
Three bullet points, each with three sentences (ok last one has a semicolon instead) is a dead giveaway
replies(3): >>43238992 #>>43241015 #>>43241016 #
21. intended ◴[] No.43238763[source]
I think this is a great line:

> My fear is that LLM generated code will look great to me, I won't understand it fully but it will work

This is a degree of humility that made the scenario we are in much clearer.

Our information environment got polluted by the lack of such humility. Rhetoric that sounded ‘right’ is used everywhere. If it looks like an Oxford Don, sounds like an Oxford Don, then it must be an academic. Thus it is believable, even if they are saying the Titanic isn’t sinking.

Verification is the heart of everything humanity does, our governance structures, our judicial systems, economic systems, academia, news, media - everything.

It’s a massive computational effort to figure out the best ways to allocate resources given current information, allowing humans to create surplus and survive.

This is why we dislike monopolies, or manipulations of these markets - they create bad goods, and screw up our ability to verify what is real.

22. intended ◴[] No.43238776{4}[source]
Who has that much time and money when your boss is breathing down your neck?
23. raincole ◴[] No.43238789[source]
It happens all the time. Way before LLM. There were countless times I implemented an algorithm from a paper or a book while not fully understanding it (in other words, I can't prove the correctness or time complexity without referencing the original paper).
replies(1): >>43245703 #
24. happymellon ◴[] No.43238883[source]
> This is a great shortcut to achieve the previous point.

How does doing the hard part provide a shortcut for reviewing all the LLM code?

If anything it's a long cut, because now you have to understand the code and write it yourself. This isn't great, it's terrible.

replies(1): >>43240806 #
25. eru ◴[] No.43238978[source]
> But since I didn't author it, I wouldn't be great at finding bugs in it or logical flaws.

Alas, I don't share your optimism about code I wrote myself. In fact, it's often harder to find flaws in my own code than when reading someone else's code.

Especially if 'this is too complicated for me to review, please simplify' is allowed as a valid outcome of my review.

26. Jensson ◴[] No.43238992{4}[source]
Lots of people wrote like that before AI. AI writes like people; it's made to copy how people write. It wouldn't write like that if people didn't.
replies(1): >>43240250 #
27. pixelfarmer ◴[] No.43239125{5}[source]
> Then you'll get code that passes the tests you generate

Just recently I think here on HN there was a discussion about how neural networks optimize towards the goal they are given, which in this case means exactly what you wrote, including that the code will do stuff in wrong ways just to pass the given tests.

Where do the tests come from? Initially from a specification of what "that thing" is supposed to do and also not supposed to do. Everyone who had to deal with specifications in a serious way knows how insanely difficult it is to get these right, because there are often things unsaid, there are corner cases not covered and so on. So the problem of correctness is just shifted, and the assumption that this may require less time than actually coding ... I wouldn't bet on it.

Conceptually the idea should work, though.

28. otabdeveloper4 ◴[] No.43239372[source]
> ...but it will work

You don't know that though. There's no "it must work" criteria in the LLM training.

29. JimDabell ◴[] No.43239665[source]
> My fear is that LLM generated code will look great to me, I won't understand it fully but it will work.

If you don’t understand it, ask the LLM to explain it. If you fail to get an explanation that clarifies things, write the code yourself. Don’t blindly accept code you don’t understand.

This is part of what the author was getting at when they said that it’s surfacing existing problems not introducing new ones. Have you been approving PRs from human developers without understanding them? You shouldn’t be doing that. If an LLM subsequently comes along and you accept its code without understanding it too, that’s not a new problem the LLM introduced.

replies(2): >>43240848 #>>43241055 #
30. jampekka ◴[] No.43239996{3}[source]
I don't think they meant formatting details.
replies(2): >>43240256 #>>43241447 #
31. saagarjha ◴[] No.43240215[source]
What do you do when the LLM doesn't fix the code?
replies(1): >>43241221 #
32. johnisgood ◴[] No.43240250{5}[source]
Yes, I prefer using lists myself too; that does not mean my writing is being influenced by AI. I have always liked bullet points, long before AI was even a thing: they make for better organization and visual clarity.
33. johnisgood ◴[] No.43240256{4}[source]
It is supposed to follow that instruction though. When it generates code, I can tell it to use tabs, 2 spaces, etc. and the generated code will use that. It works well with Claude, at least.
34. FiberBundle ◴[] No.43240520[source]
In my experience you just don't keep as good a map of the codebase in your head when you have LLMs write a large part of your codebase as when you write everything yourself. Having a really good map of the codebase in your head is what brings you large productivity boosts when maintaining the code. So while LLMs do give me a 20-30% productivity boost for the initial implementation, they bring huge disadvantages after that, and that's why I still mostly write code myself and use LLMs only as a stackoverflow alternative.
replies(2): >>43241511 #>>43242824 #
35. noisy_boy ◴[] No.43240778{3}[source]
thatistrue I us ed to write lik this b4 ai it has change my life
replies(1): >>43242127 #
36. noisy_boy ◴[] No.43240806{3}[source]
Sure whatever works for you; my approach works for me
replies(1): >>43241002 #
37. sarchertech ◴[] No.43240848[source]
No one takes the time to fully understand all the PRs they approve. And even when you do take the time to “fully understand” the code, it’s very easy for your brain to trick you into believing you understand it.

At least when a human wrote it, someone understood the reasoning.

replies(1): >>43241744 #
38. happymellon ◴[] No.43241002{4}[source]
But you don't explain how doing the hard part shortcuts needing to understand the LLM code.
39. KronisLV ◴[] No.43241015{4}[source]
I feel like “looks like it’s written by AI” might become a critique of writing that’s very template-like, neutral, corporate. I don’t usually dislike it though, as long as the information is there.
40. noisy_boy ◴[] No.43241016{4}[source]
Three bullet points AND three sentences?!! Get outta here...
41. np- ◴[] No.43241055[source]
Code reviews with a human are a two way street. When I find code that is ambiguous I can ask the developer to clarify and either explain their justification or ask them to fix it before the code is approved. I don’t have to write it myself, and if the developer is simply talking in circles then I’d be able to escalate or reject—and this is a far less likely failure case to happen with a real trusted human than an LLM. “Write the code yourself” at that point is not viable for any non-trivial team project, as people have their own contexts to maintain and commitments/projects to deliver. It’s not the typing of the code that is the hard part which is the only real benefit of LLMs that they can type super fast, it’s fully understanding the problem space. Working with another trusted human is far far different from working with an LLM.
42. madeofpalk ◴[] No.43241112[source]
Do you not review code from your peers? Do you not search online and try to grok code from StackOverflow or documentation examples?

All of these can vary wildly in quality. Maybe it's because I mostly use coding LLMs as either a research tool or to write reasonably small and easy-to-follow chunks of code, but I find it no different from all of the other kinds of reading and understanding of other people's code I already have to do.

43. amarcheschi ◴[] No.43241221{3}[source]
You tell it there's an error, and to fix the code (/s)
44. rsynnott ◴[] No.43241331[source]
I mean, depends what you mean by ‘work’. For instance, something which produces the correct output, and leaks memory, is that working? Something which produces the correct output, but takes a thousand times longer than it should; is that working? Something which produces output which looks superficially correct and passes basic tests, is that working?

‘Works for me’ isn’t actually _that_ useful a signal without serious qualification.
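
A contrived sketch of that last case (right answers, a passing test, and still arguably not working):

    def fib(n: int) -> int:
        # Correct output, and a basic test passes...
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    def test_fib_small():
        assert [fib(i) for i in range(10)] == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

    # ...but the running time is exponential, so fib(50) takes ages.
    # "Works for me" here really means "works for tiny inputs".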

replies(2): >>43247005 #>>43275331 #
45. 6r17 ◴[] No.43241447{4}[source]
Formatting is just the dot on the i; there are 200 other small details that are completely off-putting to me:

- naming conventions (AIs are lazy and tend to use generic names with no meaning, such as "Glass" instead of "GlassProduct");

- error-handling conventions.

But the most troublesome part to me is that it just "pisses" out code with no afterthought about the problem it is solving or the person it is talking to.

The number of times I have to repeat myself just to get a stubborn answer with no discussion is alarming. It does not benefit my well-being and is annoying to work with, except for a handful of exploratory cases.

I believe LLMs are actually the biggest organized data heist. We believe those models will get better at their jobs, but the reality is that we are giving away code, knowledge, and ideas at scale, correcting the model for free, and paying to be allowed to do so. And when we look at the 37% minimum hallucination rate, it becomes easier to see that the actual thought comes from the human using it.

I'm not comfortable having to argue with a machine and explain to it what I'm doing, how, and why, just to have it spam me with things I have to correct afterwards anyway.

The worst part is that all that data is the best insight into everything. How many people ask for X? How much time did they spend trying to do X? What were they trying to achieve? Who are their customers? Etc.

46. MrMcCall ◴[] No.43241511{3}[source]
The evolution of a codebase is an essential missing piece of our development processes. Barring detailed design docs that no one has time to write and then update, understanding that evolution is the key to understanding the design intent (the "why") of the codebase. Without that why, there will be no consistency, and less chance of success.

"Short cuts make long delays." --Tolkien

47. sgarland ◴[] No.43241744{3}[source]
> No one takes the time to fully understand all the PRs they approve.

I was appalled when I was being effusively thanked for catching some bugs in PRs. “No one really reads these,” is what I was told. Then why the hell do we have a required review?!

replies(1): >>43242162 #
48. matthberg ◴[] No.43242127{4}[source]
As someone pretty firmly in the anti-AI camp, I'm genuinely glad that you've personally found AI a useful tool to polish text and help you communicate.

I think that just because someone might be more or less eloquent than someone else, the value of their thoughts and contributions shouldn't be weighed any differently. In a way, AI formatting and grammar assistance could be a step towards a more equitable future, one where ideas are judged on inherent merits rather than superficial junk like spel;ng or idk typos n shi.t

However, I think what the parent commenter (and I) might be saying is that it seems you're relying on AI for more than just help expressing yourself—it seems you're relying on it to do the thinking too. I'd urge you to consider if that's what you really want from a tool you use. That said, I'm just some random preachy-judgy stranger on the internet, you don't owe me shit, lol

(Side notes I couldn't help but include: I think talking about AI and language is way more complicated (and fascinating) than just that aspect, including things I'm absolutely unqualified to comment on—discrimination against AAVE use, classism, and racism can't and shouldn't be addressed by a magic-wand spell-checker that "fixes" everyone's speech to be "correct" (as if a sole cultural hegemony or way of speech is somehow better than any other))

replies(1): >>43242277 #
49. sarchertech ◴[] No.43242162{4}[source]
Cargo culting.
50. noisy_boy ◴[] No.43242277{5}[source]
> As someone pretty firmly in the anti-AI camp, I'm genuinely glad that you've personally found AI a useful tool to polish text and help you communicate.

> I think that just because someone might be more or less eloquent than someone else, the value of their thoughts and contributions shouldn't be weighed any differently. In a way, AI formatting and grammar assistance could be a step towards a more equitable future, one where ideas are judged on inherent merits rather than superficial junk like spel;ng or idk typos n shi.t

I guess I must come clean that my reply was sarcasm which obviously fell flat and caused you to come to the defense of those who can't spell - I swear I don't have anything against them.

> However, I think what the parent commenter (and I) might be saying is that it seems you're relying on AI for more than just help expressing yourself—it seems you're relying on it to do the thinking too. I'd urge you to consider if that's what you really want from a tool you use. That said, I'm just some random preachy-judgy stranger on the internet, you don't owe me shit, lol

You and presumably the parent commenter have missed the main point of the retort - you are assuming I am relying on AI for my content or its style. It is neither - I like writing point-wise in a systematic manner, always have, always will - AI or no-AI be damned. It is the all-knowing veil-piercing eagle-eyed deduction of random preachy-judgy strangers on the internet about something being AI-generated/aided just because it follows structure, that is annoying.

replies(1): >>43242693 #
51. danielmarkbruce ◴[] No.43242693{6}[source]
It's funny that some folks seem to assume AI writing style just arrives out of thin air....
replies(1): >>43265898 #
52. simonw ◴[] No.43242824{3}[source]
I have enough projects that I'm responsible for now (over 200 packages on PyPI, over 800 GitHub repositories) that I gave up on keeping a map of my codebases in my head a long time ago - occasionally I'll stumble across projects I released that I don't even remember existing!

My solution for this is documentation, automated tests and sticking to the same conventions and libraries (like using Click for command line argument parsing) across as many projects as possible. It's essential that I can revisit a project and ramp up my mental model of how it works as quickly as possible.

I talked a bit more about this approach here: https://simonwillison.net/2022/Nov/26/productivity/
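
As an illustration of the kind of shared Click skeleton this implies (a sketch, not code from any particular repo):

    import click

    @click.group()
    @click.version_option("0.1")
    def cli():
        "Short description of the tool."

    @cli.command()
    @click.argument("path", type=click.Path(exists=True))
    @click.option("-v", "--verbose", is_flag=True, help="Show progress output.")
    def process(path, verbose):
        "Process PATH."
        if verbose:
            click.echo(f"processing {path}")

    if __name__ == "__main__":
        cli()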

replies(1): >>43246595 #
53. fuzztester ◴[] No.43245703{3}[source]
imo, your last phrase, excerpted below:

>(in other words, I can't prove the correctness ... without referencing the original paper).

agrees with what I said in my previous comment:

>if you don't understand it fully, how can you say .... that it will work?

(irrelevant parts from our original comments above, replaced with ... , without loss of meaning to my argument.)

both those quoted fragments, yours and mine, mean basically the same thing, i.e. that both you and the GP don't know whether it will work.

it's not that one cannot use some piece of code without knowing whether it works; everybody does that all the time, from algorithm books for example, as you said.

54. FiberBundle ◴[] No.43246595{4}[source]
You're an extreme outlier. Most programmers work with 1-3 codebases probably. Obviously you can't keep 800 codebases in your head, and you have to settle for your approach out of necessity. I find it hard to believe you get anywhere close to the productivity benefits of having a good mental map of a codebase with just good documentation and extensive test coverage. I don't have any data on this, but from experience I'd say that people who really know a codebase can be 10-50x as fast at fixing bugs than those with only a mediocre familiarity.
55. fuzztester ◴[] No.43247005{3}[source]
exactly.

what you said just strengthens my argument.

56. upcoming-sesame ◴[] No.43247810[source]
You write a decent amount of tests
replies(1): >>43248938 #
57. fuzztester ◴[] No.43248938{3}[source]
Famous quote, read many years ago:

testing can prove the presence of errors, but not their absence.

https://www.google.com/search?q=quote+testing+can+prove+the+...

- said by Steve McConnell (author of Code Complete), Edsger Dijkstra, etc. ...

58. gunian ◴[] No.43251223{5}[source]
what if you thought of your codebase as something similar to human DNA and the LLM as nature, and the entire process as some sort of evolutionary process? the fitness function would be no panics or exceptions and low latency, instead of some random KPI or OKR, or who likes working with who, or who made who laugh

it's what our lord and savior jesus christ uses for us humans if it is good for him its good enough for me. and surely google is not laying off 25k people because it believes humans are better than their LLMs :)

59. noisy_boy ◴[] No.43265898{7}[source]
Maybe LLM-generated text was their first contact with structured and systematic writing.
60. fuzztester ◴[] No.43275331{3}[source]
>‘Works for me’ isn’t actually _that_ useful a signal without serious qualification.

yes, and it sounds a bit like "works on my machine", a common cop-out which I am sure many of us have heard of.

google: works on my machine meme