
548 points by kmelve | 3 comments

rhubarbtree:
Does anyone have a link to a video of someone using Claude Code to produce clean, robust code that solves a non-trivial problem (i.e. not tic-tac-toe or a landing page) more quickly than a human programmer could write it? I don’t want a “demo”; I want a livestream from an independent programmer unaffiliated with any AI company and thus not incentivised to hype.

I want the code to have subsequently been deployed in production and to be demonstrably robust, without additional work outside of the livestream.

The livestream should include code review, test creation, testing, and PR creation.

It should not be on a greenfield project, because nearly all real-world coding isn’t.

I want to use Claude and I want to be more productive, but my experience to date is that for anything beyond autocomplete, AI isn’t good enough: it produces low-quality code that can’t be maintained, or else requires so much hand-holding that it is actually less efficient than a good programmer.

There are lots of incentives for marketing at the grassroots level. I am totally open to changing my mind, but I need evidence.

MontyCarloHall:
Forget a livestream, I want to hear from maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite). Highly capable coding LLMs have been out long enough that if they do indeed have a meaningful impact on writing non-trivial, non-greenfield/boilerplate code, it ought to be clearly apparent as an uptick in positive contributions to projects like these.

stitched2gethr:
This contains some specific data with pretty graphs: https://youtu.be/tbDDYKRFjhk?t=623

But if you do professional development and use something like Claude Code (the current standard, IMO), you’ll quickly get a handle on what it’s good at and what it isn’t. It took me about 3-4 weeks of working with it, at an overall 0x gain, to figure out where it will help me and where it will slow me down.

MontyCarloHall:
This is a great conference talk, thanks for sharing!

To summarize: the authors enlisted a panel of expert developers to review the quality of various pull requests in terms of architecture, readability, maintainability, etc. (see 8:27 in the video for a partial list of criteria), and then somehow aggregated these criteria into an overall “productivity score.” They then trained a model on the judgments of the expert developers and found that its predictions correlated highly with the experts’ judgments. Finally, they applied this model to PRs across thousands of codebases, with knowledge of whether each PR was AI-assisted or not.
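
To make that concrete, here’s a minimal sketch of the kind of validation step described (my own reconstruction with placeholder features, placeholder scores, and an arbitrary model choice; not the authors’ actual pipeline):

    # Sketch only: train a model to predict the experts' aggregate quality
    # score from per-PR features, then check how well its predictions
    # correlate with held-out expert judgments.
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 12))               # placeholder per-PR features
    w = rng.normal(size=12)
    y = X @ w + rng.normal(scale=0.5, size=500)  # placeholder expert scores

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor().fit(X_train, y_train)

    r, _ = pearsonr(model.predict(X_test), y_test)
    print(f"correlation with expert judgment: r = {r:.2f}")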

They found productivity gains of roughly:

- 35-40% for easy/greenfield tasks
- 10-15% for hard/greenfield tasks
- 15-20% for easy/brownfield tasks
- 0-10% for hard/brownfield tasks

Most productivity gains went towards “reworked” code, i.e. refactoring of recent code.

All in all, this is a great attempt at rigorously quantifying AI impact. However, I take one major issue with it. Let’s assume that their “productivity score” does indeed capture the overall quality of a PR (a big assumption). I’m still not convinced it measures the net positive/negative impact on the codebase: just because a PR is well written according to a panel of expert engineers doesn’t mean it’s valuable to the project as a whole.

Plenty of well-written code is utterly superfluous (trivial object setters/getters come to mind). Conversely, code that might appear poorly written to an outside expert engineer can be essential to the project (the highly optimized C/assembly in ffmpeg comes to mind, or, to use an extreme example, anything from Arthur Whitney). “Reworking” that code to be “better written” would be hugely detrimental, even though an outside observer (and an AI trained on their judgments) might conclude that said code is terrible.
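
To make the setter/getter point concrete (my own hypothetical example, not from the talk), a PR full of code like the following would likely score well on readability and maintainability criteria while contributing almost nothing to the project:

    # Illustrative only: clean, documented, and largely superfluous.
    class Job:
        def __init__(self, priority: int) -> None:
            self._priority = priority

        def get_priority(self) -> int:
            """Return the job's priority."""
            return self._priority

        def set_priority(self, priority: int) -> None:
            """Set the job's priority."""
            self._priority = priority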

oblio:
Thank you for sharing the video, it's great! Puts into words a lot of things I was thinking.