
548 points kmelve | 1 comments
rhubarbtree ◴[] No.45112846[source]
Does anyone have a link to a video that uses Claude Code to produce clean, robust code that solves a non-trivial problem (i.e. not tic-tac-toe or a landing page) more quickly than a human programmer could write it? I don’t want a “demo”, I want a livestream from an independent programmer unaffiliated with any AI company and thus not incentivised to hype.

I want the code to have subsequently been deployed in production and demonstrably robust, without additional work outside of the livestream.

The livestream should include code review, test creation, testing, PR creation.

It should not be on a greenfield project, because nearly all coding is not.

I want to use Claude and I want to be more productive, but my experience to date is that for writing code beyond autocomplete, AI is not good enough: it either produces low-quality code that can’t be maintained, or requires so much hand-holding that it is actually less efficient than a good programmer.

There are lots of incentives for marketing at the grassroots level. I am totally open to changing my mind but I need evidence.

replies(27): >>45112915 #>>45112951 #>>45112960 #>>45112964 #>>45112968 #>>45112985 #>>45112994 #>>45113041 #>>45113054 #>>45113123 #>>45113184 #>>45113229 #>>45113316 #>>45113448 #>>45113465 #>>45113643 #>>45113677 #>>45113802 #>>45114193 #>>45114454 #>>45114485 #>>45114519 #>>45115642 #>>45115900 #>>45116522 #>>45123605 #>>45125152 #
coffeeri ◴[] No.45113229[source]
This video [0] is relevant, though it actually supports your point - it shows Claude Code struggling with non-trivial tasks and needing significant hand-holding.

I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

[0] https://www.youtube.com/watch?v=EL7Au1tzNxE

replies(2): >>45113491 #>>45114057 #
thecupisblue ◴[] No.45113491[source]
Great video! It shows a few things: how good Claude Code is with such a niche language, but also some direct flaws.

First off, Rust represents quite a small part of the training data in most public sets (last I checked it was under 1% of the code dataset), so it's got waaay less training than other languages like TS or Java. You added 2 solid features, backed with tests and documentation and nice commit messages. 80% of devs would not deliver this in 2.5 hours.

Second, there was a lot of time/token waste messing around with git and git messages. A few tips I noticed that could help your workflow:

#1: Add a subagent for git that knows your style, so you don't poison direct claude context and spend less tokens/time fighting it.

#2: Claude has hooks. If your favorite language has a formatter like rustfmt, just use hooks to run it (and similar tools).

#3: Limit what they test, as most LLMs tend to write overeager tests, including testing if "the field you set as null is null", which wastes tokens.

#5: Saying "max 50 characters title" doesn't really mean anything to the LLM. They have no inherent ability to count, so you are relying on probability, which is quite low since your context is quite filled at this point. If they want to count the line length, they also have to use external tools. This is an inherent LLM design issue and discussing it with an LLM doesn't get you anywhere really.
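
To make the "external tools" part of #5 concrete, here is a minimal sketch of a deterministic title-length check in Python. The names `title_too_long` and `check_message_file` are made up for illustration, and the 50-character limit is just the one from the video. It assumes the commit message is available as a file, which is how git hands it to a commit-msg hook.

```python
MAX_TITLE_LEN = 50  # assumed limit, taken from the "max 50 characters title" rule


def title_too_long(message, limit=MAX_TITLE_LEN):
    """Return True if the first line of a commit message is longer than `limit`."""
    lines = message.splitlines()
    title = lines[0] if lines else ""
    return len(title) > limit


def check_message_file(path, limit=MAX_TITLE_LEN):
    """Read a commit message file and return 0 if the title fits, 1 otherwise."""
    with open(path, encoding="utf-8") as f:
        message = f.read()
    return 1 if title_too_long(message, limit) else 0
```

A script that calls `check_message_file` on its first argument and exits with the returned code could be run as a commit-msg hook, so the length gets counted by code rather than estimated by the model.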

replies(2): >>45113925 #>>45114157 #
komali2 ◴[] No.45113925{3}[source]
> #1: Add a subagent for git that knows your style, so you don't poison direct claude context and spend less tokens/time fighting it.

I've not heard of this before. What does this mean practically? Some kind of invocation in Claude? Opening another Claude window?

replies(3): >>45114075 #>>45114481 #>>45124066 #
1. thecupisblue ◴[] No.45114481{4}[source]
Oh, you're about to unlock a whole new level of token burning. There is an /agents command that lets you define agents for specific tasks or areas. Each of them has its own context and its own rules.

Then Claude can delegate the work to them when appropriate, or you can tell it directly to use a subagent, e.g. one for your frontend, backend, a specific microservice, the database, etc.

Which ones you create or need depends a lot on your workflow, but they are a really nice quality-of-life change.