thanks for the article, it's a good one
I'd rather use it the other way around: I'm the one in charge, and the AI reviews my work for logical flaws or things I might have missed. I don't even have to think about the context window, since it only looks at my new code logic.
So yeah, 3 years after the first ChatGPT and Copilot, I don't feel huge changes regarding "automated" AI programming, and I don't have any AI tool in my IDE. I prefer to have a chat on their website, to brainstorm or occasionally find a solution to something I'm stuck on.
yes, just as was said each and every previous time OpenAI/Anthropic shit out a new model
"now it doesn't suck!"
They know that it's a significant, but not revolutionary, improvement.
If you supervise and manage your agents closely on well scoped (small) tasks they are pretty handy.
If you need a prototype and don't care about code quality or maintenance, they are great.
Anyone claiming 2x, 5x, 10x etc is absolutely kidding themselves for any non-trivial software.
Impressively, it recognized the structure of the code and correctly identified it as a component of an audio codec library, and provided a reasonably complete description of many minute details specific to this codec and the work that the function was doing.
Rather less impressively, it decided to ignore my request and write a function that used C++ features throughout, such as type inference and lambdas, or should I say "lambdas", because it was actually just a function-defined-within-a-function that tried to access and mutate variables outside of its own function scope, like we were writing JavaScript or something. Even apart from that, the code was rife with the sorts of warnings that even a default invocation of gcc would flag.
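To illustrate the pattern (this is a made-up sketch, not the actual generated code, and none of these names come from the codec): a function nested inside another function that mutates its enclosing scope only compiles as a GCC extension, and the portable C approach is to pass that state around explicitly.

    #include <stddef.h>
    #include <stdio.h>

    /* Made-up illustration of the pattern described above, not the actual
     * generated code: a function defined inside another function that
     * mutates a variable in the enclosing scope. GCC accepts this as a
     * non-standard nested-function extension; it is neither ISO C nor a
     * real C++ lambda. */
    static float peak_nonportable(const float *samples, size_t n)
    {
        float peak = 0.0f;

        void track(float s)        /* GCC extension only */
        {
            if (s > peak)
                peak = s;          /* mutates the enclosing scope */
        }

        for (size_t i = 0; i < n; i++)
            track(samples[i]);
        return peak;
    }

    /* The portable C idiom: pass the "captured" state explicitly. */
    struct peak_ctx { float peak; };

    static void track_peak(struct peak_ctx *ctx, float s)
    {
        if (s > ctx->peak)
            ctx->peak = s;
    }

    static float peak_portable(const float *samples, size_t n)
    {
        struct peak_ctx ctx = { 0.0f };
        for (size_t i = 0; i < n; i++)
            track_peak(&ctx, samples[i]);
        return ctx.peak;
    }

    int main(void)
    {
        const float block[] = { 0.1f, 0.7f, -0.3f, 0.5f };
        printf("%f %f\n", peak_nonportable(block, 4), peak_portable(block, 4));
        return 0;
    }

The second form is what I'd expect in a C codebase: no capture, no extensions, just a context struct and a plain function.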
I can see why people would be wowed by this on its face. I wouldn't expect any average developer to have such a depth of knowledge and breadth of pattern-matching ability to be able to identify the specific task that this specific function in this specific audio codec was performing.
At the same time, this is clearly not a tool that's suitable for letting loose on a codebase without EXTREME supervision. This was a fresh session (no prior context to confuse it) using a tightly crafted prompt (a small, self-contained C program doing one thing) with a clear goal, and it still required constant handholding.
At the end of the day, I got the code working by editing it manually, but in an honest retrospective I would have to admit that the overall process actually didn't save me any time at all.
Ironically, despite how they're sold, these tools are infinitely better at going from code to English than going the other way around.
The hedonic treadmill ensures it feels the same way each time.
But that doesn’t mean the models aren’t improving, nor that the scope isn’t expanding. If you compare today’s tools to those a year ago, the difference is stark.
Brainstorming, ideation, and small, well-defined tasks where I can quickly vet the solution: these feel like the sweet spot for current frontier model capabilities.
(Unless you are pumping out some sloppy React SPA where you don't care about anything except getting it working as fast as possible - fine, get Claude Code to one-shot it)
Just two questions, if you don’t mind satisfying my curiosity.
- Did you tell it to write C? Or better yet, what was the prompt? You can use claude --resume to easily find that.
- Which model (Sonnet or Opus)? Though I'd have expected either one to work.
It's good enough that it helps, particularly in areas or languages that I'm unfamiliar with. But I'm constantly fighting with it.
There's a big difference between their benchmarks and real-world coding.
Compared to just doing it yourself though?
Imagine having to micromanage a junior developer like this to get good results
Ridiculous tbh
Yes. Decently useful (and reasonably safe) to red team yourself with. But extremely easy to red queen yourself otherwise.
It takes all of five minutes to have it run, and at the end I can review it; if it's small, I ask it to execute, and if it actually requires me to do the work myself, well, now I have a reference with line numbers, some comments on how the system appears to work, what the intent is, areas of interest, etc.
I also rely heavily on the sequential thinking MCP server to give it more structure.
Edit:
I will say, because I think it's important: I've been a senior dev for a while now, and a lot of my job _is_ reviewing other people's pull requests. I don't find it hard or tedious at all.
Honestly it's a lot easier to review a few small "PRs" as the agent works than some of the giant PRs I'd get from team members before.
I kind of hate that I'm saying this, but I'm sort of similar, and one thing I really like is having zero guilt about trashing the LLM's code. So often people are submitting something and the code is OK but just pervasively not quite how I like it. Some staff will engage in micro-arguments about things rather than just doing them how I want, and it's just tiring. Then LLMs are really good at explaining why they did stuff (or simulating that) as well. LLMs will enthusiastically redo something and then help adjust their own AGENTS.md file to align better in the future.