How to build a coding agent

(ghuntley.com)

cryptoz ◴[24 Aug 25 05:40 UTC] No.45001692[source]▶

I really think the current trend of CLI coding agents isn't going to be the future. They're cool but they are _too simple_. Gemini CLI often makes incorrect edits and gets confused, at least on my codebase. Just like ChatGPT would do in a longer chat where the context gets lost: random, unnecessary and often harmful edits are made confidently. Extraneous parts of the codebase are modified when you didn't ask for it. They get stuck in loops for an hour trying to solve a problem, "solving it", and then you have to tell the LLM the problem isn't solved, the error message is the same, etc.

I think the future will be dashboards/HUDs (there was an article on HN about this a bit ago and I agree). You'll get preview windows, dynamic action buttons, a kanban board, status updates, and still the ability to edit code yourself, of course.

The single-file lineup of agentic actions with user input, in a terminal chat UI, just isn't gonna cut it for more complicated problems. You need faster error reporting from multiple sources, you need to be able to correct the LLM and break it out of error loops. You won't want to be at the terminal even though it feels comfortable because it's just the wrong HCI tool for more complicated tasks. Can you tell I really dislike using these overly-simple agents?

You'll get a much better result with a dashboard/HUD. The future of agents is that several of them will work on the codebase at once, and they'll be good enough that you'll want a status-update-and-confirm loop rather than a stream of individual code-editing tool updates.

Also required is better code editing. You want to avoid the LLM making changes in your code unrelated to the requested problem. Gemini CLI often does a 'grep' for keywords in your prompt to find the right file, but if your prompt was casual and doesn't contain the right keywords, you end up with the agent making changes you didn't intend.

Obviously I am working in this space so that's where my opinions come from. I have a prototype HUD-style webapp builder agent that is online right now if you'd like to check it out:

https://codeplusequalsai.com/

It's not got everything I said above - it's a work-in-progress. Would love any feedback you have on my take on a more complicated, involved, and narrow-focus agentic workflow. It only builds Flask webapps right now, with strict limits on what it can do (no cron etc. yet), but it does have a database you can use in your projects. I put a lot of work into the error flow as well, as that seems like the biggest issue with a lot of agentic code tools.

One last technical note: I blogged about using AST transformations when getting LLMs to modify code. I think that using diffs or rewriting the whole file isn't the right solution either. I think that having the LLM write code that modifies your code, and then running that code to effect the modifications, is the way forward. We'll see I guess. Blog post: https://codeplusequalsai.com/static/blog/prompting_llms_to_m...
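
To make the "code that modifies your code" idea concrete, here is a minimal sketch using Python's standard `ast` module. It is not taken from the linked blog post; the `RenameFunction` class, the `apply_transform` helper, and the sample source are all hypothetical, and the transformer stands in for whatever script an LLM might emit. It assumes Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical file contents the agent was asked to change.
SOURCE = '''
def greet(name):
    return "Hello, " + name

print(greet("world"))
'''


class RenameFunction(ast.NodeTransformer):
    """Rename one function definition and every plain-name reference to it."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also rewrite references inside the body
        if node.name == self.old:
            node.name = self.new
        return node

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


def apply_transform(source, transformer):
    """Parse the source, run the transformer, and turn the tree back into code."""
    tree = transformer.visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


new_source = apply_transform(SOURCE, RenameFunction("greet", "welcome"))
```

The edit can only touch nodes the transformer names, which is the appeal. One caveat with this particular sketch: `ast.unparse` discards comments and the original formatting, so a real tool would likely need a formatting-preserving syntax tree instead.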

replies(2): >>45001711 #>>45001959 #

1. faangguyindia ◴[24 Aug 25 05:45 UTC] No.45001711[source]▶

>>45001692 #

>Gemini CLI often makes incorrect edits and gets confused

Gemini CLI still uses an archaic whole-file format for edits, so it's not a good representative of the current state of coding agents.

replies(2): >>45001721 #>>45002398 #

2. cryptoz ◴[24 Aug 25 05:48 UTC] No.45001721[source]▶

>>45001711 (TP) #

Oh that's wild, I did suspect that but didn't know it outright. Mind-blowing that Google would release that kind of thing; I had wondered why it sucked so much haha. Okay, so what is a good representation of the current state of coding agents? Which one should I try that does a better job at code modifications?

replies(2): >>45001756 #>>45001817 #

3. mrugge ◴[24 Aug 25 05:55 UTC] No.45001756[source]▶

>>45001721 #

Claude Code (with Max subscription), cursor-agent (with usage-based pricing)

4. NitpickLawyer ◴[24 Aug 25 06:14 UTC] No.45001817[source]▶

>>45001721 #

Claude Code is the strongest atm, but Roo Code or Cline (VS Code extensions) can also work well. Roo with gpt5-mini (so cheap, pretty fast) does diff-based edits w/ good coordination over a task, and finishes most tasks that I tried. It even calls them "surgical diffs" :D

5. lifthrasiir ◴[24 Aug 25 08:21 UTC] No.45002398[source]▶

>>45001711 (TP) #

I'm not sure what you mean by "whole file format", but if it refers to the write_file tool that overwrites the whole file, there is also the replace tool, which is apparently inspired by a blog post [1] by Anthropic. It seems that Claude Code also supports a roughly identical tool (inferred from error messages), so editing tools can't be the reason why Claude Code is good.

[1] https://www.anthropic.com/engineering/swe-bench-sonnet
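
For readers unfamiliar with this style of edit tool, the general idea of a string-replacement edit can be sketched as below. This is an illustration only, not the actual implementation in Gemini CLI or Claude Code; the function name `replace_once` and its exactly-one-match rule are assumptions made for the example.

```python
def replace_once(text, old, new):
    """Replace `old` with `new`, but only if `old` occurs exactly once.

    Refusing ambiguous or missing matches forces the model to quote enough
    surrounding context to pin down the spot it wants to change.
    """
    count = text.count(old)
    if count == 0:
        raise ValueError("old string not found")
    if count > 1:
        raise ValueError(f"old string is ambiguous: {count} matches")
    return text.replace(old, new, 1)


# Only the quoted snippet changes; the rest of the file is untouched.
updated = replace_once("a = 1\nb = 2\n", "b = 2", "b = 3")
```

Under these assumptions the model sends only the old and new snippets, not the whole file, and a failed match comes back as an error it can react to.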

replies(1): >>45003034 #

6. faangguyindia ◴[24 Aug 25 10:27 UTC] No.45003034[source]▶

>>45002398 #

Many agents can send diffs. Whole file reading and writing burns tokens and pollutes context.
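
A rough way to see the size difference being described is to compare a unified diff with the full rewritten file, as in the sketch below. It uses Python's standard `difflib`; the `make_diff` helper, the synthetic 200-line file, and the use of character count as a stand-in for token count are all assumptions for illustration.

```python
import difflib


def make_diff(old, new, path="example.py"):
    """Return a unified diff between two versions of a file as one string."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


# Synthetic 200-line file with a single changed line.
old = "".join(f"line {i}\n" for i in range(200))
new = old.replace("line 100\n", "line one hundred\n")

diff = make_diff(old, new)
# Character counts are only a rough proxy for tokens.
sizes = {"whole_file": len(new), "diff": len(diff)}
```

For a one-line change, the diff carries the changed line plus a few lines of context, while a whole-file rewrite resends everything.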

replies(1): >>45003515 #

7. lifthrasiir ◴[24 Aug 25 11:51 UTC] No.45003515{3}[source]▶

>>45003034 #

The replace tool is a form of diff (although it's rudimentary), and the read_file tool can be called with line ranges. I do wish for robust patching, but it is not "whole" file reading/writing. Maybe you meant subagent file handling? I can agree then.

(Also I think Gemini is significantly better when it comes to context rot; in my experience it took 100K–300K tokens for symptoms to appear. So burning tokens is less problematic with Gemini.)
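
Reading by line range, as mentioned above, can be sketched in a few lines. This is a hypothetical helper, not Gemini CLI's read_file implementation; the name `read_line_range` and the 1-indexed inclusive convention are assumptions, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile
from itertools import islice


def read_line_range(path, start, end):
    """Return lines start..end (1-indexed, inclusive) as a single string."""
    with open(path, encoding="utf-8") as f:
        # islice stops reading once `end` lines have been consumed.
        return "".join(islice(f, start - 1, end))


# Build a throwaway 1000-line file to read from.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(1, 1001))

snippet = read_line_range(tmp.name, 10, 12)
os.unlink(tmp.name)
```

Only the requested lines end up in the model's context, which is the point being made about it not being "whole" file reading.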
