Agent Client Protocol (ACP)

(agentclientprotocol.com)
270 points by vinhnx | 27 comments
mg ◴[] No.45074786[source]
I'm fine with treating AI like a human developer:

I ask AI to write a feature (or fix a bug, or do a refactoring) and then I read the commit. If the commit is not to my liking, I "git reset --hard", improve my prompt and ask the AI to do the task again.

I call this "prompt coding":

https://www.gibney.org/prompt_coding

This way, there is no interaction between my coding environment and the AI at all. Just like working with a human developer does not involve them doing anything in my editor.
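
For concreteness, a minimal sketch of that loop as a script; run_agent is a placeholder for whichever coding agent you call, and only the git steps mirror the workflow above:

    import subprocess

    def run_agent(prompt: str) -> None:
        # Placeholder: invoke whatever coding agent you use and let it
        # commit its work in the current repository.
        raise NotImplementedError

    def prompt_code(prompt: str) -> None:
        start = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        ).stdout.strip()
        run_agent(prompt)
        # Read the commit the agent produced.
        subprocess.run(["git", "diff", "--stat", start, "HEAD"], check=True)
        if input("Keep it? [y/N] ").lower() != "y":
            # Not to my liking: throw it away and try again with a better prompt.
            subprocess.run(["git", "reset", "--hard", start], check=True)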

replies(2): >>45074878 #>>45076374 #
1. Disposal8433 ◴[] No.45074878[source]
> Nowadays, it is better to write prompts

Very big doubt. AI can help for a few very specific tasks, but the hallucinations still happen, and making things up (especially APIs) is unacceptable.

replies(6): >>45074958 #>>45074999 #>>45075081 #>>45075111 #>>45079473 #>>45081297 #
2. NitpickLawyer ◴[] No.45074958[source]
> but the hallucinations still happen, and making things up (especially APIs) is unacceptable.

The new models are much better at reading the codebase first, and sticking to "use the APIs / libraries already included". Also, for new libraries there's context7 that brings in up-to-date docs. Again, newer models know how to use it (even gpt5-mini works fine with it).

replies(1): >>45075292 #
3. wongarsu ◴[] No.45074999[source]
In languages with strong compile-time checks (like, say, Rust) the obvious problems can mostly be solved by having the agent try to compile the program as a last step, and most agents now do that on their own. In cases where that doesn't work (more permissive languages like Python, or HTTP APIs) you can have the AI write tests and execute them. Or ask the AI to prototype and test features separately before adding them to the codebase. Adding MCP servers with documentation also helps a ton.

The real issues I'm struggling with are more subtle, like unnecessary code duplication, code that seems useful but is never called, doing the right work but in the wrong place, security issues, performance issues, not implementing the prompt correctly when it's not straightforward, implementing the prompt verbatim when a closer inspection of the libraries and technologies used reveals a much better way, etc. Mostly things you will catch in code review if you really pay attention. But whether that's faster than doing the task yourself greatly depends on the task at hand.
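
As a sketch, the "compile and test as the last step" gate amounts to something like this, assuming a Rust project driven by cargo (the commands are the only real part here):

    import subprocess

    def change_passes() -> bool:
        """Gate an agent's change: it must compile and the tests must pass."""
        for cmd in (["cargo", "check"], ["cargo", "test"]):
            if subprocess.run(cmd).returncode != 0:
                return False  # hallucinated APIs and broken code get caught here
        return True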

replies(2): >>45075560 #>>45084484 #
4. salomonk_mur ◴[] No.45075081[source]
Hard disagree. LLMs are now incredibly good for any coding task (with popular languages).
replies(2): >>45075488 #>>45075893 #
5. mg ◴[] No.45075111[source]
Do others here encounter that problem? I never do. I can't remember the last time I saw a hallucination in a commit.

Maybe it's because the libraries I use are made from small files which easily fit into the context window.

replies(1): >>45077019 #
6. sigseg1v ◴[] No.45075292[source]
What size of codebases are we talking here? I've had a lot of issues trying to do pretty much anything across a 1.7 million LOC codebase and generally found it faster to use traditional IDE functionalities.

I've had much more success with things under 20k LOC but that isn't the stuff that I really need any assistance with.

7. Disposal8433 ◴[] No.45075488[source]
You can't disagree with facts. Every time I try to give a chance to all those LLMs, they always use old APIs, APIs that don't exist, or mix things up. I'll still try that once a month to see how it evolves, but I have never been amazed by the capabilities of those things.

> with popular languages

Don't know, don't care. I write C++ code and that's all I need. JS and React can die a painful death for all I care, as they have injected the worst practices across the entire CS field. As for Python, I don't need help with that thanks to uv, but that's another story.

replies(3): >>45077226 #>>45079482 #>>45081303 #
8. Disposal8433 ◴[] No.45075560[source]
> the obvious problems can mostly be solved by having the agent try to compile the program

The famous "It compiles on my machine." Is that where engineering is going? Spending $billions to get the same result as the laziest developer ever?

replies(1): >>45075949 #
9. quotemstr ◴[] No.45075893[source]
What's your explanation for why others report difficulty getting coding agents to produce their desired results?

And don't respond with a childish "skill issue lol" like it's Twitter. What specific skill do you think people are lacking?

replies(3): >>45078123 #>>45079566 #>>45082881 #
10. wongarsu ◴[] No.45075949{3}[source]
If it compiles on my machine then the library and all called methods exist and are not hallucinated. If it runs on my machine then the called external APIs exist and are not hallucinated

That obviously does not mean that it's good software. That's why the rest of my comment exists. But "AI is hallucinating libraries/APIs" is something that can be trivially solved with good software practices from the 00s, and that the AI can resolve by itself using those techniques. It's annoying for autocomplete AI, but for agents it's a non-issue

11. brulard ◴[] No.45077019[source]
Same here, very low hallucination rate, and it can pretty quickly correct itself (Claude Code). To force it to use recent versions of libraries instead of old ones, it's good to require that specifically in CLAUDE.md, and having a docs MCP (like context7) can also help.
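
Roughly what that CLAUDE.md requirement can look like; the wording and the placeholder library are made up, only the idea of pinning versions in the instructions file comes from the comment above:

    # CLAUDE.md (excerpt)
    - We use <library> 4.x. Do not write code against the 3.x API.
    - Check Cargo.toml / package.json / pyproject.toml for the exact versions in use before writing code.
    - If unsure about a library's current API, fetch its docs (e.g. via the context7 MCP) first.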
12. dingnuts ◴[] No.45077226{3}[source]
If you want them to not make shit up, you have to load up the context with exactly the docs and code references that the request needs. This is not a trivial process, and IME it can take just as long as doing the work manually a lot of the time, but tools are improving to aid this process, and if the immediate context contains everything the model needs, it won't hallucinate any worse than I do when I manually enter code (but when I do it, I call it a typo).

There is a learning curve; it reminds me of learning to use Google a long time ago.
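
Roughly what "loading up the context" can look like in practice; the file paths and task are hypothetical, the point is that the references are hand-picked:

    from pathlib import Path

    def build_context(task: str, files: list[str], doc_excerpts: list[str]) -> str:
        """Assemble a prompt containing exactly the code and docs the task needs."""
        parts = [task]
        for path in files:
            parts.append(f"--- {path} ---\n{Path(path).read_text()}")
        parts.extend(doc_excerpts)
        return "\n\n".join(parts)

    # Hypothetical usage: hand-picked references, nothing left for the model to guess.
    prompt = build_context(
        "Add retry logic to the HTTP client",
        files=["src/http_client.py"],
        doc_excerpts=["<the relevant section of the library's retry docs, pasted in>"],
    )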

replies(1): >>45078545 #
13. Eisenstein ◴[] No.45078123{3}[source]
Thought experiment: you can ride a bike. You can see other people ride bikes. Some portion of people get on a bike and fall off, then claim that bikes are not useful for transportation. Specify what skill they are lacking without saying 'ability to ride a bike'.
replies(1): >>45078257 #
14. quotemstr ◴[] No.45078257{4}[source]
For a bike? Balance, fine motor control, proprioception, or even motivation. You can always break it down.
replies(1): >>45079852 #
15. th0ma5 ◴[] No.45078545{4}[source]
So, I've done this. I've pasted in the headers and pleaded with it not to imagine ABIs that don't exist, and multiple models just want to make it work however they can. People shouldn't be so quick to reply like this; many people have tried all this advice. It also doesn't help that there is no independent test that can describe these issues, so all we get is anecdote, advice to use a different vendor, or the claim that the person must be doing something wrong. How can we talk about these things with these rhetorical reflexes?
replies(1): >>45079494 #
16. verdverm ◴[] No.45079473[source]
Writing prompts makes these issues way less significant and makes the agents way more capable.

Prompt / context engineering is still an underrated and underutilized activity (imo)

17. verdverm ◴[] No.45079482{3}[source]
> You can't disagree with facts. Every time I...

Anecdotes are not facts, they are personal experiences, which we know are not equal and often come with biases

18. verdverm ◴[] No.45079494{5}[source]
There is a significant gap between agents and models.

Agents use multiple models, can interact with the environment, and take many steps. You can get them to reflect on what they have done and what they need to do to continue, without intervention. One of the more important things they can do is understand their environment, the libraries and versions in use, fetch or read the docs, and then base their edits on those. Much of the SDK hallucination can be removed this way, and with a compile step to validate, they get even better.

Models typically operate on a turn-by-turn basis with only the context and messages the user provides.
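
A stripped-down sketch of that difference; model_edit is a placeholder for the model call, the important part is the environment feedback being fed back in without user intervention:

    import subprocess

    def model_edit(task: str, feedback: str) -> None:
        # Placeholder for one model turn: propose and apply an edit given the
        # task plus whatever compiler output came back so far.
        raise NotImplementedError

    def agent_loop(task: str, max_steps: int = 5) -> bool:
        feedback = ""
        for _ in range(max_steps):
            model_edit(task, feedback)
            result = subprocess.run(["cargo", "check"], capture_output=True, text=True)
            if result.returncode == 0:
                return True            # it compiles: hallucinated SDK calls died before this point
            feedback = result.stderr   # reflect on the errors and try again
        return False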

replies(1): >>45084537 #
19. 80hd ◴[] No.45079566{3}[source]
Not OP but my two cents - probably laziness and propensity towards daydreaming.

I have extreme intolerance to boredom. I can't do the same job twice. Some people don't care.

This pain has caused me to become incredibly effective with LLMs because I'm always looking for an easier way to do anything.

If you keep hammering away at a problem - i.e. how to code with LLMs - you tend to become dramatically better at it than people who don't do that.

20. Eisenstein ◴[] No.45079852{5}[source]
Knowing those things won't help them acquire the skill. What will help them be able to ride a bike is practicing riding a bike until they can do it.
21. tomjen3 ◴[] No.45081297[source]
It's surprisingly fine, as long as you allow the AI to iterate on its work. It will discover that it doesn't compile, maybe look up the API, and then it will most often fix it and move on.

AI is no more capable of reliably one-shotting solutions than you are.

22. tomjen3 ◴[] No.45081303{3}[source]
Add "Look up version 4 of the library, make sure to use that version".

For my Python work, it has to be told we are using uv, and sometimes that I am on a Mac. This is not that different from what you would have to tell another programmer who isn't familiar with your tools.

23. kevinmchugh ◴[] No.45082881{3}[source]
In no particular order: LLMs seem, for some reason, to be worse at some languages than others.

LLMs only have so much context available, so larger projects are harder to get good results in.

Some tools (eg a fast compiler) are very useful to agents to get good feedback. If you don't have a compiler, you'll get hallucinations corrected more slowly.

Some people have schedules that facilitate long uninterrupted periods, so they see an agent work for twenty minutes on a task and think "well I could've done that in 10-30 minutes, so where's the gain?". And those people haven't understood that they could be running many agents in parallel (I don't blame people for not realizing this, no one I talk to is doing this at work).

People also don't realize they could have the agent working while they're asleep/eating lunch/in a meeting. This is why, in my experience, managers find agents more transformative than ICs do. We're in more meetings, with fewer uninterrupted periods.
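
A back-of-the-envelope sketch of the parallel-agents point, using git worktrees so the runs don't collide; agent_cli is a stand-in for any agent that can run non-interactively:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    TASKS = ["fix the flaky login test", "add pagination to /orders", "tidy the logging format"]

    def run_task(i: int, task: str) -> int:
        worktree = f"../agent-run-{i}"
        # Each run gets its own worktree and branch, so agents never touch the same checkout.
        subprocess.run(["git", "worktree", "add", "-b", f"agent/{i}", worktree], check=True)
        return subprocess.run(["agent_cli", "--prompt", task], cwd=worktree).returncode

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_task, range(len(TASKS)), TASKS))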

People have an expectation that the agent will always one-shot the implementation, and don't appreciate it when the agent gets them 80% of the way there. Or that it's basically free to try again if the agent goes completely off the rails.

A lot of people don't understand that agents are a step beyond just an LLM, so their attempts last year have colored their expectations.

Some people are less willing to attempt to work with the agent to make it better at producing good output. They don't know how to do it. Your agent got logging wrong? Okay, tell it to read an example of good logging and to write a rule that will get it correct.

24. lsaferite ◴[] No.45084484[source]
The subtle bugs are horrible.

We used Claude Code the other day to add a new record type to an API and it was mostly right. CC decided (for some weird reason) to use a slightly different return shape on a list endpoint than the entire rest of the API. It changed two field names (count/items became total_count/data). This divergence was missed until the code was released because it 'worked' and had full tests and everything. But when the standard client lib code was used to access the API it failed on the list endpoint. Didn't take long to discover the issue. Luckily, it was a new feature so nothing broke, but it was a very clear reminder that you have to be very thorough when reviewing coding agent PRs.

FWIW, I use CC frequently and have mostly positive things to say about it as a tool.
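
The mismatch described, roughly (the field names are from the comment, the client helper is hypothetical):

    # The rest of the API's list endpoints return:
    #   {"count": 2, "items": [...]}
    # The new endpoint came back with:
    #   {"total_count": 2, "data": [...]}

    def parse_list_response(payload: dict) -> list:
        # Shared client helper, written against the established shape.
        return payload["items"]  # KeyError on the new endpoint's shape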

replies(1): >>45084563 #
25. th0ma5 ◴[] No.45084537{6}[source]
You can't make any guarantees and manually watching everything is not tenable. "Much" instead of "all" means having to check it all because "much" is random.
replies(1): >>45085482 #
26. ◴[] No.45084563{3}[source]
27. verdverm ◴[] No.45085482{7}[source]
You don't have to watch it, just like you don't have to watch your peers. We have code review processes in place already.

You're never going to get "all"; you don't have "all" today. Humans make mistakes too and have to run programs to discover their errors.