What to build instead of AI agents

(decodingml.substack.com)
233 points by giuliomagnifico
mccoyb ◴[] No.44450552[source]
Building agents has been fun for me, but it's clear that there are serious problems with "context engineering" that must be overcome with new ideas. In particular, no matter how much the context window grows, one must curate what the agent sees: agents don't have very effective filters on what is relevant to supercharge them on tasks, and so (a) you must leave *.md files strewn about to help guide them and (b) you must put them into roles. The *.md system is essentially a rudimentary memory system, but it could be made significantly more robust, and could involve e.g. constructing programs and models (in natural language) on the fly, guided by interactions with the user.
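
A minimal sketch of what that *.md memory amounts to in practice (the function name and character budget are my own illustration, not Claude Code's actual mechanism):

    import pathlib

    def gather_guides(root: str, limit_chars: int = 8000) -> str:
        """Collect the *.md guide files strewn about a repo into one
        context preamble, in path order, until a budget is hit."""
        root_path = pathlib.Path(root)
        guides = sorted(root_path.rglob("CLAUDE.md")) + sorted(root_path.rglob("TODO.md"))
        parts, total = [], 0
        for path in guides:
            text = path.read_text()
            if total + len(text) > limit_chars:
                break  # crude curation: drop whatever doesn't fit the budget
            parts.append(f"## {path.relative_to(root_path)}\n{text}")
            total += len(text)
        return "\n\n".join(parts)

Which is the point: the "memory" today is flat files plus a budget, with no model of relevance.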

What Claude Code has taught me is that steering an agent via a test suite is an extremely powerful reinforcement mechanism (the feedback loop leads to success, most of the time) -- and I'm hopeful that new thinking will extend this into the other "soft skills" that an agent needs to become an increasingly effective collaborator.
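
That loop is easy to sketch. Here pytest is the reward signal, and call_agent is a hypothetical placeholder for whatever harness actually applies the model's edits, not a real API:

    import subprocess

    def call_agent(prompt: str) -> None:
        """Hypothetical stand-in for the agent harness that edits code."""
        raise NotImplementedError

    def steer_with_tests(task: str, max_rounds: int = 5) -> bool:
        """Run the suite, feed failures back to the agent, repeat until green."""
        call_agent(task)
        for _ in range(max_rounds):
            result = subprocess.run(["pytest", "-x", "-q"],
                                    capture_output=True, text=True)
            if result.returncode == 0:
                return True  # suite is green: the feedback loop converged
            # The failure output becomes the next turn's context.
            call_agent(f"The tests still fail:\n{result.stdout}\nFix the code.")
        return False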

replies(4): >>44450945 #>>44451021 #>>44452834 #>>44453646 #
zmgsabst ◴[] No.44450945[source]
I’ve found managing the context is most of the challenge:

- creating the right context for parallel and recursive tasks;

- removing some steps (e.g., editing its previous response) to show only the corrected output;

- showing it its own output as my comment, when I want a response;

Etc.
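
A minimal sketch of those moves as transforms over a transcript, using the common {"role", "content"} message shape (the "superseded" and "reflect" flags are hypothetical markers of my own, just to make the moves concrete):

    def curate(messages: list[dict]) -> list[dict]:
        """Apply the curation moves above to a message list."""
        curated = []
        for msg in messages:
            # Remove superseded steps (e.g. a response that was later
            # edited), keeping only the corrected output.
            if msg.get("superseded"):
                continue
            # Show the model its own output as a user comment when the
            # goal is a response to it, not a continuation of it.
            if msg["role"] == "assistant" and msg.get("reflect"):
                msg = {"role": "user", "content": msg["content"]}
            curated.append(msg)
        return curated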

replies(2): >>44451001 #>>44451616 #
mccoyb ◴[] No.44451001[source]
I've also found that relying on agents to build their own context _poisons_ it ... that it's necessary to curate it constantly. There's kind of a <1 multiplicative thing going on, where I can ask the agent to e.g. update CLAUDE.mds or TODO.mds in a somewhat precise way, and the agent will multiply my request into a lot of changes which (on the surface) appear well and good ... but if I repeat this process a number of times _without manual curation of the text_, I end up with "lower quality" than I started with (assuming I wrote the initial CLAUDE.md).

The obvious takeaway: while the agent can multiply the amount of work I can do, there's a multiplicative reduction in quality, which means I need to account for that (I have to budget time for curation).
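
To put an illustrative number on that <1 multiplier: if each uncurated update round preserves, say, 90% of the guide's quality, then five rounds leave 0.9^5 ≈ 0.59 of what you started with. The 0.9 is made up, but it shows why the curation cost has to be paid every round rather than once at the end.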

replies(1): >>44451472 #
prmph ◴[] No.44451472[source]
In other words, the old adage still applies: there is no free lunch.

More seriously, yes, it makes sense that LLMs are not going to take humans entirely out of the loop. Think about what it would mean if that were the case: if people, on the basis of a few simple prompts, could let agents loose and create sophisticated systems without any further input, then there would be nothing to differentiate those systems, and thus they would lose their meaning and value.

If prompting is indeed the new level of abstraction we are working at, then what value is added by asking Claude to "make me a note-taking app"? A million other people could issue the same low-effort prompt; so what value is the prompter actually adding?