
What to build instead of AI agents

(decodingml.substack.com)
233 points | giuliomagnifico | source
mccoyb ◴[] No.44450552[source]
Building agents has been fun for me, but it's clear that there are serious problems with "context engineering" that must be overcome with new ideas. In particular, no matter how big the context window size is increased - one must curate what the agent sees: agents don't have very effective filters on what is relevant to supercharge them on tasks, and so (a) you must leave *.md files strewn about to help guide them and (b) you must put them into roles. The *.md system is essentially a rudimentary memory system, but it could get be made significantly more robust, and could involve e.g. constructing programs and models (in natural language) on the fly, guided by interactions with the user.

What Claude Code has taught me is that steering an agent via a test suite is an extremely powerful reinforcement mechanism (the feedback loop leads to success, most of the time) -- and I'm hopeful that new thinking will extend this into the other "soft skills" that an agent needs to become an increasingly effective collaborator.

replies(4): >>44450945 #>>44451021 #>>44452834 #>>44453646 #
moritz64 ◴[] No.44452834[source]
> steering an agent via a test suite is an extremely powerful reinforcement mechanism

can you elaborate a bit? how do you proceed? what does your process look like?

replies(1): >>44454041 #
mccoyb ◴[] No.44454041[source]
I spend a significant amount of time (a) curating the test suite, and making sure it matches my notion of correctness and (b) forcing the agent to make PNG visuals (which Claude Code can see, by the way, and presumably also Gemini CLI, and maybe Aider?, etc)

I'd have to do this anyways, if I was writing the code myself, so this is not "time above what I'd normally spend"

The visuals it makes for me I can inspect and easily tell whether it is on the right path or the wrong one. The test suite is a sharper notion of "this is right, this is wrong" -- sharper than just visual feedback and my directions.

The basic idea is to set up a feedback loop for the agent, keep the agent in the loop, and observe what it is doing. The visuals are absolutely critical -- as a compressed representation of the behavior of the codebase, which I can quickly and easily parse to recognize if there are issues.
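(The loop described above can be sketched as follows. `ask_agent` is a stand-in for a real call to Claude Code or a similar tool; here it simply returns a corrected function so the sketch is runnable. All names are illustrative.)

```python
# Minimal sketch of a test-suite feedback loop for a coding agent:
# run the suite, feed failures back to the agent, repeat until it passes.

def run_suite(func, cases):
    """Return human-readable failure messages for (args, expected) cases."""
    failures = []
    for args, expected in cases:
        got = func(*args)
        if got != expected:
            failures.append(f"f{args} -> {got!r}, expected {expected!r}")
    return failures

def ask_agent(failures):
    """Placeholder: a real agent would see the failure messages in its
    context and return revised code; here it 'fixes' the bug directly."""
    return lambda a, b: a + b

def feedback_loop(candidate, cases, max_rounds=5):
    for _ in range(max_rounds):
        failures = run_suite(candidate, cases)
        if not failures:          # suite passes: stop steering
            return candidate, True
        candidate = ask_agent(failures)
    return candidate, False

cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
fixed, passed = feedback_loop(lambda a, b: a - b, cases)  # buggy start
print(passed)  # True: the loop converged once the suite passed
```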