
Building Effective "Agents"

(www.anthropic.com)
596 points | jascha_eng | 1 comment
brotchie | No.42475699
Have been building agents for the past 2 years; my tl;dr is that:

Agents are Interfaces, Not Implementations

The current zeitgeist seems to think of agents as passthroughs: e.g. a light wrapper around a core that's almost 100% an LLM.

The most effective agents I've seen, and have built, are largely traditional software engineering with a sprinkling of LLM calls for "LLM hard" problems. LLM hard problems are problems that can ONLY be solved by application of an LLM (creative writing, text synthesis, intelligent decision making). Leave all the problems that are amenable to decades of software engineering best practice to good old deterministic code.
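A minimal sketch of that split, assuming a hypothetical `call_llm` stand-in for whatever LLM client you actually use — validation, routing, and formatting stay deterministic, and only the one genuinely LLM-hard step (summarizing free text) goes to the model:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM client call (assumption, not a real API)."""
    return f"[summary of {len(prompt)} chars]"

def handle_ticket(ticket: dict) -> dict:
    # Deterministic: validation and routing via plain old code.
    if not ticket.get("body"):
        raise ValueError("empty ticket")
    priority = "high" if "outage" in ticket["body"].lower() else "normal"

    # LLM-hard: synthesizing a human-readable summary of free text.
    summary = call_llm(f"Summarize this support ticket:\n{ticket['body']}")

    # Deterministic again: assemble the structured result.
    return {"priority": priority, "summary": summary}
```

Everything around the single LLM call is testable, debuggable software in the usual way.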

I've been calling systems like this "Transitional Software Design." That is, they're mostly a traditional software application under the hood (deterministic, well-structured code, separation of concerns) with judicious use of LLMs where required.

Ultimately, users care about what the agent does, not how it does it.

The biggest differentiator I've seen between agents that work and get adoption, and those that are eternally in a demo phase, is the cardinality of the state space the agent operates in. Too many folks try to "boil the ocean" and implement a general-purpose capability: e.g. generating Python code to do something, or synthesizing SQL from natural language.

The projects I've seen that work really focus on reducing the state space of agent decision making down to the smallest possible set that delivers user value.

e.g. Rather than generating arbitrary SQL, work out a set of ~20 SQL templates that are hyper-specific to the business problem you're solving. Parameterize them with the options for select, filter, group by, order by, and the subset of aggregate operations that are relevant. Then let the agent choose the right template + parameters from a relatively small finite set of options.

^^^ the delta in agent quality between "boiling the ocean" vs "agent's free choice over a small state space" is night and day. It lets you deploy early, deliver value, and start getting user feedback.
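A sketch of the template approach, with one illustrative template (the names and SQL are mine, not from the post): the LLM only emits a template name plus parameter values, and deterministic code validates them against closed allowlists before rendering any SQL.

```python
# Hyper-specific templates; the agent never writes raw SQL.
TEMPLATES = {
    "revenue_by_region": (
        "SELECT region, {agg}(amount) FROM orders "
        "WHERE order_date >= :start GROUP BY region ORDER BY 2 {direction}"
    ),
}

# Closed allowlists define the agent's entire state space per parameter.
ALLOWED = {"agg": {"SUM", "AVG", "COUNT"}, "direction": {"ASC", "DESC"}}

def render(choice: dict) -> str:
    """Validate an agent's (template, params) choice and render the SQL."""
    template = TEMPLATES[choice["template"]]  # KeyError on an unknown template
    for name, value in choice["params"].items():
        if value not in ALLOWED.get(name, set()):
            raise ValueError(f"disallowed value {value!r} for {name}")
    return template.format(**choice["params"])
```

A bad choice fails loudly at validation time rather than producing subtly wrong SQL, which is most of the quality delta in practice.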

Building Transitional Software Systems:

  1. Deeply understand the domain and CUJs (critical user journeys),
  2. Segment out the system into "problems that traditional software is good at solving" and "LLM-hard problems",
  3. For the LLM hard problems, work out the smallest possible state space of decision making,
  4. Build the system, and get users using it,
  5. Gradually expand the state space as feedback flows in from users.
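Steps 3-5 above can be sketched as a deliberately small action space that widens as feedback arrives; the action names here are hypothetical:

```python
# v1: the smallest action set that delivers user value (step 3).
ALLOWED_ACTIONS = {"lookup_order", "refund_order"}

def execute(agent_choice: str) -> str:
    """Run an agent-selected action, rejecting anything outside the space."""
    if agent_choice not in ALLOWED_ACTIONS:
        # Out-of-space choices fail fast instead of producing jank.
        raise ValueError(f"action {agent_choice!r} not in allowed state space")
    return f"ran {agent_choice}"

# Later, as user feedback flows in, expand the space deliberately (step 5).
ALLOWED_ACTIONS.add("update_shipping_address")
```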
replies(5): >>42475906 #>>42476199 #>>42476710 #>>42478819 #>>42480366 #
CharlieDigital | No.42476199
Same experience.

The smaller and more focused the context, the higher the consistency of output, and the lower the chance of jank.

Fundamentally no different than giving instructions to a junior dev. Be more specific -- point them to the right docs, distill the requirements, identify the relevant areas of the source -- to get good output.

My last attempt at a workflow of agents was at the GPT-3.5 to 4 transition, and OpenAI's models weren't good enough at that point to produce consistently good output, and were slow to boot.

My team has taken the stance that getting consistently good output from LLMs is really an ETL exercise: acquire, aggregate, and transform the minimum relevant data needed for the output to reach the desired level of quality and depth, then let the LLM do its thing.
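A sketch of that "context as ETL" stance, with `summarize_with_llm` as a placeholder for a real client call and a naive keyword filter standing in for whatever retrieval you'd actually use:

```python
def summarize_with_llm(prompt: str) -> str:
    """Placeholder for a real LLM client call (assumption, not a real API)."""
    return "[LLM answer based on distilled context]"

def select_relevant(question: str, documents: list[str], limit: int = 3) -> list[str]:
    # Acquire + aggregate: keep only documents sharing a term with the question.
    terms = set(question.lower().split())
    return [d for d in documents if terms & set(d.lower().split())][:limit]

def answer_question(question: str, documents: list[str]) -> str:
    # Transform: distill to a small, focused context before the single LLM call.
    context = "\n".join(select_relevant(question, documents))
    return summarize_with_llm(f"Q: {question}\nContext:\n{context}")
```

The deterministic extract/transform steps do the heavy lifting; the LLM only ever sees the minimum context it needs.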