Building Effective "Agents"

1. brotchie ◴[20 Dec 24 22:25 UTC] No.42475699[source]▶

Have been building agents for the past 2 years, my tl;dr is that:

Agents are Interfaces, Not Implementations

The current zeitgeist seems to think of agents as passthrough agents: e.g. a light wrapper around a core that's almost 100% an LLM.

The most effective agents I've seen, and have built, are largely traditional software engineering with a sprinkling of LLM calls for "LLM-hard" problems. LLM-hard problems are problems that can ONLY be solved by application of an LLM (creative writing, text synthesis, intelligent decision making). Leave all the problems that are amenable to decades of software engineering best practice to good old deterministic code.

I've been calling systems like this "Transitional Software Design." That is, they're mostly a traditional software application under the hood (deterministic, well structured code, separation of concerns) with judicious use of LLMs where required.

Ultimately, users care about what the agent does, not how it does it.

The biggest differentiator I've seen between agents that work and get adoption, and those that are eternally in a demo phase, is related to the cardinality of the state space the agent is operating in. Too many folks try to "boil the ocean" and implement a general-purpose capability: e.g. generating Python code to do something, or synthesizing SQL based on natural language.

The projects I've seen that work really focus on reducing the state space of agent decision making down to the smallest possible set that delivers user value.

e.g. Rather than generating arbitrary SQL, work out a set of ~20 SQL templates that are hyper-specific to the business problem you're solving. Parameterize them with the options for select, filter, group by, order by, and the subset of aggregate operations that are relevant. Then let the agent choose the right template + parameters from a relatively small finite set of options.
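A rough sketch of what that can look like, in Python. Everything named here is a hypothetical stand-in and not from a real system: the `orders` table, the `revenue_by_dimension` template, and the `choice` dict, which represents the structured output you'd ask the LLM to produce. Identifiers are only accepted from fixed allow-lists, and the one free value is passed as a bound parameter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Template:
    sql: str               # SQL with placeholders for allow-listed identifiers
    group_by: frozenset    # columns the agent may group by
    aggregates: frozenset  # aggregate functions the agent may use


# Hypothetical: one of the ~20 templates, specific to one business question.
TEMPLATES = {
    "revenue_by_dimension": Template(
        sql=(
            "SELECT {group_by}, {aggregate}(amount) AS value FROM orders "
            "WHERE order_date >= ? GROUP BY {group_by} ORDER BY value DESC"
        ),
        group_by=frozenset({"region", "product", "channel"}),
        aggregates=frozenset({"SUM", "AVG"}),
    ),
}


def render(choice: dict) -> tuple[str, tuple]:
    """Check the agent's choice against the finite option set, then build SQL."""
    template = TEMPLATES.get(choice.get("template"))
    if template is None:
        raise ValueError("unknown template")
    group_by = choice.get("group_by")
    if group_by not in template.group_by:
        raise ValueError("group_by not allowed for this template")
    aggregate = choice.get("aggregate")
    if aggregate not in template.aggregates:
        raise ValueError("aggregate not allowed for this template")
    # Raises ValueError unless the value is an ISO date such as 2024-01-01.
    since = date.fromisoformat(str(choice.get("since", ""))).isoformat()
    sql = template.sql.format(group_by=group_by, aggregate=aggregate)
    return sql, (since,)
```

The returned pair can go straight to a driver that uses `?` placeholders, e.g. `cursor.execute(sql, params)` with `sqlite3`. Anything outside the option set fails loudly in ordinary code, so the LLM never gets to write SQL text itself.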

^^^ the delta in agent quality between "boiling the ocean" vs "agent's free choice over a small state space" is night and day. It lets you deploy early, deliver value, and start getting user feedback.

Building Transitional Software Systems:

  1. Deeply understand the domain and CUJs,
  2. Segment out the system into "problems that traditional software is good at solving" and "LLM-hard problems",
  3. For the LLM-hard problems, work out the smallest possible state space of decision making,
  4. Build the system, and get users using it,
  5. Gradually expand the state space as feedback flows in from users.
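
Steps 3 and 5 above might be sketched like this in Python. The support-ticket domain, the `Action` values, and the `llm` callable are all assumptions made up for illustration; `llm` is whatever function you use to send a prompt and get text back. The point is only that the model picks from an enabled set, ordinary code validates the pick, and widening the state space is a one-line change to that set.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    REPLY_FAQ = "reply_faq"
    ESCALATE = "escalate"
    REFUND = "refund"
    CLOSE = "close"


# Start with the smallest set that delivers value; add members as
# user feedback builds confidence (step 5).
ENABLED = {Action.REPLY_FAQ, Action.ESCALATE}


def choose_action(ticket_text: str, llm: Callable[[str], str]) -> Action:
    """Ask the model for one enabled action; fall back to ESCALATE otherwise."""
    options = sorted(action.value for action in ENABLED)
    prompt = f"Pick exactly one of {options} for this ticket:\n{ticket_text}"
    raw = llm(prompt).strip().lower()
    try:
        action = Action(raw)
    except ValueError:
        return Action.ESCALATE  # model answered outside the vocabulary
    return action if action in ENABLED else Action.ESCALATE
```

With a stub such as `choose_action("Where is my order?", lambda prompt: "reply_faq")` this returns `Action.REPLY_FAQ`. A model that answers `"refund"` gets escalated until `Action.REFUND` is added to `ENABLED`.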

replies(5): >>42475906 #>>42476199 #>>42476710 #>>42478819 #>>42480366 #

2. samdjstephens ◴[20 Dec 24 22:51 UTC] No.42475906[source]▶

>>42475699 (TP) #

There’ll always be an advantage for those who understand the problem they’re solving for sure.

The balance of traditional software components and LLM driven components in a system is an interesting topic - I wonder how the capabilities of future generations of foundation models will change that?

replies(1): >>42476349 #

3. CharlieDigital ◴[20 Dec 24 23:31 UTC] No.42476199[source]▶

>>42475699 (TP) #

Same experience.

The smaller and more focused the context, the higher the consistency of output, and the lower the chance of jank.

Fundamentally no different than giving instructions to a junior dev. Be more specific -- point them to the right docs, distill the requirements, identify the relevant areas of the source -- to get good output.

My last attempt at a workflow of agents was at the 3.5 to 4 transition and OpenAI wasn't good enough at that point to produce consistently good output and was slow to boot.

My team has taken the stance that getting consistently good output from LLMs is really an ETL exercise: acquire, aggregate, and transform the minimum relevant data for the output to reach the desired level of quality and depth and let the LLM do its thing.
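
A minimal sketch of that ETL framing, in Python. This is an illustration under assumptions, not the team's actual pipeline: the word-overlap scoring is a deliberately naive placeholder for real retrieval, and `build_context` and `max_chars` are hypothetical names.

```python
def build_context(question: str, documents: list[str], max_chars: int = 2000) -> str:
    """Keep only documents that share words with the question, within a size budget."""
    terms = set(question.lower().split())

    # Transform: score each acquired document by naive word overlap.
    scored = []
    for doc in documents:
        overlap = len(terms & set(doc.lower().split()))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Aggregate: take the best-scoring documents that still fit the budget.
    picked, used = [], 0
    for _, doc in scored:
        if used + len(doc) > max_chars:
            continue
        picked.append(doc)
        used += len(doc)
    return "\n---\n".join(picked)
```

The string it returns is what goes into the prompt. Irrelevant documents never reach the LLM at all.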

4. brotchie ◴[20 Dec 24 23:57 UTC] No.42476349[source]▶

>>42475906 #

Certain the end state is "one model to rule them all" hence the "transitional."

Just that the pragmatic approach, today, given current LLM capabilities, is to minimize the surface area / state space that the LLM is actuating. And then gradually expand that until the whole system is just a passthrough. But starting with a passthrough kinda doesn't lead to great products in December 2024.

5. handfuloflight ◴[21 Dec 24 01:12 UTC] No.42476710[source]▶

>>42475699 (TP) #

When trying to do everything, they end up doing nothing.

6. shinryuu ◴[21 Dec 24 10:36 UTC] No.42478819[source]▶

>>42475699 (TP) #

Do you have a public example of a good agentic system? I would like to experience it.

7. throw83288 ◴[21 Dec 24 15:57 UTC] No.42480366[source]▶

>>42475699 (TP) #

Unrelated, but since you seem to have experience here, how would you recommend getting into the bleeding edge of LLMs/Agents? Traditional SWE is obviously on its way out, but I can't even tell where to start with this new tech and struggle to find ways to apply it to an actual project.