
223 points | edunteman | 1 comment

Hi HN! Erik here from Pig.dev, and today I'd like to share a new project we've just open sourced:

Muscle Mem is an SDK that records your agent's tool-calling patterns as it solves tasks, and will deterministically replay those learned trajectories whenever the task is encountered again, falling back to agent mode if edge cases are detected. Like a JIT compiler, for behaviors.
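To make the loop concrete, here's a rough sketch of the cache/replay/fallback idea in Python. All names here (`Step`, `Engine`, the agent callable) are invented for illustration and are not Muscle Mem's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Step:
    check: Callable[[dict], bool]  # is the environment still as expected?
    do: Callable[[dict], None]     # the recorded tool call to replay

class Engine:
    def __init__(self, agent: Callable[[str, dict], List[Step]]):
        # Fallback agent: solves the task and returns the steps it took.
        self.agent = agent
        self.cache: Dict[str, List[Step]] = {}

    def run(self, task: str, env: dict) -> str:
        steps = self.cache.get(task)
        if steps is not None:
            for s in steps:
                if not s.check(env):
                    break  # edge case detected: stop replaying
                s.do(env)
            else:
                return "replayed"  # every step replayed deterministically
        # Cache miss or edge case: fall back to agent mode and record
        # the trajectory for next time.
        self.cache[task] = self.agent(task, env)
        return "agent"
```

On a second run of the same task the engine replays the recorded steps without touching the LLM; any failed check drops back to agent mode.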

At Pig, we built computer-use agents for automating legacy Windows applications (healthcare, lending, manufacturing, etc).

A recurring theme we ran into was that businesses already had RPA (pure-software scripts), and it worked for them in most cases. The pull toward agents as an RPA alternative was not to get infinitely flexible "AI Employees," as tech Twitter/X might have you think, but simply because their RPA breaks under occasional edge cases and agents can gracefully handle those cases.

Using a pure-agent approach proved to be highly wasteful. Windows' accessibility APIs are poor, so you're generally stuck using pure-vision agents, which can run around $40/hr in token costs and take 5x longer than a human to perform a workflow. At that point, you're better off hiring a human.

The goal of Muscle Mem is to get LLMs out of the hot path of repetitive automations, intelligently swapping between script-based execution for repeat cases and agent-based execution for discovery and self-healing.

While inspired by computer-use environments, Muscle Mem is designed to generalize to any automation performing discrete tasks in dynamic environments. It took a great deal of thought to figure out an API that generalizes, which I cover more deeply in this blog: https://erikdunteman.com/blog/muscle-mem/

Check out the repo, consider giving it a star, or dive deeper into the above blog. I look forward to your feedback!

dmos62 | No.43989430
I love the minimal approach and general-use focus.

If I understand correctly, the engine caches trajectories in the simplest way possible, so if you have a cached trajectory a-b-c, and you encounter c-b-d, there's no way to get a "partial" cache hit, right? As I'm wrapping my head around this, I'm thinking that the engine would have to be a great deal more complicated to be able to judge when it's a safe partial hit.
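For instance, a simple way to quantify a "partial" hit would be longest-prefix matching, which gives zero for the a-b-c vs c-b-d case (purely illustrative; nothing in the source suggests the engine does this today):

```python
def shared_prefix_len(cached: list, observed: list) -> int:
    # Number of leading steps the observed trace shares with a cached
    # trajectory; 0 means no partial hit at all.
    n = 0
    for a, b in zip(cached, observed):
        if a != b:
            break
        n += 1
    return n
```

A cached a-b-c against an observed a-b-d shares a prefix of 2, while c-b-d shares nothing, so a prefix-based engine would still miss entirely on reordered steps.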

Basically, I'm trying to imagine how applicable this approach could be to a significantly noisier environment.

replies(1): >>43990257 #
edunteman | No.43990257
I struggled with this one for a while in the design, and didn't want to be hasty in making any decisions that would lock us into a direction.

I definitely want to support sub-trajectories. In fact, I believe an absolutely killer feature for this system would be decomposing a large trajectory into smaller, more repeated sub-trajectories.

Jeff from trychroma.com often talks about agent engineering as being more like industrial engineering than software eng, and I'd agree.

One part of the original spec I wrote for this included a component I call the "Compactor", a background agent process that would modify and compress learned skills, similar to Letta's sleep-time agents:

https://docs.letta.com/guides/agents/sleep-time-agents

My fear with this is that it goes against the `No hidden nondeterminism` design value I stated in the launch blog. There are plenty of things we could throw background agents at, from the Compactor to parameterizing trajectories, but that's risky territory from an observability and debuggability stance.

For simplicity, I just decided to treat every trajectory as distinct, even if portions of it are redundant. If a cached trajectory fails a check halfway through, the agent proceeding from there just makes its own partial trajectory. It's still unclear whether we call that a trajectory for the same named task, or annotate it as a task recovery.
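That mid-replay handoff could be sketched like this, with each step as a (check, do) pair of callables over a dict-shaped environment; the helper name and return shape are invented for illustration:

```python
def replay_with_fallback(steps, env, agent):
    # Replay cached steps until a check fails; from that point the
    # agent takes over and its output becomes a new partial trajectory.
    for i, (check, do) in enumerate(steps):
        if not check(env):
            return agent(env), i  # partial trajectory + where replay stopped
        do(env)
    return None, len(steps)       # full replay, no fallback needed
```

The caller can then decide whether to store the agent's partial trajectory under the same task name or annotate it as a recovery.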

We can always increase the cache-hit rate over time; worst case, the agent just does redundant work, which is the status quo anyway.

replies(1): >>43992383 #
dmos62 | No.43992383
It occurred to me that the cache could be indexed not only by environment state but also by intent. A second agent could subdivide trajectories into steps, upgrading trajectories into ordered lists of sub-trajectories. Each trajectory and list would have an intent attached and would be aware of the parent list's (i.e. parent "super-trajectory's") intent, and could therefore be embedded and looked up by an agent given its own intent. Not sure if this train of thought is easy to follow.
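A toy version of that index might look like this, with invented names, and exact string lookup standing in for embedding-based similarity:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Trajectory:
    intent: str                          # what this (sub-)trajectory accomplishes
    steps: list = field(default_factory=list)  # raw steps and/or sub-Trajectories
    parent_intent: Optional[str] = None  # intent of the enclosing super-trajectory

class IntentIndex:
    def __init__(self):
        self._by_intent = {}

    def add(self, traj: Trajectory):
        self._by_intent[traj.intent] = traj
        # Sub-trajectories are indexed too, each aware of its parent's intent.
        for sub in traj.steps:
            if isinstance(sub, Trajectory):
                sub.parent_intent = traj.intent
                self.add(sub)

    def lookup(self, intent: str) -> Optional[Trajectory]:
        # Stand-in for an embedding similarity search keyed on intent.
        return self._by_intent.get(intent)
```

An agent could then probe the index with its current intent and replay a matching sub-trajectory even when the enclosing super-trajectory differs.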

That's more auto-magical than you might care for. I've been designing an IDE where you program with intent statements and the generated code is a second-class citizen, so I might be biased in suggesting this.

replies(1): >>43992451 #
edunteman | No.43992451
totally follows! thanks for sharing, will noodle on it