
226 points by edunteman | 1 comment

Hi HN! Erik here from Pig.dev, and today I'd like to share a new project we've just open-sourced:

Muscle Mem is an SDK that records your agent's tool-calling patterns as it solves tasks, and will deterministically replay those learned trajectories whenever the task is encountered again, falling back to agent mode if edge cases are detected. Like a JIT compiler, for behaviors.
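
To make that concrete, here's a hypothetical sketch of what wiring tools into such an engine could look like. Every name here (Engine, Check, engine.tool, set_agent) is my own illustrative assumption, not necessarily the actual Muscle Mem API; the repo is the source of truth:

    from muscle_mem import Engine, Check  # assumed import path

    engine = Engine()

    def capture(window: str) -> dict:
        # Snapshot the environment features that determine cache validity.
        return {"window": window}

    def compare(current: dict, candidate: dict) -> bool:
        # Only replay if the environment still looks like it did at record time.
        return current["window"] == candidate["window"]

    @engine.tool(pre_check=Check(capture, compare))
    def click_button(window: str, button_id: str) -> None:
        ...  # the real automation step goes here

    def my_agent(task: str) -> None:
        ...  # your LLM agent, invoked only on cache misses

    engine.set_agent(my_agent)
    engine("export last month's invoices")  # replays if seen before, else runs the agent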

At Pig, we built computer-use agents for automating legacy Windows applications (healthcare, lending, manufacturing, etc.).

A recurring theme we ran into was that businesses already had RPA (pure-software scripts), and it worked for them in most cases. The pull toward agents as an RPA alternative was not to get infinitely flexible "AI employees," as tech Twitter/X might have you think, but simply because their RPA breaks under occasional edge cases, and agents can handle those cases gracefully.

Using a pure-agent approach proved highly wasteful. Windows' accessibility APIs are poor, so you're generally stuck with pure-vision agents, which can run around $40/hr in token costs and take 5x longer than a human to perform a workflow. At that point, you're better off hiring a human.

The goal of Muscle Mem is to get LLMs out of the hot path of repetitive automations, intelligently swapping between script-based execution for repeat cases and agent-based execution for discovery and self-healing.
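
Conceptually, the swap is a cache lookup with environment validation before replay. A minimal sketch, with hypothetical names (trajectory_cache, agent) standing in for whatever your stack provides:

    trajectory_cache: dict = {}  # task -> previously recorded trajectory

    def run_task(task: str, env) -> object:
        traj = trajectory_cache.get(task)

        # Hot path: replay deterministically if every recorded check still passes.
        if traj is not None and all(check(env) for check in traj.checks):
            return traj.replay(env)  # no LLM involved

        # Edge case or first encounter: fall back to the agent, then record
        # its tool calls so the next run can skip the LLM entirely.
        result = agent.solve(task, env)
        trajectory_cache[task] = result.trajectory
        return result.output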

While inspired by computer-use environments, Muscle Mem is designed to generalize to any automation performing discrete tasks in dynamic environments. It took a great deal of thought to figure out an API that generalizes, which I cover more deeply in this blog: https://erikdunteman.com/blog/muscle-mem/

Check out the repo, consider giving it a star, or dive deeper into the above blog. I look forward to your feedback!

1. parsabg No.43992832
I've been thinking about this (funnily enough, also while building a browser-use-type agent [1]), and I think this is a solid direction to explore. My implementation stores tuples of (context, task, tool_sequence) after a successful task completion, e.g. (Instagram.com, "check the user's notifications", [browser_click, browser_read_text, ...]).
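
A minimal sketch of that storage scheme (class and function names are mine, not from browserbee):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TaskMemory:
        context: str                    # e.g. "instagram.com"
        task: str                       # e.g. "check the user's notifications"
        tool_sequence: tuple[str, ...]  # e.g. ("browser_click", "browser_read_text")

    memories: dict[tuple[str, str], TaskMemory] = {}

    def remember(m: TaskMemory) -> None:
        # Store only after a successful task completion.
        memories[(m.context, m.task)] = m

    def recall(context: str, task: str) -> TaskMemory | None:
        return memories.get((context, task))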

One can imagine an agent-to-agent marketplace where agents publish and consume such memories, standardized by canonical references to the MCP tools they've used, and possibly put a price on it based on how much work it would take another agent to "discover" that useful computational path. Then the consuming agents can make a "build vs buy" decision.

The core issue is creating meaningful notions of "context" across the universe of tasks and environments. I'm skeptical of embeddings for that reason, and I think reducing false positives/negatives for cache hits is more important than efficiency in the short term, so a rich textual description of the context is perhaps a good short-term compromise.
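
To illustrate the trade-off: strict textual matching on a normalized context description rarely produces a false cache hit, at the cost of missing some reuse. A toy sketch (my naming):

    def context_key(description: str) -> str:
        # Normalize only whitespace and case; heavier normalization risks
        # conflating genuinely different contexts (false positives).
        return " ".join(description.lower().split())

    def is_cache_hit(stored: str, current: str) -> bool:
        # Exact match after normalization: conservative by design, so a
        # wrong-context replay is unlikely even if some valid hits are missed.
        return context_key(stored) == context_key(current)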

[1] https://github.com/parsaghaffari/browserbee