Barring hallucinations, all of the failures are related to reinforcement learning. It can't keep its optimization objective in mind long enough to maximize revenue and minimize cost. It can't keep state in mind well enough to manage inventory, or gauge that it's losing money.
And the things Anthropic is prescribing fall right into the Bitter Lesson. More tooling and scaffolding? A CRM? All that's doing is putting explicit rulesets around the model to guide it. Of course that shows results in the short term, but it will never unlock a new evolution of AI, which managing a store or playing Pokemon would need.
This is a great experiment, and the right takeaway is that a new type of base model is needed, with a different base objective than the next word/sentence prediction of LLMs. I don't know what that model will look like, but it needs to be able to handle dynamic environments rather than static ones. It needs to have a state space and an objective. It basically needs reinforcement learning at its very foundation, rather than applied on top of the base model the way current agents are.
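To make concrete what "a state space and an objective" means here, below is a minimal toy sketch (all names like `ShopEnv`, `ShopState`, and the action strings are hypothetical, not anything from the experiment): a dynamic environment where state (cash, inventory) persists across steps and the objective is cumulative profit, which is the kind of signal an RL-native model would be trained against, as opposed to next-token likelihood.

```python
# Toy sketch of an RL-style objective for store management (hypothetical names).
# State persists across steps; reward is profit, not next-token likelihood.
from dataclasses import dataclass, field
import random

@dataclass
class ShopState:
    cash: float = 100.0
    inventory: dict = field(default_factory=lambda: {"cola": 0})

class ShopEnv:
    """Dynamic environment: restock or sell; reward = profit per step."""
    def __init__(self):
        self.state = ShopState()

    def step(self, action):
        reward = 0.0
        if action == "restock_cola":
            cost = 1.0                      # buy one unit at cost
            self.state.cash -= cost
            self.state.inventory["cola"] += 1
            reward -= cost
        elif action == "sell_cola" and self.state.inventory["cola"] > 0:
            price = 3.0                     # sell one unit at retail price
            self.state.cash += price
            self.state.inventory["cola"] -= 1
            reward += price
        done = self.state.cash <= 0         # going broke ends the episode
        return self.state, reward, done

# Objective: maximize cumulative reward over the episode.
env = ShopEnv()
total_reward = 0.0
for _ in range(20):
    action = random.choice(["restock_cola", "sell_cola"])  # placeholder policy
    _, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
print(f"episode profit: {total_reward:.2f}")
```

The placeholder random policy is exactly what a proper policy would replace; the point is that the training signal is the episode's profit, grounded in persistent state, rather than per-token prediction with the objective re-stated in the prompt.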