Skyvern Browser Agent 2.0: How We Reached State of the Art in Evals

Steps pre-planned by the Planner will go wrong more often than not, because it has to guess the UI layout from its memory/training data. It's better to ask only for the "next step" and give it the current state of the UI.

I have built a similar project for mobile automation [1], and there the validator phase is not separate. It's inherently baked into each step, since we only ask for the next step based on the current UI and the previous actions.

My Planner sometimes goes "Oh, we are still on home screen, let's find the Uber app icon". This sort of self-correcting behaviour was not programmed; the LLM does it on its own.
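
To make the loop concrete, here is a rough sketch of the "ask for the next step" idea. It is not the actual ClickClickClick code: `FakeDevice`, `toy_policy`, `run` and the action names are all made up for illustration, and `toy_policy` is a hard-coded stand-in for the LLM call. The point is only that every step starts from a fresh observation, so a tap that did not register gets retried without a separate validator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeDevice:
    """Hypothetical toy phone: the first tap on the app icon is dropped."""
    screen: str = "home"
    dropped_taps: int = 1

    def observe(self) -> str:
        # A real agent would return a screenshot or UI tree here.
        return self.screen

    def perform(self, action: str) -> None:
        if action == "tap_uber_icon" and self.screen == "home":
            if self.dropped_taps > 0:
                self.dropped_taps -= 1  # simulate a tap that did not register
                return
            self.screen = "uber"
        elif action == "tap_request_ride" and self.screen == "uber":
            self.screen = "ride_requested"


def toy_policy(goal: str, screen: str, history: List[str]) -> Optional[str]:
    """Stand-in for the LLM call (assumption): decide from the current screen."""
    if screen == "home":
        return "tap_uber_icon"  # "we are still on home screen, find the icon"
    if screen == "uber":
        return "tap_request_ride"
    return None  # nothing left to do


def run(goal: str,
        device: FakeDevice,
        next_step: Callable[[str, str, List[str]], Optional[str]],
        max_steps: int = 10) -> List[str]:
    """Ask for one action at a time, always from the current UI state."""
    history: List[str] = []
    for _ in range(max_steps):
        screen = device.observe()                  # current state of the UI
        action = next_step(goal, screen, history)  # only the next step
        if action is None:
            break
        device.perform(action)
        history.append(action)
    return history


if __name__ == "__main__":
    device = FakeDevice()
    print(run("book an Uber", device, toy_policy))
    print(device.screen)
```

In this toy run the first tap is dropped, the next observation still shows the home screen, and the policy simply asks for the icon again. With a pre-planned list of steps, the second step would have been issued against the wrong screen.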

1. https://github.com/BandarLabs/ClickClickClick - A framework to automate mobile use via any LLM (local/remote)