A few months ago, we launched Browser Use (https://news.ycombinator.com/item?id=43173378), which let LLMs perform tasks in the browser using natural language prompts. It was great for one-off tasks like booking flights or finding products—but we soon realized enterprises have somewhat different needs:
They typically have one workflow with dynamic variables (e.g., filling out a form and downloading a PDF) that they want to reliably run a million times without breaking. Pure LLM agents were slow, expensive, and unpredictable for these high-frequency tasks.
So we just started working on Workflow Use:
- You show the browser what to do (by manually recording steps; show don’t tell).
- An LLM converts these recordings into deterministic scripts with variables (scripts include AI steps as well, where it’s 100% agentic)
- Scripts run reliably, 10x faster, and ~90% cheaper than Browser Use.
- If a step breaks, the workflow falls back to Browser Use and runs that step agentically (rough sketch below). (This self-healing functionality is still very early.)
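To give a feel for it, here is a simplified sketch of the kind of thing a recording gets turned into (field names are illustrative, not the exact schema):

    # Simplified sketch of a recorded workflow: deterministic steps with
    # variables, an optional agentic step, and a fallback policy.
    workflow = {
        "name": "submit_contact_form",
        "inputs": {"name": "string", "email": "string"},  # dynamic variables
        "steps": [
            {"type": "navigate", "url": "https://example.com/contact"},
            {"type": "fill", "selector": "#name", "value": "{name}"},
            {"type": "fill", "selector": "#email", "value": "{email}"},
            {"type": "agent", "task": "Dismiss the cookie banner if one appears"},
            {"type": "click", "selector": "button[type=submit]"},
        ],
        "on_step_failure": "agentic_fallback",  # retry the failed step with Browser Use
    }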
This project just kicked off, so lots of things will break, it’s definitely not production-ready yet, and plenty of stuff is still missing (like a solid editor and proper self-healing). But we wanted to share early, get feedback, and figure out what workflows you’d want to automate this way.
Try it out and let us know what you think!
2) Yeah, good question. The end goal is to completely regenerate the flow if it breaks (let Browser Use explore the “new” website and update the original flow). But let’s see, so much could be done here!
What are you working on, btw?
2. We are working on https://www.launchskylight.com/, agentic QA. For the self-onboarding version we are using pure CUA without caching. (We wanted to avoid Playwright to make it more flexible for canvas+iframe-based apps, where we found HTML-based approaches like browser-use limited, and to support desktop apps in the future.)
We are beta-testing caching internally for customers and will release it for self-onboarding soon. We use CUA actions for caching instead of Playwright. Caching with pixel-native models is definitely a bit more brittle for clicks, and we rely on purely vision-based analysis to decide whether to proceed or not. For scaling, though, I think you are 100% right: screenshots at every step for validation are okay/worth it, but running an agent non-deterministically for actions is definitely overkill for enterprise; that was what we found as well.
Gemini's video understanding is also an interesting way to analyze what went wrong in more interactive apps. Apart from that, I think we share quite a bit of the core thinking; would be interested to chat, will DM!
Really great to see the fallback to the agentic run when the automation breaks. For our e2e testing browser automation at Donobu, we independently arrived at the same pattern and have been impressed with how well it works. Automatic self-healed PR example here: https://github.com/donobu-inc/playwright-flows/pull/6/files
There are a lot of websites that are super hostile to automation and make it really hard to do simple, small, but repetitive stuff with tools like Playwright, Selenium, or ChromeDriver.
It's true that many are looking into self-healing for existing automation scripts; from what I've seen, tools like Healenium are gaining some traction in this space. However, I agree that a Browser Use-like approach also holds a lot of promise here.
My thinking on how this could be achieved with AI agents like Browser Use is to run the existing automation scripts as usual. If a script breaks due to an "element not found" exception or similar issues, the AI agent could then be triggered to analyze the page, identify the correct new locator for the problematic element, and dynamically update or "heal" the script. I've actually put together a small proof-of-concept demonstrating this idea using Browser Use: https://www.loom.com/share/1af87d78d6814512b17a8f949c28ef13?...
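In rough Python, the pattern I have in mind looks something like this (the Browser Use wiring is simplified, and in practice you'd point the agent at the same browser session rather than letting it open a new one):

    from playwright.async_api import TimeoutError as PWTimeout
    from browser_use import Agent
    from langchain_openai import ChatOpenAI  # model choice is just for illustration

    async def fill_country(page, value: str) -> None:
        """Deterministic step with an agentic fallback if the locator breaks."""
        try:
            await page.fill("#country", value, timeout=5_000)  # the scripted step
        except PWTimeout:
            # Element not found: hand the step to an agent and ask it to report
            # which selector it used so the script can be "healed".
            agent = Agent(
                task=f"Fill the country field with '{value}' and report the "
                     "CSS selector of the field you used.",
                llm=ChatOpenAI(model="gpt-4o"),
            )
            history = await agent.run()
            # ...parse `history` for the new selector and update the stored locator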
I had previously explored a similar concept with a Lavague setup here: https://www.loom.com/share/9b0c7cf0bdd6492f885a2c974ca8a4be?...
Another avenue, particularly relevant for existing test suites, is how many QA teams manage their locators. Often these are centralized in Page Object Model files (for Java/Selenium projects) or in external spreadsheets/CSVs. An AI agent could potentially be used to proactively scan the application and update these locator repositories.
For instance, I've experimented with a workflow where Browser Use updates a CSV file of locators weekly based on changes detected on the website: https://www.loom.com/share/821f80fcb0694be4bd4d979e94900990?...
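The runtime side of that is just a lookup table; the interesting part is the scheduled agent job that rewrites it. A minimal sketch (file name and CSV columns are assumptions):

    import csv

    def load_locators(path: str = "locators.csv") -> dict[str, str]:
        # Assumed CSV format: name,selector (one row per element).
        with open(path, newline="") as f:
            return {row["name"]: row["selector"] for row in csv.DictReader(f)}

    locators = load_locators()
    # Tests reference elements by name, e.g.:
    #   page.fill(locators["email_input"], "user@example.com")
    # A weekly Browser Use job rescans the site and rewrites locators.csv.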
Excited to see how Workflow Use evolves, especially the self-healing aspects!
One area for improvement seems to be checkboxes/radio buttons (possibly other input types?), given the demo didn't pick up on that (it made the same selections but didn't recognize this was a multiple-choice input). It might be useful to have a step before creating the JSON where it asks the user some follow-up questions like, "Here are the inputs I found and my understanding of their data types." Then go through each input asking for a default value, and maybe even clarification on "Should we even prompt for this?" (for example, always select country X).
I wonder if, for workflow repair purposes, it would be helpful to save more contextual information at recording time about the fields you are filling in or clicking on: "This is a country selector", "This is the birthdate field", etc. That way, if the xpath/css/etc. fails, you can give the LLM doing the repair work a description of what it's looking for.
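Something like this at recording time (field names are made up), so the repair agent gets a description instead of just a dead selector:

    # Hypothetical recorded step: keep a human/LLM-readable description next
    # to the selector so a repair agent knows what it is looking for.
    recorded_step = {
        "action": "select_option",
        "selector": "#country",            # may break after a redesign
        "description": "Country dropdown on the shipping-address form",
        "value": "{country}",              # workflow variable
    }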
I'm excited to see more efforts in QA testing with things like this. Brittle e2e tests are the bane of my (limited) automated-testing experience, and the ability to auto-heal and/or deal with minor deviations would be wonderful.