
172 points by anerli | 2 comments

Hey HN, Anders and Tom here - we’ve been building an end-to-end testing framework powered by visual LLM agents to replace traditional web testing.

We know there's a lot of noise about different browser agents. If you've tried any of them, you know they're slow, expensive, and inconsistent. That's why we built an agent specifically for running test cases and optimized it just for that:

- Pure vision instead of the error-prone "set-of-marks" system (the colorful boxes you see in browser-use, for example)

- Use a tiny VLM (Moondream) instead of OpenAI/Anthropic computer-use models for dramatically faster and cheaper execution

- Use two agents: one for planning and adapting test cases and one for executing them quickly and consistently.

The idea is that the planner builds up a general plan, which the executor runs. We can save this plan and re-run it with only the executor for quick, cheap, and consistent runs. When something goes wrong, it can kick back out to the planner agent to re-adjust the test.
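A rough Python sketch of the planner/executor loop described above. Everything here is an assumption for illustration: `Step`, `run_test`, and the `plan`/`execute` callables are hypothetical stand-ins for the planner LLM and the Moondream executor, not Magnitude's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Step:
    """One cached action, e.g. click "Login button". The target is a plain-language
    description grounded visually at run time, not a DOM selector."""
    action: str
    target: str
    text: str = ""

# Hypothetical agent interfaces (not Magnitude's real API):
# - plan(test_case, done) asks the planner LLM for the remaining steps,
#   given the steps that already executed successfully.
# - execute(step) asks the small VLM to perform one step and returns False
#   when the step no longer lines up with the screen.
Planner = Callable[[str, list[Step]], list[Step]]
Executor = Callable[[Step], bool]

def run_test(test_case: str, cache: dict[str, list[Step]],
             plan: Planner, execute: Executor, max_replans: int = 2) -> bool:
    steps = cache.get(test_case)
    if steps is None:
        # First run: the planner builds the plan, which is then cached.
        steps = plan(test_case, [])
        cache[test_case] = steps
    replans = 0
    i = 0
    while i < len(steps):
        if execute(steps[i]):
            i += 1  # cheap, repeatable path: executor only
            continue
        if replans >= max_replans:
            return False  # still failing after re-planning: report a test failure
        replans += 1
        # Kick back to the planner, keep the steps that already succeeded,
        # and re-plan the rest; the adapted plan replaces the cached one.
        done = steps[:i]
        steps = done + plan(test_case, done)
        cache[test_case] = steps
    return True
```

On a second call with the same `cache`, the planner is never invoked unless a step fails, and `max_replans` bounds how often it can be called back in before the run is reported as a failure.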

It’s completely open source. Would love to have more people try it out and tell us how we can make it great.

Repo: https://github.com/magnitudedev/magnitude

tobr (No.43796478)
Interesting! My first concern: isn’t this the ultimate non-deterministic test? In practice, does it seem flaky?
anerli (No.43796637)
So the architecture is built with determinism in mind. The plan-caching system is still a work in progress, but it should be very consistent, especially once it's fully implemented. As long as your interface doesn't change (or changes only in trivial ways), Moondream alone can execute the exact same web actions as in previous test runs, without relying on any DOM selectors. When the interface does eventually change, that's where it becomes non-deterministic again by necessity, since the planner will need to generatively update the test and continue building the new cache from there. However, once the test has been adapted, it can again be executed the same way every time until the interface changes again.
engfan (No.43803404)
Anerli wrote: “When the interface does eventually change, that's where it becomes non-deterministic again by necessity, since the planner will need to generatively update the test and continue building the new cache from there.”

But what determines that the UI has changed for a specific URL? Is it your software, independent of the planner LLM, or do you require the visual LLM to determine that something changed?

You should also stop saying 100% open source when test plan generation and execution depend on non-open-source AI components. It just doesn’t make sense.

anerli (No.43804558)
The small VLM (Moondream) decides when the interface has changed, i.e. when its actions no longer line up.
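One way a small VLM could make that call, sketched in Python under stated assumptions: `locate` is a hypothetical wrapper around a pointing query to the VLM that returns every matching point on the current screenshot, and treating an ambiguous match as a change is a design choice of this sketch, not something described in the thread.

```python
from typing import Callable, Optional

Point = tuple[float, float]  # normalized (x, y), each in [0, 1]

# Hypothetical VLM call: returns every point on the screenshot that matches
# a plain-language target description such as "Login button".
Locate = Callable[[bytes, str], list[Point]]

def ground_target(screenshot: bytes, target: str, locate: Locate) -> Optional[Point]:
    """Return where to act for a cached step, or None when the step no longer
    lines up with the screen and should be handed back to the planner."""
    candidates = locate(screenshot, target)
    if len(candidates) != 1:
        # 0 matches: the element is gone or was renamed.
        # >1 matches: ambiguous, so replaying blindly would not be deterministic.
        return None
    return candidates[0]
```

Because the cached step stores only a description, a cosmetic change (a moved button, a new color) still grounds to a single point, while a removed or renamed element returns None and triggers re-planning.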

We say 100% open source because all of our code (the test runner and the AI agents) is completely open source. It’s also entirely possible to run a fully OSS stack, since you can configure it with an open source planner LLM, and Moondream is open source. You could even run it all locally if you have solid hardware.