179 points by anerli | 2 comments | 25 Apr 25 17:00 UTC

Hey HN, Anders and Tom here - we’ve been building an end-to-end testing framework powered by visual LLM agents to replace traditional web testing.

We know there's a lot of noise about different browser agents. If you've tried any of them, you know they're slow, expensive, and inconsistent. That's why we built an agent specifically for running test cases and optimized it just for that:

- Pure vision instead of the error-prone "set-of-marks" system (the colorful boxes you see in browser-use, for example)

- Use a tiny VLM (Moondream) instead of OpenAI/Anthropic computer-use models for dramatically faster and cheaper execution

- Use two agents: one for planning and adapting test cases and one for executing them quickly and consistently.
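To make the first bullet concrete: with set-of-marks, an overlay first draws numbered boxes around the elements it detects and the model answers with a box number. With pure vision, the model answers with pixel coordinates on the raw screenshot. The sketch below is a hypothetical illustration of one failure mode of the former (an element the overlay did not detect cannot be targeted at all). The names `Box`, `resolve_set_of_marks`, and `resolve_pure_vision` are invented for this example and are not part of Magnitude or browser-use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """An element bounding box in screenshot pixels (x, y = top-left corner)."""
    x: int
    y: int
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


def resolve_set_of_marks(mark_id: int, marks: dict[int, Box]) -> Optional[tuple[int, int]]:
    """Set-of-marks: the model answers with the number of a labeled box.

    The click target only exists if the overlay step detected and labeled
    that element; an element the overlay missed cannot be chosen.
    """
    box = marks.get(mark_id)
    return box.center() if box is not None else None


def resolve_pure_vision(x: int, y: int, screen_width: int, screen_height: int) -> Optional[tuple[int, int]]:
    """Pure vision: the model answers with pixel coordinates directly.

    The only check needed is that the point lies on the screenshot.
    """
    if 0 <= x < screen_width and 0 <= y < screen_height:
        return (x, y)
    return None


# Hypothetical page: the overlay labeled two elements but missed a third
# (say, a canvas-drawn button at roughly (500, 300)).
marks = {1: Box(10, 10, 100, 40), 2: Box(10, 60, 100, 40)}
print(resolve_set_of_marks(1, marks))            # (60, 30)
print(resolve_set_of_marks(3, marks))            # None: no mark exists for it
print(resolve_pure_vision(500, 300, 1280, 720))  # (500, 300)
```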

The idea is that the planner builds up a general plan, which the executor runs. We can save this plan and re-run it with only the executor for quick, cheap, and consistent runs. When something goes wrong, the executor can kick back out to the planner agent, which re-adjusts the test.
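The planner/executor loop described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Magnitude's implementation: `Step`, `Plan`, `run_test`, and the `planner`/`executor` callables are invented names, and the fake agents at the bottom stand in for the real planner LLM and Moondream executor calls. It also assumes that each attempt replays the plan from a fresh page state.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One natural-language action in a plan, e.g. 'click the Log in button'."""
    description: str


@dataclass
class Plan:
    steps: list[Step] = field(default_factory=list)


# Hypothetical interfaces standing in for the two agents:
# - planner: (test case, description of a failed step or None) -> Plan
# - executor: performs one step against the page, returns True on success
Planner = Callable[[str, Optional[str]], Plan]
Executor = Callable[[Step], bool]


def run_test(test_case: str, cached_plan: Optional[Plan], planner: Planner,
             executor: Executor, max_replans: int = 2) -> tuple[bool, Plan]:
    """Run a cached plan with the executor only; fall back to the planner on failure.

    Returns (passed, plan). On a pass, the returned plan is the one worth caching.
    """
    plan = cached_plan if cached_plan is not None else planner(test_case, None)
    for attempt in range(max_replans + 1):
        # Execute steps in order, stopping at the first one that fails.
        failed_step = next((s for s in plan.steps if not executor(s)), None)
        if failed_step is None:
            return True, plan
        if attempt < max_replans:
            # Kick back out to the planner with feedback about what broke.
            plan = planner(test_case, failed_step.description)
    return False, plan


# Usage with fake agents: the UI now says "Log in" instead of "Sign in".
planner_calls = []

def fake_planner(test_case, feedback):
    planner_calls.append(feedback)
    button = "Sign in" if feedback is None else "Log in"
    return Plan([Step("open /login"), Step(f"click the {button} button")])

def fake_executor(step):
    return "Sign in" not in step.description

passed, plan = run_test("user can log in", None, fake_planner, fake_executor)
print(passed, [s.description for s in plan.steps])
# True ['open /login', 'click the Log in button']
passed2, _ = run_test("user can log in", plan, fake_planner, fake_executor)
print(passed2, len(planner_calls))  # True 2 (the cached rerun never called the planner)
```

The design point mirrored here is that the expensive planner is only invoked when there is no cached plan or when the cheap executor reports a failure.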

It’s completely open source. Would love to have more people try it out and tell us how we can make it great.

Repo: https://github.com/magnitudedev/magnitude

1. aoeusnth1 [26 Apr 25 03:29 UTC] No.43800684

Why not make the strong model compile a non-AI-driven test execution plan using selectors/events? Is Moondream that good?

2. anerli [26 Apr 25 06:30 UTC] No.43801395

Definitely a good question. Using an actual LLM as the execution layer makes it easier to hand off to the planner agent when the test needs to be adapted. We don’t want to store just a selector-based test, because it’s difficult to determine when such a test requires adaptation, and it is inherently more brittle to subtle UI changes. We think a tiny model like Moondream makes this cheap enough that these benefits outweigh an approach where we cache actual Playwright code.
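A toy illustration of the brittleness point: a cached selector encodes an implementation detail (here a DOM id), while a vision-based executor targets what is visible on screen. In this hypothetical sketch, `find_by_description` is plain case-insensitive text matching standing in for a VLM such as Moondream, not a model call, and the page is a list of invented `Element` records rather than a real DOM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    dom_id: str  # what a cached selector such as "#signin-btn" depends on
    label: str   # the visible text a vision model would read off the screenshot


def find_by_selector(page: list[Element], dom_id: str) -> Optional[Element]:
    """Cached-selector replay: exact match on an implementation detail."""
    return next((e for e in page if e.dom_id == dom_id), None)


def find_by_description(page: list[Element], wanted_label: str) -> Optional[Element]:
    """Toy stand-in for visual grounding: match on what a user actually sees.

    A real VLM would locate the element in pixels; this only compares text.
    """
    wanted = wanted_label.casefold()
    return next((e for e in page if e.label.casefold() == wanted), None)


before = [Element("signin-btn", "Sign in"), Element("help-link", "Help")]
# A refactor renames the DOM id; the button looks the same to a user.
after = [Element("auth-submit", "Sign in"), Element("help-link", "Help")]

print(find_by_selector(before, "signin-btn") is not None)  # True
print(find_by_selector(after, "signin-btn") is not None)   # False: the replay breaks
print(find_by_description(after, "sign in") is not None)   # True
```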

Show HN: Magnitude – open-source, AI-native test framework for web apps