
167 points | anerli | 6 comments

Hey HN, Anders and Tom here - we’ve been building an end-to-end testing framework powered by visual LLM agents to replace traditional web testing.

We know there's a lot of noise about different browser agents. If you've tried any of them, you know they're slow, expensive, and inconsistent. That's why we built an agent specifically for running test cases and optimized it just for that:

- Pure vision instead of the error-prone "set-of-marks" system (the colorful boxes you see in browser-use, for example)

- Use a tiny VLM (Moondream) instead of OpenAI/Anthropic computer use for dramatically faster and cheaper execution

- Use two agents: one for planning and adapting test cases and one for executing them quickly and consistently.

The idea is the planner builds up a general plan which the executor runs. We can save this plan and re-run it with only the executor for quick, cheap, and consistent runs. When something goes wrong, it can kick back out to the planner agent and re-adjust the test.
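Roughly, the loop looks like this (a simplified TypeScript sketch to show the idea; the types and helper names are illustrative, not Magnitude's actual API):

    // Illustrative sketch only: these types and helper names are assumptions,
    // not Magnitude's actual API.

    // A saved plan is just an ordered list of concrete, executable steps.
    interface PlanStep {
      action: "click" | "type" | "check";
      target: string;   // e.g. "the 'Add to cart' button"
      value?: string;   // text to type, or a condition to verify
    }

    interface TestPlan {
      testCase: string; // the original natural-language test case
      steps: PlanStep[];
    }

    // Hypothetical agents: an expensive planner LLM, and a cheap executor
    // backed by a small vision model that locates targets on screen.
    declare function planWithBigModel(testCase: string): Promise<TestPlan>;
    declare function executeStep(step: PlanStep): Promise<{ ok: boolean; error?: string }>;

    async function runTest(testCase: string, cached?: TestPlan): Promise<TestPlan> {
      // Reuse the saved plan when we have one; only call the big model on a miss.
      let plan = cached ?? (await planWithBigModel(testCase));

      for (let i = 0; i < plan.steps.length; i++) {
        const result = await executeStep(plan.steps[i]);
        if (!result.ok) {
          // Something drifted (UI changed, step failed): kick back out to the
          // planner to adjust the plan, then keep executing (simplified here).
          plan = await planWithBigModel(
            `${testCase}\nFailed at step ${i}: ${result.error}`
          );
        }
      }
      return plan; // persist this for cheap, consistent re-runs with only the executor
    }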

It’s completely open source. Would love to have more people try it out and tell us how we can make it great.

Repo: https://github.com/magnitudedev/magnitude

1. grbsh
I know moondream is cheap / fast and can run locally, but is it good enough? In my experience testing things like Computer Use, anything but the large LLMs has been so unreliable as to be unworkable. But maybe you guys are doing something special to make it work well in concert?
2. anerli
So it's key to still have a big model that is devising the overall strategy for executing the test case. Moondream on its own is pretty limited and can't handle complex queries. The planner gives very specific instructions to Moondream, which is just responsible for locating different targets on the screen. It's basically just the layer that grounds the big LLM's actual "thinking" into specific UI interactions.

Where it gets interesting is that we can save the execution plan that the big model comes up with and run with ONLY Moondream if the plan is specific enough, then switch back out to the big model if some action path requires adjustment. This means we can run repeated tests much more efficiently and consistently.
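Conceptually, the grounding layer is doing something like this (an illustrative TypeScript sketch, assuming a point-style locator for the small vision model and a Playwright page for input; not the project's actual code):

    import { chromium } from "playwright";

    // Assumed interface for a small vision model that can point at things in a
    // screenshot. Moondream has a pointing capability, but the client shape
    // here is a guess, not its real SDK.
    declare function locateOnScreen(
      screenshot: Buffer,
      description: string
    ): Promise<{ x: number; y: number }>;

    // The big model has already decided *what* to interact with
    // ("the Login button"); the small model only has to find *where*
    // it is on the current screen.
    async function clickTarget(url: string, description: string) {
      const browser = await chromium.launch();
      const page = await browser.newPage();
      await page.goto(url);

      const screenshot = await page.screenshot();
      const point = await locateOnScreen(screenshot, description);
      await page.mouse.click(point.x, point.y);

      await browser.close();
    }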

3. grbsh
Ooh, I really like the idea about deciding whether to use the big or small model based on task specificity.
4. tough
You might like https://pypi.org/project/llm-predictive-router/
5. anerli
Oh this is interesting. In our case we are being very specific about which types of prompts go where, so the planner essentially creates prompts that will be executed by Moondream, instead of trying to route prompts generally to the appropriate model. The types of requests that our planner agent vs Moondream can handle are fundamentally different for our use case.
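So instead of a learned router, it ends up as a fixed assignment of which model handles which kind of prompt. A toy illustration (hypothetical names, not real code from either project):

    // Toy illustration (hypothetical names): in a fixed workflow the model
    // choice can be pinned per step, so no learned router is needed.
    type Step = { prompt: string; model: "big-planner-llm" | "moondream" };

    const checkoutTest: Step[] = [
      { prompt: "Plan the steps to test the checkout flow", model: "big-planner-llm" },
      { prompt: "Locate the 'Checkout' button on screen", model: "moondream" },
      { prompt: "Locate the 'Place order' button on screen", model: "moondream" },
    ];

    declare function callModel(model: Step["model"], prompt: string): Promise<string>;

    async function run() {
      for (const step of checkoutTest) {
        await callModel(step.model, step.prompt);
      }
    }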
6. tough
Interesting, I'll check yours out. I'm mostly interested in these dynamic routers so I can mix local and API-based models depending on the task; I can't run some models locally, but most of the tasks don't even require that much power (I'm building agentic AI systems).

there's also https://github.com/lm-sys/RouteLLM

and other similar projects.

I guess your system isn't oriented toward open-ended tasks, so you can just build workflows that decide which model to use at each step. These routing mechanisms are more useful for open-ended tasks that don't fit into a workflow so well (maybe?)