98 points by skull8888888

Hey HN, Robert from Laminar (lmnr.ai) here.

We built Index, a new SOTA open-source browser agent.

It reached 92% on WebVoyager with Claude 3.7 (extended thinking). We used o1 as the judge and manually double-checked its verdicts.

At the core is the same old idea: run a simple JS script in the browser to identify interactable elements -> draw bounding boxes around them on a screenshot of the browser window -> feed it to the LLM.
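To make that pipeline a bit more concrete, here is a rough sketch of the step between detection and drawing. It is illustrative only and not Index's actual code: the `Box` type, the `label_elements` and `describe` helpers, and the input shape are all assumptions, standing in for whatever rectangles an in-page detection script returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in viewport pixels (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float
    tag: str
    text: str = ""


def label_elements(boxes, viewport_width, viewport_height, min_size=4):
    """Keep boxes that are large enough and intersect the viewport,
    then number them in reading order (top-to-bottom, left-to-right)."""
    visible = [
        b for b in boxes
        if b.width >= min_size and b.height >= min_size
        and b.x < viewport_width and b.y < viewport_height
        and b.x + b.width > 0 and b.y + b.height > 0
    ]
    visible.sort(key=lambda b: (b.y, b.x))
    return dict(enumerate(visible))


def describe(labeled):
    """Render the numbered elements as text to send alongside the screenshot."""
    return "\n".join(
        f"[{i}] <{b.tag}> {b.text}".rstrip() for i, b in labeled.items()
    )


if __name__ == "__main__":
    # Made-up detection output, for demonstration only.
    detected = [
        Box(10, 200, 80, 30, "button", "Sign in"),
        Box(10, 20, 200, 30, "input"),
        Box(5000, 20, 50, 50, "a", "offscreen link"),
    ]
    print(describe(label_elements(detected, 1280, 720)))
```

In a setup like this, the same index numbers would be drawn next to each bounding box on the screenshot, so the model can refer to an element by number when it picks an action.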

What made Index so good:

1. We essentially created browser agent observability. We patched Playwright to record the entire browser session while the agent operates, simultaneously tracing all agent steps and LLM calls. Then we synchronized everything in the UI, creating an unparalleled debugging experience. This allowed us to pinpoint exactly where the agent fails by seeing what it "sees" in session replay alongside execution traces.

2. Our detection script is simple but extremely good. It's carefully crafted via trial and error. We also employed CV and OCR.

3. The agent is very simple, literally just a while loop. All the power comes from a carefully crafted prompt and a ton of eval runs.
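The "just a while loop" shape from point 3 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Index's implementation: `run_agent`, the `observe` / `decide` / `act` callables, and the action dictionary format are hypothetical names chosen for illustration, and the model is replaced by a scripted stand-in so the example runs on its own.

```python
def run_agent(task, observe, decide, act, max_steps=20):
    """Observe -> ask the model -> act, until the model reports it is done.

    observe: returns the current browser state (e.g. screenshot + elements)
    decide:  stand-in for the LLM call; returns an action dict
    act:     executes the action in the browser and returns an outcome
    """
    history = []
    steps = 0
    while steps < max_steps:
        observation = observe()
        action = decide(task, observation, history)
        if action["type"] == "done":
            return action.get("result")
        outcome = act(action)
        # Feed past actions and outcomes back to the model on the next turn.
        history.append({"action": action, "outcome": outcome})
        steps += 1
    raise RuntimeError(f"no answer after {max_steps} steps")


if __name__ == "__main__":
    # Scripted "model" so the sketch runs without any API calls.
    scripted = iter([
        {"type": "click", "index": 3},
        {"type": "done", "result": "Login button found"},
    ])
    result = run_agent(
        task="Find the login button",
        observe=lambda: "screenshot + element list",
        decide=lambda task, obs, hist: next(scripted),
        act=lambda action: "ok",
    )
    print(result)
```

The step cap is a design choice for the sketch: it stops a confused model from looping forever.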

Index is a simple Python package. It also comes with a beautiful CLI.

pip install lmnr-index

playwright install chromium

index run

We've recently added o4-mini, Gemini 2.5 Pro and Flash. Pro is extremely good and fast. Give it a try via CLI.

You can also use Index via a serverless API (https://docs.lmnr.ai/index-agent/api/getting-started).

Or via chat UI - https://lmnr.ai/chat.

To learn more about browser agent observability and evals, check out our open-source repo (https://github.com/lmnr-ai/lmnr) and our docs (https://docs.lmnr.ai/tracing/browser-agent-observability).

noleary No.43777486
> Index is the SOTA open-source browser agent for autonomously executing complex tasks on the web.

I've written a handful of pretty hacky Python scripts that just pull down all of the HTML content from a page and toss it over to OpenAI. As you can imagine, these were all extremely simple tasks, e.g., "find out if there's a login button"

What's a good example of a complex task that Index is well-suited for? What's the threshold of minimal complexity where you guys are a really good fit?

skull8888888 No.43777534
- research tasks: the agent is smart enough to understand which links to click next, without the need to hardcode the parsing and navigation logic

- any task that requires UI interaction: button clicking, filter selection, form filling and so on. Just prompt it; it's surprisingly robust and self-healing.

- complex long-running tasks that require extensive context, e.g. researching a topic and then creating a spreadsheet, creating a presentation on a topic, and so on.

Essentially, any task that can be done within a browser environment and that previously required flaky, hardcoded scripts. Website testing is also a great example.

nico No.43778353
Would love to see it doing some work on a Google spreadsheet (including doing formulas, vlookups, data import and cleanup) and then creating a decent Slides presentation with some charts from the spreadsheet
skull8888888 No.43780439
it can do it! try it out, literally just prompt it