138 points parsabg | 1 comments

Hey HN,

I'm excited to share BrowserBee, a privacy-first AI assistant in your browser that allows you to run and automate tasks using your LLM of choice (currently supports Anthropic, OpenAI, Gemini, and Ollama). Short demo here: https://github.com/user-attachments/assets/209c7042-6d54-4fc...

Inspired by projects like Browser Use and Playwright MCP, its main advantage is the browser extension form factor, which makes it more convenient for day-to-day use, especially for less technical users. It's also a bit less cumbersome to use on websites that require you to be logged in, as it attaches to the same browser instance you use (on privacy: the only data that leaves your browser is the communication with the LLM - there is no tracking or data collection of any sort).

Some of its core features are as follows:

- a memory feature which allows users to memorize common and useful pathways, making the next repetition of those tasks faster and cheaper

- real-time token counting and cost tracking (inspired by Cline)

- an approval flow for critical tasks such as posting content or making payments (also inspired by Cline)

- tab management allowing the agent to execute tasks across multiple tabs

- a range of browser tools for navigation, tab management, interactions, etc, which are broadly in line with Playwright MCP

I'm actively developing BrowserBee and would love to hear any thoughts, comments, or feedback.

Feel free to reach out via email: parsa.ghaffari [at] gmail [dot] com

-Parsa

dataviz1000 No.44021468
You might be able to reduce the amount of information sent to the LLM by 100-fold if you use a stacking context. Here is an example of one made available on GitHub (not mine). [0] Moreover, you will be able to parse the DOM or have strategies that parse the DOM. For example, if you are only concerned with video, find all the videos and only send that information. Perhaps parse a page once, find the structure, and cache it so that the next time only the required data is used. (I see you are storing tool sequences, but I didn't find an example of storing a DOM structure so that requests to subsequent pages are optimized.)
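
A rough sketch of the filter-and-cache idea, under stated assumptions: `NodeSnapshot`, `structureCache`, and `selectForTask` are hypothetical names made up for illustration, not anything from BrowserBee, and the page is assumed to have already been flattened into plain objects.

```typescript
// Hypothetical, simplified snapshot of one DOM element (not a BrowserBee type).
interface NodeSnapshot {
  tag: string;      // lowercase tag name, e.g. "video"
  selector: string; // a CSS selector that locates the element again
  text: string;     // visible text, possibly empty
}

// Hypothetical cache: page key -> selectors that mattered last time.
const structureCache = new Map<string, string[]>();

// Keep only the nodes relevant to the task. On the first visit, filter by tag
// and remember which selectors were kept; on later visits, reuse those selectors.
function selectForTask(
  pageKey: string,
  nodes: NodeSnapshot[],
  wantedTags: string[],
): NodeSnapshot[] {
  const cached = structureCache.get(pageKey);
  if (cached !== undefined) {
    const wantedSelectors = new Set(cached);
    return nodes.filter((n) => wantedSelectors.has(n.selector));
  }
  const tags = new Set(wantedTags);
  const kept = nodes.filter((n) => tags.has(n.tag));
  structureCache.set(pageKey, kept.map((n) => n.selector));
  return kept;
}
```

Only the returned subset would be serialized for the LLM. How to pick a good `pageKey` (origin, URL pattern, template hash) is left open here.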

If someone visits a website that I control while using your Chrome Extension, I will 100% be able to find a way to drain all their accounts, probably in the background without them even knowing. Here are some ideas about how to mitigate that.

The problem with Playwright is that it requires Chrome DevTools Protocol (CDP), which opens massive security problems for a browser that people use for their banking and for managing anything that involves credit cards or sensitive accounts. At one point, I took the injected folder out of Playwright and injected it into a Chrome Extension because I thought I needed its tools; however, I quickly abandoned it as it was easy to create workflows from scratch. You get a lot of stuff immediately by using Playwright, but likely you will find it will be much lighter and safer to just implement that functionality by yourself.

The only benefit of CDP for normal use is allowing automation of any action in the Chrome Extension that requires trusted events, e.g. play sound, go fullscreen, banking websites that require a trusted event to transfer money. In my opinion, people just want a large part of the workflow automated and don't mind being prompted to click a button when trusted events are required. Since it doesn't matter what button is clicked, you can inject a big button that says continue, or whatever is required, after prompting the user. Trusted events are there for a reason.
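
A minimal sketch of the "big continue button" gate, assuming the extension wires a handler to a button it injects. `ClickLike` and `makeContinueHandler` are hypothetical names for illustration; `isTrusted` is the standard read-only property on DOM `Event` objects, true only for events generated by a real user action.

```typescript
// Minimal shape needed from an event in this sketch; real DOM Event objects
// carry the same isTrusted property.
interface ClickLike {
  isTrusted: boolean;
}

// Build a click handler that resumes the workflow only on a genuine user click.
// Script-dispatched (synthetic) clicks have isTrusted === false and are ignored.
function makeContinueHandler(onConfirmed: () => void): (event: ClickLike) => void {
  return (event) => {
    if (!event.isTrusted) {
      return; // do not continue on a click the page or a script fabricated
    }
    onConfirmed();
  };
}
```

In a content script this could presumably be attached with `button.addEventListener("click", handler)`, with the step that needs user activation running inside `onConfirmed`.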

[0] https://github.com/andreadev-it/stacking-contexts-inspector

kanzure No.44024048
possibly something like https://github.com/romansky/dom-to-semantic-markdown could also help for this use case.
1. dataviz1000 No.44025192
That is awesome. A list of power tools on Amazon went from 2.5MB of HTML to 236KB of markup. That is huge! Wow, thank you for sharing.

This is half the equation. Also, a lot of the information in the markup can be used to query elements to interact with, because it keeps the link locations, which can be used to navigate or select elements. On the other hand, by using the stacking context, it is possible to query only elements that are visible, which removes all elements that can't be interacted with.
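
To make the visibility filtering concrete, here is a deliberately simplified sketch. Real stacking contexts are nested and depend on many CSS properties; this version assumes each element has already been reduced to a rectangle plus a single flattened `z` value. `Box`, `covers`, and `visibleBoxes` are hypothetical names, not taken from the linked inspector.

```typescript
// Hypothetical flattened element: a rectangle and one global paint order.
interface Box {
  id: string;
  x: number;
  y: number;
  w: number;
  h: number;
  z: number; // higher values are painted on top (an assumption of this sketch)
}

// True when `outer` fully contains the rectangle of `inner`.
function covers(outer: Box, inner: Box): boolean {
  return (
    outer.x <= inner.x &&
    outer.y <= inner.y &&
    outer.x + outer.w >= inner.x + inner.w &&
    outer.y + outer.h >= inner.y + inner.h
  );
}

// Drop zero-size boxes and boxes fully hidden behind something painted above them.
function visibleBoxes(boxes: Box[]): Box[] {
  return boxes.filter(
    (b) =>
      b.w > 0 &&
      b.h > 0 &&
      !boxes.some((other) => other !== b && other.z > b.z && covers(other, b)),
  );
}
```

Partially covered elements are kept here; whether they still count as interactable is a separate judgment call.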