-- Why bother building a new browser? For the first time since Netscape was released in 1994, it feels like we can reimagine browsers from scratch for the age of AI agents. The web browser of tomorrow might not look like what we have today.
We saw how tools like Cursor gave developers a 10x productivity boost, yet the browser—where everyone else spends their entire workday—hasn't fundamentally changed.
And honestly, we feel like we're constantly fighting the browser we use every day. It's not one big thing, but a series of small, constant frustrations. I'll have 70+ tabs open from three different projects and completely lose my train of thought. And simple stuff like reordering Tide Pods from Amazon or filling out forms shouldn't need our full attention anymore. AI can handle all of this, and that's exactly what we're building.
Here's a demo of our early version: https://dub.sh/nxtscape-demo
-- What makes us different We know others are exploring this space (Perplexity, Dia), but we want to build something open-source and community-driven. We're not a search or ads company, so we can focus on being privacy-first: Ollama integration, BYOK (Bring Your Own Keys), and a built-in ad blocker.
Btw, we love what Brave started and stood for, but they've now spread themselves too thin across crypto, search, etc. We are laser-focused on one thing: making browsers work for YOU with AI. And unlike Arc (which we also loved, but it was abandoned), we're 100% open source. Fork us if you don't like our direction.
-- Our journey hacking a new browser To build this, we had to fork Chromium. Honestly, it feels like the only viable path today—we've seen others like Brave (which started on Electron) and Microsoft Edge learn this the hard way.
We also started by asking: why not just build an extension? But we realized we needed more control, similar to the reason Cursor forked VSCode. For example, Chrome has this thing called the Accessibility Tree: basically a cleaner, semantic version of the DOM that screen readers use. It's perfect for AI agents to understand pages, but you can't access it through extension APIs.
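To give a concrete sense of what the agent sees, here's a minimal sketch (not our actual code) of pulling a page's accessibility tree over the Chrome DevTools Protocol. It assumes a Chromium launched with --remote-debugging-port=9222 and the third-party requests and websocket-client packages:

    import json

    import requests
    from websocket import create_connection

    def rpc(ws, msg_id, method):
        """Send a CDP command and wait for the matching response."""
        ws.send(json.dumps({"id": msg_id, "method": method}))
        while True:
            reply = json.loads(ws.recv())
            if reply.get("id") == msg_id:
                return reply

    # Find the first open page target exposed by the debugging port.
    targets = requests.get("http://localhost:9222/json").json()
    page = next(t for t in targets if t["type"] == "page")

    ws = create_connection(page["webSocketDebuggerUrl"])
    rpc(ws, 1, "Accessibility.enable")
    tree = rpc(ws, 2, "Accessibility.getFullAXTree")

    # Each node has a semantic role and an accessible name, which is much
    # easier for an agent to reason over than raw DOM markup.
    for node in tree["result"]["nodes"][:10]:
        role = node.get("role", {}).get("value")
        name = node.get("name", {}).get("value", "")
        print(role, name)
    ws.close()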
That said, working with the 15M-line C++ Chromium codebase has been an adventure. We've both worked on infra at Google and Meta, but Chromium is a different beast. Tools like Cursor's indexing completely break at this scale, so we've had to get really good with grep and vim. And the build times are brutal—even with our maxed-out M4 Max MacBook, a full build takes about 3 hours.
Full disclosure: we are still very early, but we have a working prototype on GitHub. It includes an early version of a "local Manus" style agent that can automate simple web tasks, plus an AI sidebar for questions, and other productivity features (grouping tabs, saving/resuming sessions, etc.).
Looking forward to any and all comments!
You can download the browser from our github page: https://github.com/nxtscape/nxtscape
If your browser behaves, it's not going to be excluded in robots.txt.
If your browser doesn't behave, you should at least respect robots.txt.
If your browser doesn't behave, and you continue to ignore robots.txt, that's just... shitty.
Maybe some new standards, and maybe user-configurable per-site permissions, would make it better?
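To make that concrete, a per-site permission table could be as simple as this (purely hypothetical, none of these names are an existing standard):

    # Hypothetical per-site permission table for agentic browsing.
    SITE_PERMISSIONS = {
        "example.com": {"agent_browsing": True, "form_fill": True},
        "mybank.com": {"agent_browsing": False, "form_fill": False},
    }

    def agent_allowed(host: str, action: str) -> bool:
        # Default-deny for sites the user hasn't configured.
        return SITE_PERMISSIONS.get(host, {}).get(action, False)

    print(agent_allowed("example.com", "form_fill"))      # True
    print(agent_allowed("mybank.com", "agent_browsing"))  # False
    print(agent_allowed("unknown.org", "form_fill"))      # False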
I'm curious to see how this turns out.
Website operators should not get a say in what kinds of user agents I used to access their sites. Terminal? Fine. Regular web browser? Okay. AI powered web browser? Who cares. The strength of the web lies in the fact that I can access it with many different kinds of tools depending on my use case, and we cannot sacrifice that strength on the altar of hatred of AI tools.
Down that road lies disaster, with the Play Integrity API being just the tip of the iceberg.
Why? My user agent is configured to make things easier for me and allow me to access content that I wouldn't otherwise choose to access. Dark mode allows me to read late at night. Reader mode allows me to read content that would otherwise be unbearably cluttered. I can zoom in on small text to better see it.
Should my reader mode or dark mode or zoom feature have to respect robots.txt because otherwise they'd allow me to access content that I would otherwise have chosen to leave alone?
I know it's not completely true. Reader mode can help you bypass the ads* _after_ you've already had a peek at the cluttered version, but if you need to go to the next page or something like that, you have to disable reader mode again, and so on; it's a very granular form of ad blocking, while many AI use cases are about a human never viewing the page at all. The other thing is that reader mode is not very popular, so it's not a significant threat.
*or other links on their websites, or informative banners, etc
What about reader mode that is auto-configured to turn on immediately on landing on specific domains? Is that a robot for the purposes of robots.txt?
https://addons.mozilla.org/en-US/firefox/addon/automatic-rea...
And also, just to confirm, I'm to understand that if I'm navigating the internet with an ad blocker then you believe that I should respect robots.txt because my user agent is now a robot by virtue of using an ad blocker?
Is that also true if I browse with a terminal-based browser that simply doesn't render JavaScript or images?
If any type of AI-based assistance is supposed to adhere to robots.txt, then would you also say that AI-based accessibility tools should refuse to work on pages blocked by robots.txt?
> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
This is absolutely not what you are doing, which means what you have here is not a robot. What you have here is a user agent, so you don't need to pay attention to robots.txt.
If what you are doing here counted as robotic traffic, then so would:
* Speculative loading (algorithm guesses what you're going to load next and grabs it for you in advance for faster load times).
* Reader mode (algorithm transforms the website to strip out tons of content that you don't want and present you only with the minimum set of content you wanted to read).
* Terminal-based browsers (don't render images or JavaScript, thus bypassing advertising, which by some of these justifications would make them robots because they bypass monetization).
The fact is that the web is designed to be navigated by a diverse array of different user agents that behave differently. I'd seriously consider imposing rate limits on how frequently your browser acts so you don't knock over a server—that's just good citizenship—but robots.txt is not designed for you and if we act like it is then a lot of dominoes will fall.
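For what it's worth, the rate limiting I'm suggesting is cheap to implement. A token-bucket sketch, purely illustrative:

    import time
    from collections import defaultdict

    # Illustrative token bucket: at most `rate` requests per second per
    # host, with short bursts up to `burst`. Not anyone's actual code.
    class HostRateLimiter:
        def __init__(self, rate=1.0, burst=3):
            self.rate, self.burst = rate, burst
            self.tokens = defaultdict(lambda: burst)
            self.last = defaultdict(time.monotonic)

        def wait(self, host):
            now = time.monotonic()
            # Refill tokens for elapsed time, capped at the burst size.
            elapsed = now - self.last[host]
            self.tokens[host] = min(self.burst, self.tokens[host] + elapsed * self.rate)
            self.last[host] = now
            if self.tokens[host] < 1:
                time.sleep((1 - self.tokens[host]) / self.rate)
                self.tokens[host] = 1  # the sleep earned exactly one token
            self.tokens[host] -= 1

    limiter = HostRateLimiter(rate=0.5, burst=2)  # ~1 request per 2s per host
    for _ in range(3):
        limiter.wait("example.com")  # blocks as needed before each fetch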
Auto-configuring reader mode and the like is so uncommon that it's not even on the radar of most websites. If it were, browser developers would probably try to create a solution that satisfies both parties, like requiring ads to be text-only and placed at the end, plus other guidelines. But it's not popular. The same goes for terminal-based browsers; a lot of the most visited websites in the world don't even work without JS enabled.
On the other hand, this AI stuff seems to envision a larger user base, so it could become a concern, and therefore robots.txt or other anti-bot measures could have some practical consequences.
No, it's common practice to allow Googlebot and deny all other crawlers by default [0].
This is within their rights when it comes to true scrapers, but it's part of why I'm very uncomfortable with the idea of applying robots.txt to what are clearly user agents. It sets a precedent where it's not inconceivable that we have websites curating allowlists of user agents like they already do for scrapers, which would be very bad for the web.
[0] As just one example: https://www.404media.co/google-is-the-only-search-engine-tha...
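For concreteness, here's that "Googlebot-only" pattern and what it does to everyone else, checked with Python's stdlib parser:

    from urllib.robotparser import RobotFileParser

    # The common pattern: allow Google's crawler, deny all other bots.
    robots_lines = [
        "User-agent: Googlebot",
        "Disallow:",
        "",
        "User-agent: *",
        "Disallow: /",
    ]

    rp = RobotFileParser()
    rp.parse(robots_lines)

    print(rp.can_fetch("Googlebot", "https://example.com/page"))     # True
    print(rp.can_fetch("SomeOtherBot", "https://example.com/page"))  # False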
I'm not asking if you believe ad blocking is ethical, I got that you don't. I'm asking if it turns my browser into a scraper that should be treated as such, which is an orthogonal question to the ethics of the tool in the first place.
I strongly disagree that user agents of the sort shown in the demo should count as robots. Robots.txt is designed for bots that produce tons of traffic to discourage them from hitting expensive endpoints (or to politely ask them to not scrape at all). I've responded to incidents caused by scraper traffic and this tool will never produce traffic in the same order of magnitude as a problematic scraper.
If we count this as a robot for the purposes of robots.txt we're heading down a path that will end the user agent freedom we've hitherto enjoyed. I cannot endorse that path.
For me the line is simple, and it's the one defined by robotstxt.org [0]: "A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. ... Normal Web browsers are not robots, because they are operated by a human, and don't automatically retrieve referenced documents (other than inline images)."
If the user agent is acting on my instructions and accessing a specific and limited subset of the site that I asked it to, it's not a web scraper and should not be treated as such. The defining feature of a robot is the amount of traffic produced, not what my user agent does with the information it pulls.
AFAIK this is false, and this browser can do things like "summarize all the cooking recipes linked on this page" and therefore act exactly like a scraper (even if at a smaller scale than most scrapers).
If tomorrow, magically, all phones and all computers had an ad-blocking browser installed (and set as the default browser), a big chunk of the economy would collapse. So while I can see the philosophical value of "what a user does with a page after it has entered their browser is their own prerogative," the pragmatist in me knows that if all users acted on that principle, it would have grave repercussions for the livelihoods of many.
> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
There's nothing recursive about "summarize all the cooking recipes linked on this page". That's a single-level iterative loop.
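The difference in shape (all helper names here are hypothetical stand-ins):

    # Illustrative stubs only; fetch/links_on/summarize are made up.
    def fetch(url): return f"<contents of {url}>"
    def links_on(page_text): return []  # pretend: extract hrefs
    def summarize(text): print(text[:40])

    # Single-level iteration ("summarize the recipes linked on this
    # page"): bounded by the links on ONE page the user chose.
    def summarize_links(page_url):
        for link in links_on(fetch(page_url)):
            summarize(fetch(link))

    # Recursive traversal (what robots.txt targets): links of links of
    # links, unbounded, touching pages no human asked for.
    def crawl(url, seen=None):
        seen = set() if seen is None else seen
        if url in seen:
            return
        seen.add(url)
        for link in links_on(fetch(url)):
            crawl(link, seen)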
I will grant that I should alter my original statement: if OP wanted to respect robots.txt when it receives a request that should be interpreted as an instruction to recursively fetch pages, then I'd think that's an appropriate use of robots.txt, because that's not materially different than implementing a web crawler by hand in code.
But that represents a tiny subset of the queries that will go through a tool like this and respecting robots.txt for non-recursive requests would lead to silly outcomes like the browser refusing to load reddit.com [0].
I am not sure I agree that an AI-aided browser, which will scrape sites and aggregate that information, is "clearly" a user agent.
If this browser were to gain traction and ended up being abusive to the web, that would be bad too.
Where do you draw the line of crawler vs. automated "user agent"? Is it a certain number of web requests per minute? How are you defining "true scraper"?
> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
To me "recursive" is key—it transforms the traffic pattern from one that strongly resembles that of a human to one that touches every page on the site, breaks caching by visiting pages humans wouldn't typically, and produces not just a little bit more but orders of magnitude more traffic.
I was persuaded in another subthread that Nxtscape should respect robots.txt if a user issues a recursive request. I don't think it should if the request is "open these 5 subreddits and summarize the most popular links uploaded since yesterday", because the resulting traffic pattern is nearly identical to what I'd have done by hand (especially if the browser implements proper rate limiting, which I believe it should).
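If Nxtscape wanted to implement that, the gate could be tiny. A hypothetical sketch (not their code), using Python's stdlib:

    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    # Consult robots.txt only when a fetch is part of recursive
    # link-following; user-directed fetches behave like a normal UA.
    def may_fetch(url: str, recursive: bool, agent: str = "ExampleAgent") -> bool:
        if not recursive:
            return True
        parts = urlparse(url)
        rp = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
        rp.read()  # fetches and parses the site's robots.txt
        return rp.can_fetch(agent, url)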
As a user, the browser is my agent. If I'm directing an LLM to do something on a page in my browser, it's not that much different than me clicking a button manually, or someone using a screen reader to read the text on a page. The browser is my user agent and the specific tools I choose to use in my browser shouldn't be forbidden by a webpage. (that's why to this day all browsers still claim to be Mozilla...)
(This is very different than mass scraping web pages for training purposes. Those should absolutely respect robots.txt. There's a big difference between a user operated agentic-browser interacting with a web page and mass link crawling.)
No meatsack in the loop making decisions and pushing the button? Robots.txt applies.