←back to thread

319 points fbouvier | 3 comments | | HN request time: 0.612s | source

We’re Francis and Pierre, and we're excited to share Lightpanda (https://lightpanda.io), an open-source headless browser we’ve been building for the past 2 years from scratch in Zig (not dependent on Chromium or Firefox). It’s a faster and lighter alternative for headless operations without any graphical rendering.

Why start over? We’ve worked a lot with Chrome headless at our previous company, scraping millions of web pages per day. While it’s powerful, it’s also heavy on CPU and memory usage. For scraping at scale, building AI agents, or automating websites, the overheads are high. So we asked ourselves: what if we built a browser that only did what’s absolutely necessary for headless automation?

Our browser is made of the following main components:

- an HTTP loader

- an HTML parser and DOM tree (based on Netsurf libs)

- a Javascript runtime (v8)

- partial web APIs support (currently DOM and XHR/Fetch)

- and a CDP (Chrome Debug Protocol) server to allow plug & play connection with existing scripts (Puppeteer, Playwright, etc).

The main idea is to avoid any graphical rendering and just work with data manipulation, which in our experience covers a wide range of headless use cases (excluding some, like screenshot generation).

In our current test case Lightpanda is roughly 10x faster than Chrome headless while using 10x less memory.

It's a work in progress, there are hundreds of Web APIs, and for now we just support some of them. It's a beta version, so expect most websites to fail or crash. The plan is to increase coverage over time.

We chose Zig for its seamless integration with C libs and its comptime feature that allow us to generate bi-directional Native to JS APIs (see our zig-js-runtime lib https://github.com/lightpanda-io/zig-js-runtime). And of course for its performance :)

As a company, our business model is based on a Managed Cloud, browser as a service. Currently, this is primarily powered by Chrome, but as we integrate more web APIs it will gradually transition to Lightpanda.

We would love to hear your thoughts and feedback. Where should we focus our efforts next to support your use cases?

Show context
m3kw9 ◴[] No.42814620[source]
How does this work because the browser needs to render a page and the vision model needs to know where a button is, so it still needs to see an image. How does headless make it easier?
replies(1): >>42814714 #
katiehallett ◴[] No.42814714[source]
Headless mode skips the visual rendering meant for humans, but the DOM structure and layout still exist, allowing the model to parse elements programmatically (e.g. button locations). Instead of 'seeing' an image, the model interacts with the page's underlying structure, which is faster and more efficient. Our browser removes the rendering engine as well, so it won't handle 100% of automation use cases, but it's also what allows us to be faster and lighter than Chrome in headless mode.
replies(2): >>42815232 #>>42816611 #
10000truths ◴[] No.42816611[source]
The issue is that DOM structure does not correspond one-to-one with perceived structure. I could render things in the DOM that aren't visible to people (e.g. a transparent 5px x 5px button), or render things to people that aren't visible in the DOM (e.g. Facebook's DOM obfuscation shenanigans to evade ad-blocking, or rendering custom text to a WebGL canvas). Sure, most websites don't go that far, but most websites also aren't valuable targets for automated crawling/scraping. These kinds of disparities will be exploited to detect and block automated agents if browser automation becomes sufficiently popular, and then we're back to needing to render the whole browser and operate on the rendered image to keep ahead of the arms race.
replies(1): >>42820627 #
1. emporas ◴[] No.42820627[source]
Servers operate on top of tcp/ip not to serve information, rather to serve information plus something else, usually ads. This is usually implemented with websites and captchas n stuff.

That's a problem of misaligned economic incentives. If there is a blockchain which enables micro-transactions of 0.000001 cent per request, and in the order of a million tps or a billion tps, then servers have no reason not to accept money in exchange for information, instead of using ads to extract some eyeball attention.

There is no reason that i cannot invoke a command line program: `$fetch_social_media_posts -n 1000` and get the last thousand posts right there in the console, as long as i provide some valid transactions to the server.

Websites and ads are the wrong solution to the problem of gaining something while serving information, and headless browsers and scraping are the wrong solution to the first wrong solution and the problems it creates.

replies(1): >>42822223 #
2. aniviacat ◴[] No.42822223[source]
No need for blockchain, microtransaction functionality should be integrated into our existing payment methods.
replies(1): >>42823833 #
3. emporas ◴[] No.42823833[source]
Existing payment methods, paypal, google pay etc, have been absolutely crucial for internet payments, but the micro in the word never ends.

If there are internet payments with a minimum payment of 1 cent, then we need payments of 0.1 cents. If that's achieved, then we need 0.01 cents minimum transaction. The micro in the transaction always needs to be smaller (and faster).

Free competition (or perfect competition) over a well defined landscape, internet protocols that is, has proven to always deliver better quality goods and lower price. Money derived from governments is far, far from free competition, let alone well defined internet protocols, and there is a point in which existing payment methods get stuck and cannot deliver smaller transactions.

I don't personally know where and when that point is, but if i have to guess, existing payment methods have reached that minimum point for at least a decade. In other words, their transactions minimums have to be high enough for them to make a profit. Yes, they can implement microtransactions, but they will not be profitable.