319 points | fbouvier | 3 comments

We're Francis and Pierre, and we're excited to share Lightpanda (https://lightpanda.io), an open-source headless browser we've been building from scratch in Zig for the past 2 years (it doesn't depend on Chromium or Firefox). It's a faster and lighter alternative for headless operations, with no graphical rendering at all.

Why start over? We’ve worked a lot with Chrome headless at our previous company, scraping millions of web pages per day. While it’s powerful, it’s also heavy on CPU and memory usage. For scraping at scale, building AI agents, or automating websites, the overheads are high. So we asked ourselves: what if we built a browser that only did what’s absolutely necessary for headless automation?

Our browser is made of the following main components:

- an HTTP loader

- an HTML parser and DOM tree (based on Netsurf libs)

- a JavaScript runtime (V8)

- partial web APIs support (currently DOM and XHR/Fetch)

- and a CDP (Chrome DevTools Protocol) server to allow plug-and-play connection with existing scripts (Puppeteer, Playwright, etc).
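
Since the CDP server is the integration point, an existing Puppeteer script should mostly just need to connect to Lightpanda instead of launching Chrome. Here's a minimal sketch with puppeteer-core; the WebSocket address and port are placeholders, use whatever your Lightpanda instance actually listens on:

    // Sketch: drive Lightpanda over CDP with puppeteer-core.
    // ws://127.0.0.1:9222 is an assumed endpoint, not the documented one.
    import puppeteer from "puppeteer-core";

    const browser = await puppeteer.connect({
      browserWSEndpoint: "ws://127.0.0.1:9222",
    });

    const page = await browser.newPage();
    await page.goto("https://example.com");

    // No rendering involved: we only pull data out of the DOM.
    console.log(await page.evaluate(() => document.title));

    await browser.disconnect();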

The main idea is to avoid any graphical rendering and just work with data manipulation, which in our experience covers a wide range of headless use cases (excluding some, like screenshot generation).

In our current test case, Lightpanda is roughly 10x faster than headless Chrome while using 10x less memory.

It's a work in progress: there are hundreds of Web APIs, and for now we support only some of them. It's a beta version, so expect most websites to fail or crash. The plan is to increase coverage over time.

We chose Zig for its seamless integration with C libs and its comptime feature, which allows us to generate bidirectional native-to-JS bindings (see our zig-js-runtime lib: https://github.com/lightpanda-io/zig-js-runtime). And of course for its performance :)

As a company, our business model is based on a managed cloud offering: browser as a service. Currently this is primarily powered by Chrome, but as we integrate more web APIs it will gradually transition to Lightpanda.

We would love to hear your thoughts and feedback. Where should we focus our efforts next to support your use cases?

fbouvier
Author here. The browser is built from scratch (not based on Chromium/WebKit), in Zig, using V8 as the JS engine.

Our idea is to build a lightweight browser optimized for AI use cases like LLM training and agent workflows. And more generally any type of web automation.

It's a work in progress: there are hundreds of Web APIs, and for now we support only some of them (DOM, XHR, Fetch). So expect most websites to fail or crash. The plan is to increase coverage over time.

Happy to answer any questions.

bityard
Please put a priority on making it hard to abuse the web with your tool.

At a _bare_ minimum, that means obeying robots.txt and NOT crawling a site that doesn't want to be crawled. And there should not be an option to override that. It goes without saying that you should not allow users to make hundreds or thousands of "blind" parallel requests, as these tend to DoS sites hosted on modest hardware. You should also measure response times and throttle your requests accordingly. If a website issues a response code or other signal that you are hitting it too fast or too often, slow down.
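
To make that concrete, the bare minimum on the client side looks something like this. It's a simplified sketch, not a full robots.txt parser, and the user-agent string is made up: check robots.txt before fetching, and honor Retry-After when the server says to slow down.

    // Sketch of "polite by default": obey robots.txt and back off on 429/503.
    // Simplified: only reads Disallow lines under "User-agent: *" (or our own UA).
    const UA = "examplebot"; // made-up user-agent string

    async function disallowedPaths(origin: string): Promise<string[]> {
      const res = await fetch(new URL("/robots.txt", origin));
      if (!res.ok) return [];
      const paths: string[] = [];
      let applies = false;
      for (const raw of (await res.text()).split("\n")) {
        const [key, ...rest] = raw.split(":");
        const value = rest.join(":").trim();
        if (key.trim().toLowerCase() === "user-agent") {
          applies = value === "*" || value.toLowerCase() === UA;
        } else if (applies && key.trim().toLowerCase() === "disallow" && value) {
          paths.push(value);
        }
      }
      return paths;
    }

    async function politeFetch(url: string): Promise<Response | null> {
      const u = new URL(url);
      const blocked = await disallowedPaths(u.origin);
      if (blocked.some((p) => u.pathname.startsWith(p))) return null; // robots.txt says no

      const res = await fetch(url, { headers: { "user-agent": UA } });
      if (res.status === 429 || res.status === 503) {
        // The server is telling us to slow down: wait before issuing the next request.
        const retryAfter = Number(res.headers.get("retry-after"));
        const waitMs = (retryAfter > 0 ? retryAfter : 60) * 1000;
        await new Promise((resolve) => setTimeout(resolve, waitMs));
      }
      return res;
    }

In practice you'd also want per-host concurrency limits and pacing based on observed response times, but robots.txt plus backoff is the floor.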

I say this because since around the start of the new year, AI bots have been ravaging what's left of the open web and causing REAL stress and problems for admins of small and mid-sized websites and their human visitors: https://www.heise.de/en/news/AI-bots-paralyze-Linux-news-sit...

cchance
It's literally open source; any effort put into hamstringing it would just be forked and removed lol
xena
Any barrier to abuse makes abuse harder.
cchance
Not really lol. If you add a robots.txt check, someone can just create a fork with a git action that strips that routine out every time the original repo is pushed... Adding options for filtering and respecting things is great, even as defaults, but trying to force "good behavior" tends to just lead to people setting up a workaround that everyone eventually uses, because why use the hamstrung version instead of the open one where you make your own choices?