←back to thread

319 points fbouvier | 10 comments | | HN request time: 0.443s | source | bottom

We’re Francis and Pierre, and we're excited to share Lightpanda (https://lightpanda.io), an open-source headless browser we’ve been building for the past 2 years from scratch in Zig (not dependent on Chromium or Firefox). It’s a faster and lighter alternative for headless operations without any graphical rendering.

Why start over? We’ve worked a lot with Chrome headless at our previous company, scraping millions of web pages per day. While it’s powerful, it’s also heavy on CPU and memory usage. For scraping at scale, building AI agents, or automating websites, the overheads are high. So we asked ourselves: what if we built a browser that only did what’s absolutely necessary for headless automation?

Our browser is made of the following main components:

- an HTTP loader

- an HTML parser and DOM tree (based on Netsurf libs)

- a Javascript runtime (v8)

- partial web APIs support (currently DOM and XHR/Fetch)

- and a CDP (Chrome Debug Protocol) server to allow plug & play connection with existing scripts (Puppeteer, Playwright, etc).

The main idea is to avoid any graphical rendering and just work with data manipulation, which in our experience covers a wide range of headless use cases (excluding some, like screenshot generation).

In our current test case Lightpanda is roughly 10x faster than Chrome headless while using 10x less memory.

It's a work in progress, there are hundreds of Web APIs, and for now we just support some of them. It's a beta version, so expect most websites to fail or crash. The plan is to increase coverage over time.

We chose Zig for its seamless integration with C libs and its comptime feature that allow us to generate bi-directional Native to JS APIs (see our zig-js-runtime lib https://github.com/lightpanda-io/zig-js-runtime). And of course for its performance :)

As a company, our business model is based on a Managed Cloud, browser as a service. Currently, this is primarily powered by Chrome, but as we integrate more web APIs it will gradually transition to Lightpanda.

We would love to hear your thoughts and feedback. Where should we focus our efforts next to support your use cases?

Show context
fbouvier ◴[] No.42812928[source]
Author here. The browser is made from scratch (not based on Chromium/Webkit), in Zig, using v8 as a JS engine.

Our idea is to build a lightweight browser optimized for AI use cases like LLM training and agent workflows. And more generally any type of web automation.

It's a work in progress, there are hundreds of Web APIs, and for now we just support some of them (DOM, XHR, Fetch). So expect most websites to fail or crash. The plan is to increase coverage over time.

Happy to answer any questions.

replies(12): >>42814673 #>>42814724 #>>42814729 #>>42814730 #>>42815546 #>>42815780 #>>42815872 #>>42817053 #>>42818490 #>>42818771 #>>42820506 #>>42863671 #
bityard ◴[] No.42815546[source]
Please put a priority on making it hard to abuse the web with your tool.

At a _bare_ minimum, that means obeying robot.txt and NOT crawling a site that doesn't want to be crawled. And there should not be an option to override that. It goes without saying that you should not allow users to make hundreds or thousands of "blind" parallel requests as these tend to have the effect of DoSing sites that are being hosted on modest hardware. You should also be measuring response times and throttling your requests accordingly. If a website issues a response code or other signal that you are hitting it too fast or too often, slow down.

I say this because since around the start of the new year, AI bots have been ravaging what's left of the open web and causing REAL stress and problems for admins of small and mid-sized websites and their human visitors: https://www.heise.de/en/news/AI-bots-paralyze-Linux-news-sit...

replies(10): >>42815962 #>>42817710 #>>42818942 #>>42819586 #>>42819872 #>>42821560 #>>42822191 #>>42822965 #>>42823085 #>>42824195 #
gkbrk ◴[] No.42815962[source]
Please don't.

Software I installed on my computer needs to the what I want as the user. I don't want every random thing I install to come with DRM.

The project looks useful, and if it ends up getting popular I imagine someone would make a DRM-free version anyway.

replies(4): >>42816001 #>>42816195 #>>42822351 #>>42822787 #
1. tossandthrow ◴[] No.42816001[source]
Where do you read DRM?

Parent commenter merely and humbly asks the author of the library to make sure that it has sane defaults and support for ethical crawling.

I find it disturbing that you would recommend against that.

replies(2): >>42816030 #>>42821175 #
2. gkbrk ◴[] No.42816030[source]
Here's what the parent comment wrote.

> And there should not be an option to override that.

This is not just a sane default. This is software telling you what you are allowed to do based on what the rights owner wants, literally DRM.

This is exactly like Android not allowing screenshots to be taken in certain apps because the rights owner didn't allow it.

replies(2): >>42816247 #>>42816851 #
3. blacksmith_tb ◴[] No.42816247[source]
Not sure what "digital rights" that "manages"? I don't see it as an unreasonable suggestion that the tool shouldn't be set up out of the box to DoS sites it's scraping, that doesn't prevent anyone who is technical enough to know what they're doing to fork it and remove whatever limits are there by default? I can't see it as a "my computer should do what I want!" issue, if you don't like how this package works, change it or use another?
replies(1): >>42817655 #
4. JambalayaJimbo ◴[] No.42816851[source]
This is software telling you what you are allowed to do based on what the software developer wants* (assuming the developers cares of course...). Which is how all software works. I would not want my users of my software doing anything malicious with it, so I would not give them the option.

If I create an open-source messaging app I am also not going to give users the option of clicking a button to spam recipients with dick pics. Even if it was dead-simple for a determined user to add code for this dick pic button themselves.

5. benatkin ◴[] No.42817655{3}[source]
Digital Restrictions Management, then. Have it your way.
replies(1): >>42821168 #
6. concerndc1tizen ◴[] No.42821168{4}[source]
There are so many combative people on HackerNews lately who insist to misinterpret everything.

I really wonder if it's bots or just assholes.

replies(1): >>42821462 #
7. concerndc1tizen ◴[] No.42821175[source]
> I find it disturbing

Oh no, someone on the internet found something offensive!

replies(1): >>42822947 #
8. cies ◴[] No.42821462{5}[source]
Indeed DRM is a very different thing from adhering to standards like `robots.txt` as a default out of the box (there could still be a documented option to ignore it).
replies(1): >>42821470 #
9. concerndc1tizen ◴[] No.42821470{6}[source]
- That's just like, your opinion, man

He was using DRM as a metaphor for restricted software. And advocating that software should do whatever the user wants. If the user is ignorant about the harm the software does, then adding robots.txt support is win-win for all. But if the user doesn't want it, then it's political, in the same way that DRM is political and anti-user.

10. tossandthrow ◴[] No.42822947[source]
Disturbing, not offensive - it is literally right there in the quote you have been so nice to pass along.