
20 points simonw | 1 comment
minusf ◴[] No.46266164[source]
while it's mentioned in the post, it seems to me a bit buried:

isn't this more like a port of `html5ever` from rust to python using an LLM, as opposed to creating something "new" based on the test suite alone?

if yes, wouldn't the distinction be rather important?

replies(1): >>46267059 #
EmilStenstrom ◴[] No.46267059[source]
Depending on your perspective, you can take away either of the two points.

The first iteration of the project created a library from scratch, from the tests all the way to 100% test coverage. So even without the second iteration, it's still possible to create something new.

In an attempt to speed it up, I (with a coding agent) rewrote it again based on html5ever's code structure. It's far from a clean port, because html5ever is heavily optimized Rust code that isn't possible to port to Python (Rust macros). And it still depended on a lot of iteration and rerunning tests to get it anywhere.

I'm not pushing any agenda here, you're free to take what you want from it!

replies(2): >>46267361 #>>46267707 #
1. simonw ◴[] No.46267707[source]
I just had Codex CLI figure out where that first version ended and the new one began.

It looks to me like this is the last commit before the rewrite: https://github.com/EmilStenstrom/justhtml/tree/989b70818874d...

The commit after that is https://github.com/EmilStenstrom/justhtml/commit/7bab3d2 "radical: replace legacy TurboHTML tree/handler stack with new tokenizer + treebuilder scaffold"

It also adds this document called html5ever_port_plan.md: https://github.com/EmilStenstrom/justhtml/blob/7bab3d22c0da0...

Here's the Codex CLI transcript I used to figure this out: https://gistpreview.github.io/?53202706d137c82dce87d729263df...