422 points simedw | 1 comments
treyd ◴[] No.44434108[source]
I wonder if you could use a less sophisticated model (maybe even something based on LSTMs) to walk over the DOM and extract just the chunks that should be emitted and collected into the browsable data structure, with all of it running locally. I feel like it'd be straightforward to generate training data for this, directly using an LLM-based toolchain like the one the author wrote.
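A rough sketch of what that DOM walk might look like, using only Python's standard `html.parser`. Everything here is an assumption for illustration: `ChunkExtractor`, `extract_chunks`, `SKIP_TAGS`, and `VOID_TAGS` are hypothetical names, and `keep_chunk` is a hand-written heuristic standing in for the small local model. It is not how the author's project works.

```python
from html.parser import HTMLParser

# Hypothetical choice: text inside these tags is never emitted.
SKIP_TAGS = {"script", "style", "nav", "footer"}
# Tags with no closing tag; they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def keep_chunk(path, text):
    """Stand-in for the small model: keep headings and longer text runs."""
    is_heading = bool(path) and path[-1] in {"h1", "h2", "h3"}
    return is_heading or len(text.split()) >= 4


class ChunkExtractor(HTMLParser):
    def __init__(self, classifier):
        super().__init__()
        self.classifier = classifier
        self.stack = []   # open tags from the root down to the current node
        self.chunks = []  # (tag path, text) pairs the classifier accepted

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or SKIP_TAGS & set(self.stack):
            return
        path = tuple(self.stack)
        if self.classifier(path, text):
            self.chunks.append((path, text))


def extract_chunks(html, classifier=keep_chunk):
    parser = ChunkExtractor(classifier)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Swapping `keep_chunk` for a trained classifier that takes the same `(path, text)` pair would be the part where the LSTM (or whatever small model) comes in. Note this only sees static HTML, so it does nothing for content that is loaded by scripts.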
replies(1): >>44435662 #
askonomm ◴[] No.44435662[source]
Unfortunately, on the modern web, simply walking the DOM doesn't cut it if the website's content is loaded by JS. You can only walk the DOM once the JS has run and all the requests it makes have finished, and at that point you're already using a whole browser renderer anyway.
replies(1): >>44437827 #
1. kccqzy ◴[] No.44437827[source]
Yeah, but this project doesn't use JS anyway.