
255 points ColinWright | 1 comments
rokkamokka ◴[] No.45774428[source]
I'm not overly surprised; it's probably faster to search the text for http/https than to parse the DOM.
replies(2): >>45774455 #>>45779997 #
embedding-shape ◴[] No.45774455[source]
Not probably. Searching through plaintext (which they seem to be doing) versus iterating over the DOM involves such vastly different amounts of work, in terms of resources used and performance, that "probably" is way underselling the difference :)
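To make the two approaches concrete, here is a rough sketch in Python. It is only an illustration, not what any particular scraper does: the regex, the function names, and the sample page are all made up, and the standard-library `html.parser` tokenizer stands in for a full DOM walk.

```python
import re
from html.parser import HTMLParser

# Assumed, deliberately loose pattern for "anything URL-shaped".
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def urls_by_text_search(html: str) -> list[str]:
    """Plaintext approach: scan the raw source for http/https strings."""
    return URL_RE.findall(html)


class HrefCollector(HTMLParser):
    """Parser approach: collect href values from actual <a> start tags only."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def urls_by_parsing(html: str) -> list[str]:
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical page: one live link and one commented-out link.
page = (
    '<p><a href="https://example.com/real">real</a></p>'
    '<!-- <a href="https://example.com/old">old</a> -->'
)

print(urls_by_text_search(page))  # finds both URLs, including the commented one
print(urls_by_parsing(page))      # finds only the live link
```

The text search is a single pass with no structure to build, which is where the speed comes from, but it has no idea that the second URL sits inside a comment.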
replies(1): >>45775540 #
franktankbank ◴[] No.45775540[source]
Reminds me of the shortcut that works for the happy path but is utterly fucked by real data. This is an interesting trap. Can it easily be avoided without walking the DOM?
replies(1): >>45775601 #
embedding-shape ◴[] No.45775601[source]
Yes, parse out HTML comments, which is also kind of trivial if you've ever done any sort of parsing: listen for "<!--" and, whenever you come across it, ignore everything until the next "-->". But then again, these people are using AI to build scrapers, so I wouldn't put too much pressure on them to produce high-quality software.
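A minimal sketch of that comment-skipping pass in Python, assuming the naive rule exactly as described (skip from "<!--" to the next "-->"). The function names and the URL regex are illustrative. It deliberately ignores edge cases such as "<!--" appearing inside a script or style block, or the short "<!-->" form that the HTML spec treats as an empty comment.

```python
import re

# Assumed, deliberately loose pattern for "anything URL-shaped".
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def strip_html_comments(html: str) -> str:
    """Drop everything between '<!--' and the next '-->' in one linear scan."""
    out = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            # No more comments: keep the remainder as-is.
            out.append(html[pos:])
            break
        out.append(html[pos:start])
        end = html.find("-->", start + 4)
        if end == -1:
            # Unterminated comment: treat the rest of the input as commented out.
            break
        pos = end + 3
    return "".join(out)


def urls_outside_comments(html: str) -> list[str]:
    """Plaintext URL search that skips commented-out markup first."""
    return URL_RE.findall(strip_html_comments(html))


sample = "see https://example.com/live <!-- https://example.com/dead --> end"
print(urls_outside_comments(sample))
```

This stays a plain string scan, so it keeps most of the speed advantage over building a DOM.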
replies(2): >>45776771 #>>45777479 #
1. stevage ◴[] No.45776771[source]
Lots of other ways to include URLs in an HTML document that wouldn't be visible to a real user, though.
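A few examples of what that can look like, as a hedged sketch in Python. The snippets and names are made up for illustration; the point is only that none of these URLs sit inside an HTML comment, none would show up as a visible link in a browser, and a plain text search still finds every one of them.

```python
import re

# Assumed, deliberately loose pattern for "anything URL-shaped".
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

# Hypothetical markup fragments, each hiding a URL from a human reader.
HIDDEN_EXAMPLES = {
    "hidden attribute": '<a hidden href="https://example.com/a">x</a>',
    "display:none": '<a style="display:none" href="https://example.com/b">x</a>',
    "template element": '<template><a href="https://example.com/c">x</a></template>',
    "data attribute": '<div data-next="https://example.com/d"></div>',
    "script string": '<script>var u = "https://example.com/e";</script>',
}


def leaked_urls(snippets: dict[str, str]) -> dict[str, list[str]]:
    """Run the plaintext URL search over each fragment."""
    return {label: URL_RE.findall(markup) for label, markup in snippets.items()}


for label, urls in leaked_urls(HIDDEN_EXAMPLES).items():
    print(f"{label}: {urls}")
```

Catching the CSS and script cases properly needs something much closer to a real rendering engine than a comment filter.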