←back to thread

257 points ColinWright | 1 comments | | HN request time: 0s | source
Show context
rokkamokka ◴[] No.45774428[source]
I'm not overly surprised, it's probably faster to search the text for http/https than parse the DOM
replies(2): >>45774455 #>>45779997 #
1. marginalia_nu ◴[] No.45779997[source]
The regex approach is certainly easier to implement, but honestly static DOM parsing is pretty cheap, but quite fiddly to get right. You're probably gonna be limited by network congestion (or ephemeral ports) before you run out of CPU time doing this type of crawling.