255 points ColinWright | 1 comments
latenightcoding ◴[] No.45774927[source]
When I used to crawl the web, battle-tested Perl regexes were more reliable than anything else; commented-out URLs would have been added to my queue.
replies(1): >>45775080 #
rightbyte ◴[] No.45775080[source]
DOM navigation for fetching some data is for tryhards. Using a regex to grab the correct paragraph or div or whatever is fine, and is more robust against things moving around on the page.
replies(2): >>45775143 #>>45775158 #
1. chaps ◴[] No.45775143[source]
Doing both is fine! Just, once you've figured out your regex and such, hardening/generalizing demands DOM iteration. It sucks but it is what it is.
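
For illustration, here is a rough Python sketch of what "doing both" might look like. It is not anyone's actual crawler: the sample HTML and the names `links_by_regex`, `LinkCollector`, and `links_by_parser` are made up for the example, and the standard library's event-based `html.parser` stands in for real DOM iteration. The regex pass picks up every `href` in the raw text, commented-out ones included; the parser pass only sees links in live `<a>` tags.

```python
import re
from html.parser import HTMLParser

# Made-up sample page: one link is commented out, two are live.
HTML = """<html><body>
<!-- <a href="https://example.com/hidden">old link</a> -->
<div class="nav"><a href="https://example.com/about">About</a></div>
<p>See <a href='https://example.com/post'>the post</a>.</p>
</body></html>"""

# Regex pass: matches quoted href values anywhere in the raw text,
# so it does not care about markup structure or comments.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def links_by_regex(html):
    return HREF_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collects href values from real <a> start tags only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_by_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


regex_links = links_by_regex(HTML)
parsed_links = links_by_parser(HTML)

# URLs the regex found that the parser did not: here, the commented-out one.
commented_only = [url for url in regex_links if url not in parsed_links]

print("regex: ", regex_links)
print("parser:", parsed_links)
print("only in comments:", commented_only)
```

Comparing the two lists is one cheap way to see where the regex and the parsed structure disagree before deciding which result to trust.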