When I used to crawl the web, battle-tested Perl regexes were more reliable than anything else. Even commented-out URLs would have been added to my queue.
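To make the commented-URL point concrete, here is a rough sketch in Python rather than Perl. The sample page, the `HREF_RE` pattern and the `extract_urls` helper are all made up for illustration; a real crawler pattern would need to cope with unquoted attributes, relative links and so on.

```python
import re

# Hypothetical page snippet; the second link sits inside an HTML comment.
html = '''
<a href="https://example.com/live">live</a>
<!-- <a href="https://example.com/hidden">old link</a> -->
'''

# Deliberately simple pattern (an assumption, not a full URL grammar):
# it only matches single- or double-quoted href values.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def extract_urls(text):
    """Return every quoted href value found in the raw text, in order."""
    return HREF_RE.findall(text)

urls = extract_urls(html)
print(urls)
```

The regex works on the raw text, so it has no notion of what is or isn't a comment, and both links come back. A parser that builds a DOM would normally treat the comment as a comment node and not as a link.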
DOM navigation for fetching some data is for tryhards. Using a regex to grab the correct paragraph or div or whatever is fine, and it holds up better when things move around on the page.
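A minimal sketch of what I mean, again in Python for convenience. The `id="price"` marker, the `PRICE_RE` pattern, the `grab_price` helper and both sample pages are hypothetical. The idea is to key on something near the data itself instead of on the path down the tree.

```python
import re

# Match a div carrying id="price" and capture its contents.
# Assumes the id is double-quoted and the div has no nested divs inside it.
PRICE_RE = re.compile(
    r'<div[^>]*\bid="price"[^>]*>(.*?)</div>',
    re.DOTALL | re.IGNORECASE,
)

def grab_price(html):
    """Return the stripped contents of the price div, or None if absent."""
    m = PRICE_RE.search(html)
    return m.group(1).strip() if m else None

# Same data, two different layouts: the div moved and gained a class.
page_v1 = '<body><div id="price">$10</div></body>'
page_v2 = ('<body><section><p>intro</p>'
           '<div class="box" id="price">\n $10 </div></section></body>')

print(grab_price(page_v1))
print(grab_price(page_v2))
```

A hard-coded path like "first div under body" would break on the second layout, while the pattern does not care where the div ended up. The tradeoff is that it will get confused by nested divs or an unusually quoted attribute, so it only suits pages whose markup you have actually looked at.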