slacker news
AI scrapers request commented scripts (cryptography.dog)
255 points by ColinWright | 1 comment | 31 Oct 25 15:44 UTC
latenightcoding | 31 Oct 25 18:09 UTC | No. 45774927
>>45773347 (OP)
When I used to crawl the web, battle-tested Perl regexes were more reliable than anything else; commented-out URLs would have been added to my queue.
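A minimal Python sketch of why a regex-driven crawler would queue commented-out URLs: the regex scans raw text and has no notion of HTML comments, while the standard library's `html.parser` hands comment bodies to `handle_comment` (a no-op by default) instead of `handle_starttag`. The sample page, the pattern, and the function names are hypothetical illustrations, not the commenter's Perl crawler.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page: one live script tag and one commented-out script tag.
PAGE = """
<html><head>
<script src="/js/app.js"></script>
<!-- <script src="/js/old-tracker.js"></script> -->
</head><body><a href="/about">About</a></body></html>
"""

# Regex approach: match any src= or href= attribute value anywhere in the raw text.
# It does not know what a comment is, so the commented-out script matches too.
URL_RE = re.compile(r'(?:src|href)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def regex_urls(html: str) -> list:
    return URL_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collects src/href values from real start tags only."""

    def __init__(self) -> None:
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


def parser_urls(html: str) -> list:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


if __name__ == "__main__":
    print(regex_urls(PAGE))   # ['/js/app.js', '/js/old-tracker.js', '/about']
    print(parser_urls(PAGE))  # ['/js/app.js', '/about']
```

A crawler built on the first function would fetch `/js/old-tracker.js`; one built on the second would not.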
rightbyte | 31 Oct 25 18:23 UTC | No. 45775080
>>45774927
DOM navigation for fetching some data is for tryhards. Using a regex to grab the correct paragraph, div, or whatever is fine, and it is more robust against things moving around on the page.
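A small Python sketch of the "regex survives layout changes" point, assuming made-up markup where the target element carries a stable `class="price"`: the pattern anchors on that attribute, so it finds the element whether it sits at the top of `<body>` or deep inside nested containers. The layouts, `PRICE_RE`, and `grab_price` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical markup: the same price div sits at a different position in each layout.
LAYOUT_A = '<body><div class="price">$19.99</div><div class="nav">Home</div></body>'
LAYOUT_B = ('<body><main><aside><div class="nav">Home</div></aside>'
            '<section><div class="price">$19.99</div></section></main></body>')

# Anchor on the element's identifying attribute, not its path in the tree.
# Non-greedy (.*?) stops at the first </div>, so this assumes the target has no nested divs.
PRICE_RE = re.compile(r'<div class="price">(.*?)</div>', re.DOTALL)


def grab_price(html: str) -> Optional[str]:
    """Return the text inside the first <div class="price">, or None if absent."""
    match = PRICE_RE.search(html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(grab_price(LAYOUT_A))  # $19.99
    print(grab_price(LAYOUT_B))  # $19.99
```

The flip side is that the pattern is literal: `class="price sale"`, single-quoted attributes, or another attribute placed before `class` would all make it miss.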
chaps | 31 Oct 25 18:29 UTC | No. 45775143
>>45775080
Doing both is fine! Just, once you've figured out your regex and such, hardening/generalizing demands DOM iteration. It sucks, but it is what it is.
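One possible reading of "hardening/generalizing demands DOM iteration", sketched with Python's standard-library `html.parser`. This is an event-driven tokenizer rather than a full DOM, so lxml or BeautifulSoup would be the more typical choice. The class `ClassTextExtractor`, the helper `extract_by_class`, and the sample markup are hypothetical. Matching on a class *token* while tracking nesting depth tolerates attribute order, extra classes, and child elements, which are exactly the cases where a literal `<div class="price">` pattern breaks.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element whose class list contains `wanted`.

    Assumes reasonably well-formed markup (every non-void tag is closed).
    """

    def __init__(self, wanted: str) -> None:
        super().__init__()
        self.wanted = wanted
        self.depth = 0              # > 0 while inside a matching element
        self.buffer: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:              # nested child of a match: just go deeper
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:         # closed the matching element itself
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_by_class(html: str, wanted: str) -> List[str]:
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    page = ('<div data-id="7" class="price sale"><span>$</span>19.99<br></div>'
            '<p class="price">4.50</p>')
    print(extract_by_class(page, "price"))  # ['$19.99', '4.50']
```

In practice the two approaches combine naturally: a quick regex to find the data while exploring a site, then a tree walk like this once the scraper has to survive markup variations.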