255 points ColinWright | 1 comment
OhMeadhbh No.45775001
I blame modern CS programs that don't teach kids about parsing. The last time I looked at some scraping code, the dev was using regexes to "parse" HTML to find various references.

Maybe that's a way to defend against bots that ignore robots.txt: include a reference to a honeypot HTML file with garbage text, but put the link to it inside an HTML comment.
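
A rough sketch of why the comment trick could work, assuming the scraper pulls links with a naive regex. The page content, the paths, and the function names here are all made up for illustration; the only real API used is Python's standard `re` and `html.parser`. The regex sees the link inside the comment, while an actual HTML parser treats the comment as opaque and never reports it as a tag.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: one real link, one honeypot link hidden in a comment.
PAGE = """<html><body>
<a href="/real-page.html">Real page</a>
<!-- <a href="/honeypot.html">do not follow</a> -->
</body></html>"""


def regex_links(html):
    """Naive scraper: grab every href="..." in the raw text, comments included."""
    return re.findall(r'href="([^"]+)"', html)


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags that the parser actually emits."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def parsed_links(html):
    """Parser-based scraper: comment contents go to handle_comment, not handle_starttag."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("regex: ", regex_links(PAGE))
    print("parser:", parsed_links(PAGE))
```

Any request for the honeypot path would then be a decent hint that the client is matching on raw text instead of parsing, though a browser extension or a link checker that scans comments could trip it too.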

1. vaylian No.45776644
The people who do this type of scraping to feed their AI are probably also using AI to write their scraper.