←back to thread

707 points namukang | 1 comments | | HN request time: 0.21s | source
Show context
menthe ◴[] No.29261972[source]
As a web scraper, I'll say that because he is hooking into the browser like a debugger / remotely controlled browser, just like Puppeteer would - he is instantly detected by the Cloudflare, PerimeterX, Datadome bot management solutions; and will get consistently banned on his page reload for literally any site caring about bots.

He'd be better off running some javascript on the page instead (a-la Tampermonkey, but can be done really nicely with some server-served TypeScript) to scrape the pages stealthily and perform actions.

replies(4): >>29262248 #>>29262765 #>>29262768 #>>29263957 #
bdcravens ◴[] No.29262765[source]
Run it against https://bot.sannysoft.com/ to see how it stacks up

Most anti-Puppeteer tech analyzes the state of various browser Javascript objects, and if you run Puppeteer in headful mode with plugins like https://www.npmjs.com/package/puppeteer-extra-plugin-stealth you'll bypass most detection.

replies(2): >>29263653 #>>29264183 #
1. shaicoleman ◴[] No.29263653[source]
Disabling headless and adding the following command line option: --disable-blink-features=AutomationControlled

is enough to pass all the tests above with cuprite (Ruby), without needing any extra plugins