
707 points by namukang | 2 comments
menthe No.29261972
As a web scraper, I'll say that because he is hooking into the browser like a debugger / remote-controlled browser, just as Puppeteer would, he is instantly detected by bot-management solutions such as Cloudflare, PerimeterX, and Datadome, and will get consistently banned on page reload by literally any site that cares about bots.

He'd be better off running some JavaScript on the page instead (à la Tampermonkey, though it can be done really nicely with some server-served TypeScript) to scrape pages and perform actions stealthily.

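The approach above can be sketched as a Tampermonkey-style script: extraction runs in the page's own JS context, so it carries the real browser's fingerprint instead of a debugger connection. The selectors and the collection endpoint below are hypothetical; the extraction step is kept as a pure function over anything exposing `querySelectorAll`, so it works on the real `document` or on a test stub.

```javascript
// Pure extraction step (hypothetical '.product' markup), separated from I/O
// so it can run against the real document or a stub in tests.
function extractItems(root) {
  return Array.from(root.querySelectorAll('.product')).map((el) => ({
    title: el.querySelector('.title').textContent.trim(),
    price: el.querySelector('.price').textContent.trim(),
  }));
}

// In the userscript itself, ship results back to your own server, e.g.:
// fetch('https://example.com/collect', {        // hypothetical endpoint
//   method: 'POST',
//   body: JSON.stringify(extractItems(document)),
// });
```

Because the script runs as ordinary page JavaScript, there is no CDP/WebDriver session for the bot-management vendor to fingerprint.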
colordrops No.29262768
How exactly do these services detect Puppeteer?
shaicoleman No.29263823
They run JS tests such as the one linked in the peer comment: https://bot.sannysoft.com/
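The kind of checks a page like bot.sannysoft.com runs can be sketched as a handful of probes against `navigator` and `window`. The property names below (`navigator.webdriver`, `navigator.plugins`, `window.chrome`, etc.) are real browser APIs; the scoring is illustrative, not any vendor's actual logic.

```javascript
// Each probe inspects a property that headless/automated browsers tend to
// get wrong. Takes navigator-like and window-like objects so it is testable.
function botSignals(nav, win) {
  return {
    webdriver: nav.webdriver === true,                       // set by WebDriver automation
    noPlugins: (nav.plugins ? nav.plugins.length : 0) === 0, // headless Chrome ships none
    noLanguages: !nav.languages || nav.languages.length === 0,
    headlessUA: /HeadlessChrome/.test(nav.userAgent || ''),
    noChromeObject: win.chrome === undefined,                // present in real desktop Chrome
  };
}

function looksLikeBot(nav, win) {
  return Object.values(botSignals(nav, win)).some(Boolean);
}
```

Stealth plugins for Puppeteer work by patching exactly these kinds of properties before page scripts can read them.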
menthe No.29264160
Not only that - enterprise bot-management protections will run behavioral identification (e.g. how your mouse moves → AI model → bot yes/no), TCP stack fingerprinting, fingerprinting of other device sensors where available (e.g. gyroscope), TLS ClientHello fingerprinting (e.g. JA3, see https://github.com/salesforce/ja3), etc. Lots of very unique info in the Scraping Enthusiasts Discord, where lots of pro scrapers hang out.
zdware No.29278397
I was on a project that used Google's reCAPTCHA Enterprise v3 (passive mode, with all that "AI" jazz) and it was hot garbage. We tested against it using a simple Selenium script, and even though `navigator.webdriver` was true, it still rated us "likely a human" 9 times out of 10.