(twitter.com)

550 points by polskibus | 1 comment | 08 Feb 19 16:04 UTC

jordan801 ◴[08 Feb 19 16:53 UTC] No.19116099

>>19115460 (OP)

Anyone who has written a few scrapers knows how brutally ineffective this is. Yelp tried to pull the same thing, and it took me about 3 minutes to rectify my "for fun" scraper. It's also really not that difficult to write a smart scraper that you can tell, "Look for these things in this post. However you find them, replicate that for the others." That's ultimately what I made my Yelp scraper do.

If there's a pattern, I will find it, and I will exploit it. <3
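A minimal sketch of the "find it in one post, replicate it for the others" idea, assuming the posts share the same nesting even when their class names are randomized. The markup, the sample text, and the helper names `learn_path` and `apply_path` are hypothetical. The snippets are kept well-formed so Python's standard `xml.etree.ElementTree` can parse them; a real page would need an HTML parser. The sketch records the chain of child positions leading to a known piece of text and ignores every class attribute:

```python
import xml.etree.ElementTree as ET


def learn_path(node, target_text):
    """Return the list of child indices leading from node to the first
    element whose stripped text equals target_text, or None if none does.
    Tags, classes and ids are ignored: only the tree's shape is recorded."""
    if (node.text or "").strip() == target_text:
        return []
    for i, child in enumerate(node):
        sub = learn_path(child, target_text)
        if sub is not None:
            return [i] + sub
    return None


def apply_path(node, path):
    """Follow a learned index path in another post; return that element's
    stripped text, or None if the post doesn't have that shape."""
    for i in path:
        if i >= len(node):
            return None
        node = node[i]
    return (node.text or "").strip()


# Two hypothetical posts: same nesting, different (randomized) class names.
example = ET.fromstring(
    '<div class="x9f"><div class="q1z">'
    '<span class="a7">Alice</span><span class="k3">Great food!</span>'
    "</div></div>"
)
other = ET.fromstring(
    '<div class="m2p"><div class="r8t">'
    '<span class="c0">Bob</span><span class="j5">Too salty.</span>'
    "</div></div>"
)

path = learn_path(example, "Great food!")  # learned once, from one post
print(path)                     # [0, 1]
print(apply_path(other, path))  # Too salty.
```

If a site also varies the nesting from post to post (for example by inserting decoy elements), a fixed index path like this breaks, and you'd need fuzzier matching.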

replies(8): >>19116147 >>19116340 >>19116656 >>19116724 >>19117143 >>19117402 >>19117423 >>19121248

eeeeeeeeeeeee ◴[08 Feb 19 16:56 UTC] No.19116147

>>19116099

Yep, it seems like a total waste of time. The people scraping will spend whatever time it takes to get around this (and then share the workaround with the masses), so it's a pointless arms race. Facebook employees could put their time to better use developing actual features that bring value.

replies(5): >>19116174 >>19116250 >>19116413 >>19116676 >>19117180

taf2 ◴[08 Feb 19 17:39 UTC] No.19116676

>>19116147

They could render the whole thing in a canvas, for example.

replies(1): >>19116931

nacs ◴[08 Feb 19 18:02 UTC] No.19116931

>>19116676

Then you'd just block all canvas elements, if ads are always drawn in a <canvas>.
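A rough sketch of that blocking rule as a filter over a parsed page, assuming (hypothetically) that every ad is a `<canvas>` and that the page parses as well-formed XML. Real ad blockers express this kind of rule as a cosmetic filter inside the browser rather than as Python code. `drop_canvases` is a hypothetical helper, and the feed markup is invented:

```python
import xml.etree.ElementTree as ET


def drop_canvases(node):
    """Remove every <canvas> below node in place; return how many were removed.
    Removed canvases are not searched further."""
    removed = 0
    prev = None  # last child we kept, so trailing text can be re-homed
    for child in list(node):
        if child.tag == "canvas":
            # ElementTree stores text that follows an element in its .tail;
            # move it to the previous kept sibling (or the parent) first.
            if child.tail:
                if prev is None:
                    node.text = (node.text or "") + child.tail
                else:
                    prev.tail = (prev.tail or "") + child.tail
            node.remove(child)
            removed += 1
        else:
            removed += drop_canvases(child)
            prev = child
    return removed


feed = ET.fromstring(
    '<div><p>Post one</p><canvas id="ad1"/>between<p>Post two</p><canvas/></div>'
)
print(drop_canvases(feed))                   # 2
print(ET.tostring(feed, encoding="unicode"))
# <div><p>Post one</p>between<p>Post two</p></div>
```

The tail handling matters because a plain `remove()` in ElementTree would also delete the text that followed the removed element. The obvious cost of the rule itself is that legitimate canvases (charts, games) get stripped too.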

If they turned all their posts into <canvas> elements, it'd kill any accessibility features, along with the ability to copy-paste text and such, so I doubt they'd go that far.

Even then, a scraper could run OCR on the canvas image to get the text out of it.

replies(2): >>19117134 >>19119999

sqd ◴[08 Feb 19 18:23 UTC] No.19117134

>>19116931

I don't think these HTML pieces are very accessibility-tool friendly.

Facebook adds 5 divs, 9 spans and 30 CSS classes to every post in the timeline