597 points classichasclass | 6 comments
1. PeterStuer ◴[] No.45010862[source]
FAFO from both sides. I'm not defending this bot at all. That said, the shenanigans some rogue or clueless webmasters get up to in blocking legitimate, non-intrusive, low-load M2M traffic are driving some projects into the arms of 'scrape services' that use far less considerate or ethical means to get to the data you pay them for.

IP blocking is useless if your sources are hundreds of thousands of people worldwide just playing a "free" game on their phone that, once in a while on wifi, fetches some webpages in the background for the game publisher's scraping-as-a-service side revenue deal.
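The mechanics of why per-IP blocking fails against this kind of distributed scraping can be sketched in a few lines. This is a hypothetical illustration (the threshold and addresses are made up, not from any real system): a per-IP daily limit trips on one hammering scraper, but the same total request volume spread across thousands of "game player" devices never crosses any plausible per-IP threshold.

```python
from collections import Counter

LIMIT_PER_IP_PER_DAY = 100  # hypothetical per-IP threshold


def blocked_ips(request_ips, limit=LIMIT_PER_IP_PER_DAY):
    """Return the set of IPs whose daily request count exceeds the limit."""
    counts = Counter(request_ips)
    return {ip for ip, n in counts.items() if n > limit}


# One scraper sending 10,000 requests from a single address trips the limit...
single_source = ["203.0.113.9"] * 10_000
print(blocked_ips(single_source))   # {'203.0.113.9'}

# ...but the same 10,000 requests spread over 10,000 residential devices
# leave every IP at one request per day, so nothing gets blocked.
distributed = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
print(blocked_ips(distributed))     # set()
```

The defender only sees per-IP counts that look like ordinary human browsing, which is exactly the property the scraping-as-a-service model is selling.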

replies(3): >>45011013 #>>45011442 #>>45011662 #
2. ahtihn ◴[] No.45011013[source]
What? Are you trying to say it's legitimate to want to scrape websites that are actively blocking you because you think you are "not intrusive"? And that this justifies paying for bad actors to do it for you?

I can't believe the entitlement.

replies(1): >>45011095 #
3. PeterStuer ◴[] No.45011095[source]
No. I'm talking about the literally legitimate: information that has to be public by law and/or regulation (typically gov stuff), published in formats specifically meant for M2M consumption, and still blocked by clueless or malicious outsourced lowest-bidder site managers.

And no, I do not use those paid services, even though it would make it much easier.

4. geocar ◴[] No.45011442[source]
Exactly. If someone can harm your website by accident, they can absolutely harm it on purpose.

If you feel like you need to do anything at all, I would suggest treating it like any other denial-of-service vulnerability: Fix your server or your application. I can handle 100k clients on a single box, which equates to north of 8 billion daily impressions, and so I am happy to ignore bots and identify them offline in a way that doesn't reveal my methodologies any further than I absolutely have to.
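As a quick sanity check on the numbers in this comment, assuming "100k clients" is read as roughly 100,000 requests per second sustained over a full day (the commenter may instead mean concurrent connections in the C10k sense; the interpretation here is an assumption):

```python
# Hypothetical back-of-envelope check, not the commenter's actual workload.
reqs_per_sec = 100_000
seconds_per_day = 60 * 60 * 24            # 86,400 seconds
daily_impressions = reqs_per_sec * seconds_per_day
print(f"{daily_impressions:,}")           # 8,640,000,000 -- "north of 8 billion"
```

Under that reading, the arithmetic holds: 100k requests per second is about 8.64 billion per day.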

5. BLKNSLVR ◴[] No.45011662[source]
> IP blocking is useless if your sources are hundreds of thousands of people worldwide just playing a "free" game on their phone that, once in a while on wifi, fetches some webpages in the background for the game publisher's scraping-as-a-service side revenue deal.

That's traffic I want to block, and behaviour I want to punish and discourage. If a set of users get caught up in that, even when they've just been handed recycled IP addresses, then there's a better chance of bringing the shitty 'scraping as a service' behaviour to light and, hopefully, disinfecting it.

(This opinion comes from someone who is definitely NOT hosting public information that must be accessible to the common populace. That's an issue requiring more nuance, but one that luckily has public funding behind it to develop nuanced solutions. A site serving a common populace outside of China and Russia can also just block China and Russia.)

replies(1): >>45012365 #
6. PeterStuer ◴[] No.45012365[source]
Trust me, there's nothing 'nuanced' about the contractor that won the website management contract for the next 6-12 months by being the cheapest bidder.