
597 points classichasclass | 2 comments
bob1029 ◴[] No.45011628[source]
I think a lot of really smart people are letting themselves get taken for a ride by the web scraping thing. Unless the bot activity is legitimately hammering your site and causing issues (not saying this isn't happening in some cases), then this mostly amounts to an ideological game of capture the flag. The difference being that you'll never find their flag. The only thing you win by playing is lost time.

The best way to mitigate the load from diffuse, unidentifiable, grey-area participants is to have a fast and well-engineered web product. This is good news, because your actual human customers would really enjoy this too.

replies(7): >>45011652 #>>45011830 #>>45011850 #>>45012424 #>>45012462 #>>45015038 #>>45015451 #
phito ◴[] No.45011652[source]
My friend has a small public Gitea instance, only used by him and a few friends. He's getting thousands of requests an hour from bots. I'm sorry, but even if it does not impact his service, at the very least it feels like harassment.
replies(7): >>45011694 #>>45011816 #>>45011999 #>>45013533 #>>45013955 #>>45014807 #>>45025114 #
kiitos ◴[] No.45014807[source]
every single IPv4 address in existence receives constant malicious traffic, from uncountably many malicious actors, on all common service ports (80, 443, 22, etc.) and, for HTTP specifically, to an enormous and growing number of common endpoints (mostly WordPress related, last I checked)

if you put your server up on the public internet then this is just table stakes stuff that you always need to deal with, doesn't really matter whether the traffic is from botnets or crawlers or AI systems or anything else

you're always gonna deal with this stuff well before the requests ever get to your application, with WAFs or reverse proxies or (idk) fail2ban or whatever else
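
To illustrate the kind of filtering that sits in front of the application, here is a minimal sketch of a per-IP sliding-window ban. It is not how fail2ban or any particular WAF is implemented; the class name `SlidingWindowBan`, its parameters, and the thresholds are all hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowBan:
    """Hypothetical per-IP limiter: ban an IP once it exceeds
    max_hits requests within the last window_s seconds."""

    def __init__(self, max_hits: int, window_s: float):
        self.max_hits = max_hits
        self.window_s = window_s
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.banned = set()             # bans are permanent in this sketch

    def allow(self, ip: str, now: float) -> bool:
        if ip in self.banned:
            return False
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        if len(q) > self.max_hits:
            self.banned.add(ip)
            return False
        return True


# Example: at most 3 requests per 10 seconds per IP (assumed values).
limiter = SlidingWindowBan(max_hits=3, window_s=10.0)
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, limiter.allow("203.0.113.7", now=t))
```

In practice this logic would live in the reverse proxy or firewall layer rather than in application code, which is the point being made above.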

also 1000 req/hour is around 1 request every 4 seconds, which is statistically 0 rps for any endpoint that would ever be publicly accessible
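
The arithmetic behind that estimate can be checked with a quick sketch; the helper names here are made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def per_second(requests_per_hour: float) -> float:
    """Convert an hourly request count to requests per second."""
    return requests_per_hour / SECONDS_PER_HOUR


def seconds_between(requests_per_hour: float) -> float:
    """Average gap in seconds between requests at a steady hourly rate."""
    return SECONDS_PER_HOUR / requests_per_hour


rate = per_second(1000)
gap = seconds_between(1000)
print(f"{rate:.3f} req/s, one request every {gap:.1f} s")
# prints: 0.278 req/s, one request every 3.6 s
```

So 1000 req/hour is closer to one request every 3.6 seconds, which still rounds to well under 1 rps.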

replies(2): >>45015080 #>>45015487 #
1. sidewndr46 ◴[] No.45015487[source]
I was kind of amazed to learn that apparently if you connect Windows NT4/98/2000/ME to a public IPv4 address, it gets infected by a period-correct worm in no time at all. I don't mean that someone uses an RCE to turn it into part of a botnet (that is expected). Apparently there are enough infected hosts from 20+ years ago still out there that the Sasser worm is still spreading.
replies(1): >>45018039 #
2. hugo1789 ◴[] No.45018039[source]
I still remember how we installed Windows PCs at home if no media with the latest service pack was available. Install Windows, download service pack, copy it away, disconnect from internet, throw away everything and install Windows again...