
597 points | classichasclass
bob1029 No.45011628
I think a lot of really smart people are letting themselves get taken for a ride by the web scraping thing. Unless the bot activity is legitimately hammering your site and causing issues (not saying this isn't happening in some cases), this mostly amounts to an ideological game of capture the flag. The difference is that you'll never find their flag. The only thing you win by playing is lost time.

The best way to mitigate the load from diffuse, unidentifiable, grey-area participants is to have a fast and well-engineered web product. This is good news, because your actual human customers would really enjoy this too.

phito No.45011652
My friend has a small public Gitea instance, used only by him and a few friends. He's getting thousands of requests an hour from bots. I'm sorry, but even if it doesn't impact his service, at the very least it feels like harassment.
ralferoo No.45013533
What's worse is when you get bots blasting HTTP traffic at every open port, even well-known services like SMTP. Seriously, it's a mail server. It identifies itself as soon as the connection is opened; if they waited 100-300 ms before spamming, they'd know it wasn't HTTP, because an HTTP server wouldn't send anything at all until it received a request. There's literally no need to bombard a mail server on a well-known port by continuing to send a load of junk that's just going to fill someone's log file.
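A minimal sketch of the check described above, in Python, assuming a plain TCP probe: connect, wait up to 300 ms, and treat any bytes the server sends first (such as an SMTP `220` greeting) as a sign the port isn't speaking HTTP. The function name `server_speaks_first` and the default wait are illustrative choices, not what any particular scanner actually does.

```python
import socket


def server_speaks_first(host: str, port: int, wait: float = 0.3) -> bool:
    """Return True if the server sends data before the client says anything.

    Protocols like SMTP greet the client immediately, while an HTTP server
    stays silent until it receives a request. Hypothetical helper for
    illustration only.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.settimeout(wait)  # only wait this long for an unsolicited banner
        try:
            data = sock.recv(1024)
        except socket.timeout:
            return False  # silence: consistent with HTTP (or just a slow service)
        return len(data) > 0  # b"" means the peer closed without a greeting
```

A scanner could call this before sending its payload and skip ports where it returns `True`. Silence is only weak evidence, since a slow or filtered service also says nothing.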
sidewndr46 No.45015499
It's even funnier when you realize it is a request for a known exploit in WordPress. Does someone really run that on port 22?
Sohcahtoa82 No.45016435
I HAVE heard of someone that runs SSH on port 443 and HTTPS on 22.

It blocks a lot of bots, but I feel like just running on a high port number (10,000+) would likely do better.

mjmas No.45021476
I have a service running on a high port number on just a bare IPv4 address, and it does get a bit of bot traffic, but it's generally easy to filter out when looking at the logs. Well-behaved bots have a domain in their User-Agent, and bingbot takes my robots.txt into account. I don't think I've seen the Google crawler. Other bots can generally be identified as anything that didn't request my manifest.json a few seconds after loading the main page.
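The log heuristics above could be sketched roughly like this in Python. It assumes log lines have already been parsed into records of (IP, timestamp in seconds, path, User-Agent); the `Hit` type, the `classify` function, the domain regex, and the 5-second window are all hypothetical stand-ins for "a domain in the User-Agent" and "a few seconds".

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    ip: str
    ts: float        # request time in seconds
    path: str
    user_agent: str


# Rough "looks like a domain" test, e.g. www.bing.com in bingbot's UA.
# Browser version strings like "Chrome/120.0.0.0" don't match (no letter TLD).
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def classify(hits, page="/", asset="/manifest.json", window=5.0):
    """Label each IP as 'declared-bot', 'likely-bot' or 'likely-human'."""
    by_ip = defaultdict(list)
    for h in hits:
        by_ip[h.ip].append(h)

    labels = {}
    for ip, ip_hits in by_ip.items():
        # Well-behaved crawlers advertise themselves with a domain/URL.
        if any(DOMAIN_RE.search(h.user_agent) for h in ip_hits):
            labels[ip] = "declared-bot"
            continue
        page_loads = [h.ts for h in ip_hits if h.path == page]
        asset_loads = [h.ts for h in ip_hits if h.path == asset]
        # A real browser fetches the manifest shortly after the page.
        fetched_asset = any(
            0 <= a - p <= window for p in page_loads for a in asset_loads
        )
        labels[ip] = "likely-human" if fetched_asset else "likely-bot"
    return labels
```

An IP that never loads the main page at all (say, one probing `/wp-login.php`) also ends up as `likely-bot`, which matches the "anything that didn't request my manifest.json" rule.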