
137 points samray | 2 comments
ajd555 ◴[] No.45537856[source]
If a ping to a specific IP times out, I wouldn't say the IP is blocked. It could be that ICMP specifically is blocked by some rule on the firewall. It is pretty common in enterprise networks to disallow endpoint discovery this way. I could be missing something and am happy to be corrected here, but I was surprised to read that.
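One way to tell "ICMP is filtered" apart from "the host is unreachable" is to try a different protocol. Here is a minimal sketch that attempts a TCP handshake instead of an echo request. The function name `tcp_probe` and its three result labels are made up for illustration, and it assumes you know a port the host should be listening on.

```python
import socket

def tcp_probe(host, port, timeout=2.0):
    """Try a TCP handshake and classify the outcome (hypothetical helper).

    "open"    -> the handshake completed, so the host is reachable
    "refused" -> something answered with a reset, so a path exists
    "failed"  -> timeout, unreachable network, DNS error, etc.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts and name-resolution errors as well.
        return "failed"
```

Both "open" and "refused" mean some device answered, which a timed-out ping alone cannot rule out. A "failed" result is as ambiguous as a failed ping: the TCP port may simply be filtered too.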
EvanAnderson ◴[] No.45538538[source]
I find it's important to remember, too, that a failed PING tells you nothing other than that your echo request did not receive a response. Whether the remote host received your request, and whether it responded, are both things a failed PING can't tell you, because both could be true and you would still end up with a failed PING.

I've seen technicians get tripped up while troubleshooting by thinking that a failed PING tells them more than it does. When asymmetric return paths are a possibility, it's always important to remember how little a failed PING actually tells you.

jacquesm ◴[] No.45539109[source]
And that can be a lot more subtle than you might think. I once had a persistent, very hard to debug false alarm caused by pings occasionally not making it, even though most of the time they did. Very rarely that would happen three times in a row, and that was the threshold for raising an alarm. We spent days on this. Finally, the root cause was tracked down to a BNC 'T' connector at the back of a media adapter that filtered out the header of some percentage of ICMP packets. It is one of the weirdest IT problems I've ever encountered, and it makes me wonder how much of what we rely on is actually marginal.
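To get a feel for how rare a "three in a row" alarm can be, here is a rough back-of-the-envelope sketch. It assumes each ping is lost independently with the same probability, which real links often violate because loss tends to come in bursts. The loss rates and probe counts you plug in are hypothetical, not figures from the incident above.

```python
def expected_false_alarms(loss_rate, probes, threshold=3):
    """Expected number of runs of `threshold` consecutive lost probes.

    Assumes independent losses (an assumption, not a measured property).
    Counts every window of `threshold` adjacent probes, so overlapping
    runs are counted more than once.
    """
    if probes < threshold:
        return 0.0
    windows = probes - threshold + 1
    return windows * loss_rate ** threshold
```

With 1% loss and one probe per minute (1440 probes a day), this gives about 0.0014 expected alarms per day, or roughly one every two years. That kind of rate is what makes a marginal fault so hard to catch in the act.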
gosub100 ◴[] No.45540106{3}[source]
I'm an SRE and encountered this recently. To prevent DDoS, there is a buffer setting in the kernel that limits the number of pings (a few settings, actually). So if you have a group of machines that all ping a single destination at once, it's very possible for some of them to fail to get a reply.
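The effect can be illustrated with a generic token bucket. To be clear, this is not the kernel's actual code or its real setting names; the class, the function, and the numbers are assumptions chosen only to show why a simultaneous burst loses replies.

```python
class TokenBucket:
    """Generic rate limiter: `burst` tokens max, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        # Refill for the time elapsed since the previous request.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # reply is sent
        return False      # reply is silently dropped


def replies_for_burst(senders, rate_per_sec, burst, now=0.0):
    """Count how many of `senders` simultaneous pings get a reply."""
    bucket = TokenBucket(rate_per_sec, burst)
    return sum(bucket.allow(now) for _ in range(senders))
```

With a hypothetical bucket of 50 tokens, 80 machines pinging at the same instant get 50 replies and 30 silent drops, even though the destination is perfectly healthy.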
jacquesm ◴[] No.45540143{4}[source]
Oh, that's nasty. How long did it take you to troubleshoot that?
1. gosub100 ◴[] No.45540362{5}[source]
Relatively speaking, it wasn't that bad. It took a few weeks of getting trouble tickets with no root cause, and a bit of googling. But management wasn't okay with fixing the root cause; instead, they just increased the timeout/retry window.
2. jacquesm ◴[] No.45541175[source]
Wow. That's a classic. We were quite motivated because we were the ones that got the automated alerts. I still see them in my nightmares: "chopper is down". The machine was called chopper. I'll never forget it, and it's been close to 30 years. My buddy Jasper and I spent multiple nights trying to track it down, and when we finally found it we still couldn't believe that that was it. But a simple swap was proof.