265 points night-rider | 2 comments
commandersaki ◴[] No.38590611[source]
Copying the per-hop loss indicator from mtr is a bad decision in my opinion. It's always been a source of incorrect diagnosis of network issues. The only loss that matters is end to end.
replies(4): >>38590932 #>>38591072 #>>38591120 #>>38591527 #
lathiat ◴[] No.38591120[source]
That is not entirely true. Sure, it’s not a 100% reliable signal, since routing can be asymmetric, but often it isn’t, and the per-hop view at least gives you an idea of whom to ask first.

If the packet loss starts at your wifi router, or your ISP's router, or the next hop after your ISP, that all gives you a bit of an idea where the problem likely is. I solve problems like that all the time.

replies(1): >>38591827 #
commandersaki ◴[] No.38591827[source]
I was a network engineer for over a decade in hosting datacenter environments and would get false reports of packet loss to various destinations because people would use MTR and say they see loss on the path. If a packet takes the path A -> B -> C and pings to B have 50% loss but pings to C have 0% loss, then the path is perfectly fine.

The only way to reliably isolate packet loss to a hop on the path is to have a test destination that sits in that hop's bailiwick, so that packets pass through the hop, and that doesn't perform rate limiting or policing of ICMP traffic.

replies(1): >>38591945 #
FujiApple ◴[] No.38591945[source]
Something I intend to add to Trippy, but have not got around to yet, is to codify the "If a packet takes the path A -> B -> C and pings to B have 50% loss but pings to C have 0% loss, then the path is perfectly fine" idea and use that to produce more meaningful headline status information to the user. How would you codify this?
replies(2): >>38592208 #>>38592597 #
1. commandersaki ◴[] No.38592208[source]
I would love for there to be a useful indicator to the user to say if loss or latency is an issue.

Being able to indicate cascading loss (e.g. a path A->B->C->D that shows loss at B, C, and D) is worth bubbling up to the user to say there might be real issues. Any indication of loss at D is also an issue. Reconciling these scenarios in the UI matters, but I don't think there's an easy way. What I think is more important than the UI, and sorely needed, is documentation or a user's guide explaining how to read and understand these indicators. I know documentation is usually overlooked by users first trying out a program, but having it documented and explained gives you a reference to point to when a user is misunderstanding the tool. I found that MTR didn't have this much-needed documentation, so people would easily misunderstand the tool, and it was a herculean effort to correct them.
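
One rough way to codify the two cases above, as a minimal sketch rather than anything Trippy or MTR actually does: treat loss as significant only when it belongs to an unbroken run of lossy hops that reaches the destination, and label any other lossy hop as probable ICMP rate limiting. The names `Hop` and `classify_path` and the 1% threshold are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    loss_pct: float  # loss observed for probes answered by this hop


def classify_path(hops, threshold=1.0):
    """Return (headline, notes) for a path ordered from first hop to destination.

    A hop counts as lossy when loss_pct exceeds threshold (an arbitrary
    illustrative value, not taken from any real tool).
    """
    if not hops:
        raise ValueError("path must contain at least one hop")

    # Walk back from the destination to find where the trailing run of
    # lossy hops begins. If the destination is clean, the run is empty.
    run_start = len(hops)
    for i in range(len(hops) - 1, -1, -1):
        if hops[i].loss_pct > threshold:
            run_start = i
        else:
            break

    notes = {}
    for i, hop in enumerate(hops):
        if hop.loss_pct <= threshold:
            notes[hop.name] = "ok"
        elif i >= run_start:
            notes[hop.name] = "cascading"  # loss carries through to the destination
        else:
            notes[hop.name] = "isolated"   # later hops are clean: likely rate limiting

    if run_start < len(hops):
        headline = f"end-to-end loss, first seen at {hops[run_start].name}"
    else:
        headline = "no end-to-end loss"
    return headline, notes
```

With the A -> B -> C example (0%, 50%, 0%), this reports no end-to-end loss and marks B as isolated. With loss at B, C, and D on A->B->C->D, it reports end-to-end loss first seen at B. It still cannot prove which hop is at fault; it only decides which loss is worth showing in a headline.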

I would also like to point out that a 0% loss indicator at the destination isn't reliable either if the packets are spaced out with enough slack. One of my go-tos when testing packet loss on a link I've brought up is to smash a destination host with a ping flood, e.g. ping -c 100 -f 1.1.1.1. Inundating the link helps provide a clear indicator of whether there is loss somewhere on the path (usually the first mile or the last). Cloudflare speedtest now has a packet loss tester that floods 1000 packets, although I'm not sure if it does it over an unreliable transport or not.

replies(1): >>38592523 #
2. FujiApple ◴[] No.38592523[source]
I agree regarding documentation. There was a request [0] for something similar, though not specifically covering this important point.

Regarding sending a ping flood, Trippy allows you to reduce the minimum and maximum round time (and grace period) to send packets almost as fast as you like. For example, to send at 50ms intervals (with a 10ms grace period):

> trip example.com -i 50ms -T 50ms -g 10ms

[0] https://github.com/fujiapple852/trippy/issues/853