How To Identify Network Issues with Traceroutes?

Jean-François Lévesque
Last updated on Aug. 11, 2020

We have written a series of articles about traceroutes, the most popular tool that network engineers use to troubleshoot network performance.

Traceroute Performance Metrics

When looking at a traceroute, we usually have two important values for each hop or router: latency and packet loss. Let’s take a look at this traceroute from the Obkio Live Traceroute feature:

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname          | Loss% | Snt | Last |  Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1       |   0.0 |  10 |  1.0 |  1.6 |  0.5 |  3.9 |
| 2 | router1.ispA.com  |  10.0 |  10 |  5.0 |  5.6 |  4.5 |  7.9 |
| 3 | router2.ispB.com  |   0.0 |  10 | 10.0 | 10.6 |  9.5 | 15.9 |
| 4 | router3.ispC.com  |   0.0 |  10 | 12.0 | 12.6 | 11.5 | 22.9 |
| 5 | router4.ispC.com  |   0.0 |  10 | 13.0 | 13.6 | 12.5 | 23.9 |
| 6 | router5.ispC.com  |   0.0 |  10 | 14.0 | 14.6 | 13.5 | 21.9 |
| 7 | router6.ispC.com  |   0.0 |  10 | 15.0 | 15.6 | 14.5 | 29.9 |
| 8 | website.com       |   0.0 |  10 | 16.0 | 16.6 | 15.5 | 39.9 |
+---+-------------------+-------+-----+------+------+------+------+
Figure A

The latency is the round-trip latency measured by the source: the time between sending a packet and receiving the response. In the table above, each hop has up to 10 latency values because 10 packets were sent (Snt column); a lost packet produces no value. Last is the latency of the most recent packet, Avg is the average, and Best and Wrst are the lowest and highest values.

Packet loss is the percentage of sent packets that never received a response.
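To make the columns concrete, here is a minimal sketch of how they could be derived from the raw round-trip times of one hop. The function name `summarize_hop` and the sample values are made up for illustration, and `None` stands for a probe that never got a reply; real traceroute tools may compute these values slightly differently.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_hop(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one hop from per-packet round-trip times in ms (None = no reply)."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    return {
        "Loss%": 100.0 * lost / sent if sent else 0.0,
        "Snt": sent,
        "Last": replies[-1] if replies else None,  # most recent reply received
        "Avg": mean(replies) if replies else None,
        "Best": min(replies) if replies else None,
        "Wrst": max(replies) if replies else None,
    }


# Hypothetical samples for one hop: 10 probes, one of which got no reply.
samples = [4.5, 5.2, None, 6.1, 5.0, 7.9, 5.5, 4.8, 6.0, 5.0]
summary = summarize_hop(samples)
print(summary)
```

Note that the latency columns are computed only from the packets that received a reply, which is why a hop with loss has fewer latency values than packets sent.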

In this example, a loss of 10% at hop 2 looks significant. However, the first thing to check is the number of packets that were sent (Snt column).

In this case, we lost 1 packet out of the 10 that were sent, resulting in a packet loss rate of 10%. A 10% loss rate is a lot, but with only 10 packets, a single lost packet is not strong evidence of a problem. Out of 1,000 or 10,000 packets, it would be another story. Traceroute tools often have a configuration option to change the number of packets sent and the interval between them.
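One way to see why the sample size matters is to put an uncertainty range around the measured loss rate. The sketch below uses the Wilson score interval, a standard statistical technique that is our choice for illustration and not something traceroute tools report. It assumes each probe is lost independently, which real networks do not guarantee.

```python
import math
from typing import Tuple


def wilson_interval(lost: int, sent: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the true loss rate (as fractions)."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    p = lost / sent
    denom = 1 + z * z / sent
    center = (p + z * z / (2 * sent)) / denom
    margin = z * math.sqrt(p * (1 - p) / sent + z * z / (4 * sent * sent)) / denom
    return max(0.0, center - margin), min(1.0, center + margin)


# The same measured 10% loss rate, with increasingly large samples.
for lost, sent in [(1, 10), (100, 1000), (1000, 10000)]:
    low, high = wilson_interval(lost, sent)
    print(f"{lost}/{sent} lost: true loss rate plausibly between {low:.1%} and {high:.1%}")
```

With only 10 packets the range is very wide, while with 1,000 or 10,000 packets it narrows around 10%, which is the "another story" case.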

How to analyze packet loss?

The rule of thumb when looking at a traceroute is very simple:

If the packet loss doesn't continue on the hops that follow, don’t panic, it’s not an issue!

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname          | Loss% | Snt | Last |  Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1       |   0.0 |  10 |  1.0 |  1.6 |  0.5 |  3.9 |
| 2 | router1.ispA.com  |  50.0 |  10 |  5.0 |  5.6 |  4.5 |  7.9 |
| 3 | router2.ispB.com  |   0.0 |  10 | 10.0 | 10.6 |  9.5 | 15.9 |
| 4 | router3.ispC.com  |   0.0 |  10 | 12.0 | 12.6 | 11.5 | 22.9 |
| 5 | router4.ispC.com  |   0.0 |  10 | 13.0 | 13.6 | 12.5 | 23.9 |
| 6 | router5.ispC.com  |   0.0 |  10 | 14.0 | 14.6 | 13.5 | 21.9 |
| 7 | router6.ispC.com  |   0.0 |  10 | 15.0 | 15.6 | 14.5 | 29.9 |
| 8 | website.com       |   0.0 |  10 | 16.0 | 16.6 | 15.5 | 39.9 |
+---+-------------------+-------+-----+------+------+------+------+
Figure B

Let’s take a look at Figure B. We all know that 50% packet loss over a connection is terrible and makes it almost unusable. So are there any issues with this new traceroute example? Let’s apply the rule of thumb and figure it out.

Does the 50% packet loss continue in the traceroute? Does every hop report that same 50% that we see with hop #2? The answer is no, otherwise we would see packet loss with hops #3 through #8.

Should we panic and call our ISP to tell them we have packet loss on the path? No! Does it mean there is an issue with that router? No! It only tells us that hop #2 responds to just 50% of the packets, or that only 50% of its “ICMP TTL Exceeded” messages make it back to the source.

For a deep dive into why we see packet loss at that hop, see the article “Why Do Some Routers Drop Packets or Have High Latencies?”

In Figure B, is the latency good or bad?

Looking at the example above, is the latency good? Is it normal? With only this traceroute and no more information, we don’t know.

The latency between two hops can be affected by a number of things such as:

  • the distance between them
  • the medium connecting them (fiber optic, coax cable, copper lines, wireless, etc.)
  • the technology used (cable DOCSIS, DSL, GPON, dedicated fiber, etc.)
  • the configuration on the routers such as traffic shaping
  • the network condition such as congestion
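To give a sense of scale for the first factor, distance alone sets a floor on latency. The sketch below assumes light travels through optical fiber at roughly 200,000 km/s (about two-thirds of its speed in vacuum) and ignores everything else in the list above, so real latencies will be higher. The path lengths are hypothetical.

```python
# Approximate signal speed in optical fiber: about two-thirds of the speed
# of light in vacuum (assumed refractive index of roughly 1.5).
SPEED_IN_FIBER_KM_PER_S = 200_000.0


def min_fiber_rtt_ms(path_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    one_way_s = path_km / SPEED_IN_FIBER_KM_PER_S
    return 2 * one_way_s * 1000.0


for km in (10, 500, 5000):  # hypothetical path lengths
    print(f"{km:>5} km of fiber: at least {min_fiber_rtt_ms(km):.2f} ms round trip")
```

This is why a latency that would be alarming between two routers in the same city can be perfectly normal between two continents.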

So to qualify the latency in a traceroute as good or bad, we need more information about the path. That information can come from our experience or knowledge of the path and routers, but the best source is historical traceroutes.

By comparing the latency over time, it’s much easier to know if the latency we are looking at is normal or not. Of course, a network performance monitoring solution such as Obkio has historical traceroute features that can help with that.
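As a rough illustration of comparing against history, the sketch below flags a latency value when it is well above the median of earlier measurements. The function name, the 1.5x tolerance and the historical values are all assumptions made for this example; a monitoring solution would use its own baselining logic.

```python
from statistics import median
from typing import Sequence


def is_latency_abnormal(current_ms: float, history_ms: Sequence[float],
                        tolerance: float = 1.5) -> bool:
    """Flag the current latency if it exceeds the historical median by the tolerance factor."""
    if not history_ms:
        raise ValueError("need at least one historical measurement")
    return current_ms > tolerance * median(history_ms)


# Hypothetical Avg latencies (ms) to the destination from earlier traceroutes.
history = [16.2, 16.6, 17.1, 16.4, 16.9]

print(is_latency_abnormal(16.6, history))  # in line with the history
print(is_latency_abnormal(55.6, history))  # far above the history
```

The median is used here instead of the average so that one unusually slow historical run does not shift the baseline.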

Analyzing a bad traceroute

Here is another example similar to Figure B. We have the same path from the source to the destination but the packet loss and the latency values are different.

+---+-------------------+-------+-----+------+------+------+------+
| # | Hostname          | Loss% | Snt | Last |  Avg | Best | Wrst |
+---+-------------------+-------+-----+------+------+------+------+
| 1 | 192.168.1.1       |   0.0 |  10 |  1.0 |  1.6 |  0.5 |  3.9 |
| 2 | router1.ispA.com  |  50.0 |  10 | 50.0 | 55.6 | 33.5 | 77.9 |
| 3 | router2.ispB.com  |  50.0 |  10 | 52.0 | 54.6 |  9.5 | 56.9 |
| 4 | router3.ispC.com  |  50.0 |  10 | 54.0 | 53.6 | 32.5 | 66.9 |
| 5 | router4.ispC.com  |  50.0 |  10 | 55.0 | 55.6 | 44.5 | 72.9 |
| 6 | router5.ispC.com  |  50.0 |  10 | 53.0 | 52.6 | 21.5 | 58.9 |
| 7 | router6.ispC.com  |  50.0 |  10 | 52.0 | 56.6 | 29.5 | 99.9 |
| 8 | website.com       |  50.0 |  10 | 56.0 | 55.6 | 43.5 | 87.9 |
+---+-------------------+-------+-----+------+------+------+------+
Figure C

Let’s start with packet loss and the rule of thumb: does the packet loss continue after it starts? Oh yes! In this case, packet loss jumps from 0% to 50% between hop #1 and hop #2 and continues all the way to the last hop. So there is a good chance that packets are really being lost between hop #1 and hop #2.
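The rule of thumb can be sketched in a few lines of code. The function below is our own illustration, not part of any traceroute tool: it takes the Loss% column and returns the hop where loss starts and then continues through to the last hop, or `None` if the loss does not continue. The two lists are the Loss% columns of Figures B and C.

```python
from typing import Optional, Sequence


def first_persistent_loss_hop(loss_pct: Sequence[float],
                              threshold: float = 0.0) -> Optional[int]:
    """Return the 1-based hop where loss starts and continues to the last hop, else None."""
    start = None
    for hop, loss in enumerate(loss_pct, start=1):
        if loss > threshold:
            if start is None:
                start = hop
        else:
            start = None  # loss did not continue, so the earlier loss is not a path issue
    return start


figure_b = [0.0, 50.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
figure_c = [0.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]

print(first_persistent_loss_hop(figure_b))  # None: loss at hop 2 does not continue
print(first_persistent_loss_hop(figure_c))  # 2: loss starts at hop 2 and persists
```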

Be careful: internet traffic is asymmetrical, so the issue can be on the reverse path! This topic is covered in “Internet Traffic is Asymmetrical - How to Catch Reverse Path Issues?”

So if there is packet loss with routers at ISP A, ISP B and ISP C, maybe we should call all of them and tell them they have 50% packet loss on their routers… or maybe post that on social media... or maybe not… We should focus on where the packet loss starts, which is between hop #1 and hop #2.

Let’s take a look at the other network performance metric in this traceroute, the latency. Comparing Figures B and C, it’s clear that the latency values have increased, and the increase starts between hop #1 and #2, just like the packet loss. An increase in both packet loss and latency at the same point suggests network congestion.
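The same comparison can be done hop by hop. The sketch below takes the Avg columns of Figures B and C and returns the first hop where latency increased by at least a given amount. The function name and the 20 ms threshold are arbitrary choices for this example.

```python
from typing import Optional, Sequence


def first_latency_jump(baseline_avg: Sequence[float], current_avg: Sequence[float],
                       min_increase_ms: float = 20.0) -> Optional[int]:
    """Return the 1-based hop where current Avg first exceeds baseline by min_increase_ms."""
    for hop, (before, now) in enumerate(zip(baseline_avg, current_avg), start=1):
        if now - before >= min_increase_ms:
            return hop
    return None


figure_b_avg = [1.6, 5.6, 10.6, 12.6, 13.6, 14.6, 15.6, 16.6]
figure_c_avg = [1.6, 55.6, 54.6, 53.6, 55.6, 52.6, 56.6, 55.6]

print(first_latency_jump(figure_b_avg, figure_c_avg))  # 2
```

Hop #1 is unchanged at 1.6 ms while hop #2 goes from 5.6 ms to 55.6 ms, so the increase is introduced between those two hops.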

Since hop #1 (192.168.1.1) is the business’s firewall and hop #2 (router1.ispA.com) is the ISP A router, the congestion is probably on the business’s Internet connection. By looking at the bandwidth usage on the firewall, the IT administrator can easily check whether there is congestion. A solution such as Obkio’s Network Device Monitoring is able to get that information.

If there is no congestion, a trouble ticket can be opened with ISP A to troubleshoot the network issue, and the traceroute should be shared with them to speed up the troubleshooting.

Next Traceroute Articles

We hope you enjoyed this article in the traceroute series.
