Traceroute latency

From NANOG


In telecommunications, latency is defined as the amount of time it takes for a message to be transmitted from source to destination.

Three primary factors account for observed latency:

  • Serialization delay
  • Queueing delay
  • Propagation delay



Serialization Delay

Serialization delay is the amount of time it takes for a networking device (such as a router) to encode a packet onto the wire for transmission. The delay is the size of the packet divided by the speed of the circuit onto which the packet is being transmitted. For example, a 64 byte (512 bit) packet being transmitted onto a 56Kbps circuit has a serialization delay of 512 bits / 56000 bits/sec = 0.0091 seconds, or 9.1 milliseconds. On low speed links this can become an important factor, since a packet must be transmitted as one whole "unit" (i.e. a router cannot forward the packet to the next router until it has received a complete copy). If a traceroute packet encounters a serialization delay of 9.1ms, the measured latency can never be less than 9.1ms, no matter what other circumstances are encountered.

As link speeds increase, serialization delay becomes insignificant to the overall latency calculation. For example, a 1500 byte (12000 bit) packet transmitted over a T1 has a serialization delay of 12000 bits / 1536000 bits/sec = 7.8ms. The same packet on a Fast Ethernet circuit has a delay of 0.12ms, and on a Gigabit Ethernet circuit the delay is only 0.012ms.
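The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch; the function name and the table of link speeds are illustrative choices, with the speeds taken from the examples in the text (1.536 Mbps is the T1 rate used in the calculation above).

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time needed to clock one packet onto the wire, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000.0


# Link speeds in bits per second, as used in the examples in the text.
LINKS_BPS = {
    "56 Kbps": 56_000,
    "T1": 1_536_000,
    "Fast Ethernet": 100_000_000,
    "Gigabit Ethernet": 1_000_000_000,
}

if __name__ == "__main__":
    for name, bps in LINKS_BPS.items():
        small = serialization_delay_ms(64, bps)
        large = serialization_delay_ms(1500, bps)
        print(f"{name:>16}: 64 B = {small:.4f} ms, 1500 B = {large:.4f} ms")
```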


Queueing Delay

Many users talk about the "utilization" of a circuit, for example a 60% utilization meaning that a circuit is still 40% empty, but this is only true when discussing the average utilization over some unit of time (such as "per second"). At any given instant, an interface is either transmitting or not transmitting: 100% utilized or 0% utilized. Consider the case of a packet that has just arrived at a router and now needs to be transmitted out a destination interface. If the interface is not currently in use, the transmission begins immediately, and no queueing occurs (other than that necessary to make it through the router's hardware, typically measured in nanoseconds on modern routers). However, as discussed above, a packet must be transmitted as one contiguous unit, so once the transmission begins the line is effectively locked into transmitting that specific packet for the length of the serialization delay. If another packet arrives to be transmitted out the same interface while the interface is still transmitting the first packet, the new packet must be queued by the router until the interface is free.

Statistically, most packets will not encounter any queueing delay until a circuit is more than 50% utilized, and those that do will typically be delayed for only a very small amount of time. Queueing latency on a high-speed link does not contribute a significant amount to the overall path latency until a circuit reaches a state of "congestion", at around 95-97% overall utilization. At this point nearly every packet must be queued for a significant length of time while the router waits for a free time-slot on the interface to transmit the packet. If the queue becomes completely full, the router begins discarding packets. The method it uses to pick which packets to discard is called "queue management" or its "discard strategy". These methods range from simple ones such as "tail-drop" (dropping each newly arriving packet that does not fit into the queue) to more complex ones which attempt to achieve fair discards, such as RED (random early detection).
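The shape of this curve can be illustrated with a textbook queueing model. The article does not name a model, so the sketch below assumes an M/M/1 queue (Poisson arrivals, exponentially distributed service times, one output interface). Real traffic is burstier than this, so the numbers are illustrative only. In that model the probability that an arriving packet has to wait equals the utilization, and the mean wait is utilization / (1 - utilization) packet service times. The function name is hypothetical.

```python
from typing import Tuple


def mm1_queueing(utilization: float, service_time_ms: float) -> Tuple[float, float]:
    """Return (probability a packet must wait, mean wait in ms) for an M/M/1 queue.

    Assumption: M/M/1 is a simplification chosen for illustration; it is not
    a model taken from the article.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    p_wait = utilization
    mean_wait_ms = utilization / (1.0 - utilization) * service_time_ms
    return p_wait, mean_wait_ms


if __name__ == "__main__":
    # 1500 byte packet on Gigabit Ethernet: 0.012 ms serialization delay.
    service_ms = 0.012
    for util in (0.10, 0.50, 0.90, 0.95, 0.97):
        p_wait, wait_ms = mm1_queueing(util, service_ms)
        print(f"{util:.0%} utilized: P(wait) = {p_wait:.2f}, mean wait = {wait_ms:.3f} ms")
```

Under these assumptions, half of all packets wait at 50% utilization and the mean wait is a single packet time, while at 95% utilization the mean wait grows to 19 packet times.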

Most carrier-class routers are designed to have a buffering capacity of at least 250 milliseconds per port, but this buffering capacity is typically shared across multiple ports on the router. If not all ports are experiencing congestion, or if the port experiencing congestion is of a slower speed than the maximum the router is designed to handle, a router may have the buffering capacity for 4000 milliseconds or more. When traceroute encounters congestion on such a router, the measured latency for almost all packets will rise dramatically, making it easy to determine that the interface is congested. Other devices, such as Layer 3 switches intended for LAN use, have very small buffers by comparison. A typical buffering capacity for a 4-port 10GE card on a Layer 3 switch may be only 16MB shared across all 4 ports, giving only about 3.3 milliseconds of buffering per port. When traceroute encounters a congested port on this type of device, it may see a small amount of packet loss, but not a large queueing-induced latency spike. This type of congestion can be very hard to distinguish from packet loss caused by rate-limiting on the router.
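Buffer sizes in bytes and buffer depths in milliseconds are related by the rate at which the port drains. The sketch below works through the Layer 3 switch example, assuming that "16MB" means 16 * 2^20 bytes and that the buffer is split evenly across the four ports when all of them are congested. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def buffer_depth_ms(buffer_bytes: float, drain_rate_bps: float) -> float:
    """How long a full buffer takes to drain at the given port speed, in ms."""
    return buffer_bytes * 8 / drain_rate_bps * 1000.0


def buffer_bytes_for(depth_ms: float, drain_rate_bps: float) -> float:
    """Bytes of buffer needed to hold depth_ms of traffic at the given port speed."""
    return depth_ms / 1000.0 * drain_rate_bps / 8


if __name__ == "__main__":
    TEN_GE_BPS = 10_000_000_000
    shared_buffer = 16 * 2**20  # assumption: 16MB read as 16 MiB

    # All 4 ports congested: each port gets a quarter of the shared buffer.
    print(f"per port, 4 congested: {buffer_depth_ms(shared_buffer / 4, TEN_GE_BPS):.2f} ms")
    # Only 1 port congested: it can use the whole shared buffer.
    print(f"per port, 1 congested: {buffer_depth_ms(shared_buffer, TEN_GE_BPS):.2f} ms")
    # Memory needed for 250 ms of buffering on one 10GE port.
    print(f"250 ms at 10GE: {buffer_bytes_for(250, TEN_GE_BPS) / 1e6:.1f} MB")
```

The same arithmetic shows why a shared buffer sized for 250 milliseconds on every port can hold several seconds of traffic when only one slow port is congested.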

Propagation Delay

The speed at which data can be transmitted over long distances is governed by a fundamental limit of physics, the speed of light. The speed of light (physical constant c) is the maximum speed at which any form of electromagnetic radiation can propagate through a vacuum, and is defined as 299,792,458 meters per second (or approximately 186,282 miles per second). For the purposes of this article, we approximate this by rounding up slightly to 300,000 km/s.

As light travels through a medium that is not a perfect vacuum, it propagates more slowly. A medium's refractive index is the ratio of the speed of light in a vacuum to the speed of light in that medium. For example, water has a refractive index of 1.33, which means that light travels through water at a speed of 1 / 1.33 or approximately 0.75c (that is, 75% of the speed of light in a vacuum, around 225,000 km/sec).

Telecommunications fiber relies on a principle called "total internal reflection" in order to transmit light over long distances without deterioration of the signal. Total internal reflection occurs when light attempts to pass from one medium to another, where the refractive index of the second medium is lower than that of the first, and the light strikes the boundary at a sufficiently shallow angle (that is, its angle of incidence, measured from the perpendicular to the boundary, exceeds a specific critical angle for the two media). Instead of being refracted into the second medium, the light is entirely reflected back into the first medium. In order to achieve this effect, telecommunications fiber consists of two different materials: a "core" fiber with a refractive index of 1.48, and a "cladding" around the core with a refractive index of 1.46. By knowing the refractive index of the core fiber, we are able to calculate the speed at which light propagates through fiber, 1 / 1.48 ≈ 0.68c, or approximately 200,000 km/s (125,000 miles/sec).
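The two calculations in this section, the speed of light in a medium and the critical angle for total internal reflection, can be sketched as follows. The critical angle formula arcsin(n_cladding / n_core) is the standard result from Snell's law and is measured from the perpendicular to the boundary. The function names are illustrative, and the refractive indices are the ones quoted in the text.

```python
import math

C_KM_PER_S = 299_792.458  # speed of light in a vacuum, km/s


def speed_in_medium_km_s(refractive_index: float) -> float:
    """Propagation speed of light in a medium with the given refractive index."""
    return C_KM_PER_S / refractive_index


def critical_angle_deg(n_core: float, n_cladding: float) -> float:
    """Critical angle (from the perpendicular) for total internal reflection.

    Only defined when the cladding has a lower refractive index than the core.
    """
    if n_cladding >= n_core:
        raise ValueError("total internal reflection requires n_cladding < n_core")
    return math.degrees(math.asin(n_cladding / n_core))


if __name__ == "__main__":
    print(f"water (n=1.33): {speed_in_medium_km_s(1.33):,.0f} km/s")
    print(f"fiber core (n=1.48): {speed_in_medium_km_s(1.48):,.0f} km/s")
    print(f"critical angle, core 1.48 / cladding 1.46: {critical_angle_deg(1.48, 1.46):.1f} degrees")
```

With these indices the critical angle is a little over 80 degrees from the perpendicular, so only light travelling within roughly 10 degrees of the fiber axis stays confined to the core.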

We can translate this into a distance per millisecond (the typical unit of time measurement for latency in telecommunications networks) by dividing by 1000 ms per second, giving a figure of approximately 200 km (or 125 miles) per millisecond in each direction. However, most Internet measurement tools such as 'ping' and 'traceroute' report round-trip time (RTT), i.e. the time between the transmission of a probe and the receipt of a reply packet from the targeted endpoint. To adjust for round-trip measurements, we must double the one-way time, giving a speed-of-light-induced round-trip propagation delay of 1 millisecond per 100 km (or 62.5 miles) of fiber.
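As a quick check of the rule of thumb above, the following sketch computes the minimum round-trip propagation delay for a given length of fiber, using the rounded figure of 200 km per millisecond. The function name and the sample path lengths are hypothetical; real fiber routes are longer than the straight-line distance between endpoints.

```python
FIBER_KM_PER_MS = 200.0  # approximate one-way propagation speed in fiber


def min_rtt_ms(fiber_path_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone, in ms."""
    one_way_ms = fiber_path_km / FIBER_KM_PER_MS
    return 2 * one_way_ms


if __name__ == "__main__":
    # Hypothetical fiber path lengths, chosen only to illustrate the scaling.
    for km in (100, 1_000, 5_000, 10_000):
        print(f"{km:>6} km of fiber: at least {min_rtt_ms(km):.0f} ms round trip")
```

Serialization and queueing delay add to this figure, so a measured round-trip time can approach but never fall below it.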


Other Factors

Telecommunications networks almost never run fiber in perfectly straight paths. Many real-world considerations affect the available paths, such as the ability to obtain rights-of-way for the fiber conduit, the need to place regeneration stations along the path, and the desire to reach the largest number of customers possible for any particular fiber route. Technical considerations also result in non-linear fiber configurations. For example, most networks implement redundancy through the use of ring topologies: if any portion of the ring is cut, connectivity is still available to users via the other side of the ring. As a result, the fiber path between two points is usually longer than the geographic distance between them, and the propagation delay is correspondingly higher.


Real World Measurements

Real-world measurements suggest that the following round-trip times are normal, and are probably not in and of themselves indicative of any congestion or performance problem:

Route                                          Round-trip time
Coast-to-coast USA (Virginia to California)    60-75ms
Trans-Atlantic (New York to London, England)   66-80ms
Trans-Pacific (California to Tokyo, Japan)     130-150ms
Trans-Pacific (California to Sydney, Australia) 170-190ms
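One way to read these figures is to turn each round-trip time back into the longest fiber path it could represent, using the 1 millisecond per 100 km rule derived earlier. This is a rough sketch: it treats the whole round-trip time as propagation delay, so the result is an upper bound on the fiber path length, not a measurement of it. The function name is hypothetical, and the round-trip times are the ones listed above.

```python
RTT_MS_PER_100_KM = 1.0  # round-trip propagation delay per 100 km of fiber


def max_fiber_path_km(rtt_ms: float) -> float:
    """Longest one-way fiber path consistent with a measured round-trip time."""
    return rtt_ms / RTT_MS_PER_100_KM * 100.0


# (route, low RTT in ms, high RTT in ms), taken from the list above.
TYPICAL_RTTS = [
    ("Coast-to-coast USA", 60, 75),
    ("Trans-Atlantic", 66, 80),
    ("Trans-Pacific (Tokyo)", 130, 150),
    ("Trans-Pacific (Sydney)", 170, 190),
]

if __name__ == "__main__":
    for route, low, high in TYPICAL_RTTS:
        print(f"{route:<24} at most {max_fiber_path_km(low):,.0f}-{max_fiber_path_km(high):,.0f} km of fiber")
```

Comparing these upper bounds with the geographic distance between the endpoints gives a feel for how much of a route's latency comes from indirect fiber paths and the other factors described above.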
