Path MTU Discovery
When a host needs to transmit data out an interface, it references the interface's Maximum Transmission Unit (MTU) to determine how much data it can put into each packet. Ethernet interfaces, for example, have a default MTU of 1500 bytes, not including the Ethernet header or trailer. This means a host sending a TCP data stream would typically use the first 20 of these 1500 bytes for the IP header, the next 20 for the TCP header, and as much of the remaining 1460 bytes as necessary for the data payload. Filling packets to the maximum size like this keeps the share of bandwidth consumed by protocol overhead as small as possible.
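The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the 20-byte header sizes assume IPv4 and TCP headers without options.

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)


def tcp_payload_per_packet(mtu):
    """Largest TCP payload that fits in a single packet of size `mtu`."""
    return mtu - IP_HEADER - TCP_HEADER


def header_overhead(mtu):
    """Fraction of a full-size packet spent on IP and TCP headers."""
    return (IP_HEADER + TCP_HEADER) / mtu


print(tcp_payload_per_packet(1500))            # 1460
print(round(header_overhead(1500) * 100, 2))   # 2.67 (percent)
print(round(header_overhead(576) * 100, 2))    # 6.94 (percent)
```

The smaller the packet, the larger the share of each packet taken up by headers, which is why hosts prefer to send the largest packets the path allows.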
Unfortunately, not all links that make up the Internet have the same MTU. The MTU offered by a link may vary depending on the physical media type or configured encapsulation (such as GRE tunneling or IPsec encryption). When a router decides to forward an IPv4 packet out an interface, but determines that the packet size exceeds the interface's MTU, the router must fragment the packet and transmit it as two (or more) individual pieces, each within the link MTU. Fragmentation is expensive in both router resources and bandwidth utilization, since a new header must be generated and attached to each fragment. (In fact, the IPv6 specification removes transit packet fragmentation from router operation entirely, but this discussion will be left for another time.)
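To make the cost of fragmentation concrete, here is a rough Python sketch of how a router might split an IPv4 packet. The `fragment` function is hypothetical and assumes a 20-byte header with no options; it relies on the IPv4 rule that fragment offsets are counted in 8-byte units, so every fragment except the last must carry a payload that is a multiple of 8 bytes.

```python
IP_HEADER = 20  # bytes, IPv4 header without options (assumption)


def fragment(total_length, link_mtu):
    """Split one IPv4 packet into fragments that fit within `link_mtu`.

    Returns a list of (offset_in_8_byte_units, fragment_length, more_fragments)
    tuples, where fragment_length includes the IP header.
    """
    if total_length <= link_mtu:
        return [(0, total_length, False)]  # fits as-is, no fragmentation

    payload = total_length - IP_HEADER
    # Largest per-fragment payload, rounded down to a multiple of 8 bytes.
    max_chunk = (link_mtu - IP_HEADER) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("link MTU too small to carry any payload")

    fragments = []
    sent = 0
    while sent < payload:
        chunk = min(max_chunk, payload - sent)
        more = sent + chunk < payload
        fragments.append((sent // 8, chunk + IP_HEADER, more))
        sent += chunk
    return fragments


pieces = fragment(1500, 1400)
print(pieces)                          # [(0, 1396, True), (172, 124, False)]
print(sum(size for _, size, _ in pieces))  # 1520 bytes on the wire
```

A 1500-byte packet crossing a 1400-byte link becomes two packets totaling 1520 bytes: the extra 20 bytes are the second IP header.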
To use a path as efficiently as possible, hosts must find the path MTU: the smallest MTU of any link in the path to the distant end. For example, if two hosts communicate across three routed links with independent MTUs of 1500, 800, and 1200 bytes, each end host must assume the smallest of these, 800 bytes, to avoid fragmentation.
Of course, a host has no way of knowing in advance the MTU of each link through which a packet might travel. RFC 1191 defines path MTU discovery, a simple process through which a host can detect a path MTU smaller than its interface MTU. Two components are key to this process: the Don't Fragment (DF) bit of the IP header, and the Fragmentation Needed code (code 4) of the ICMP Destination Unreachable message (type 3).
Setting the DF bit in an IP packet prevents a router from performing fragmentation when it encounters an MTU less than the packet size. Instead, the packet is discarded and an ICMP Fragmentation Needed message is sent to the originating host. Essentially, the router is indicating that it needs to fragment the packet but the DF flag won't allow for it. Conveniently, RFC 1191 expands the Fragmentation Needed message to include the MTU of the link necessitating fragmentation.
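The exchange can be modeled with a small simulation. This is a sketch rather than real networking code: `forward` and `discover_path_mtu` are hypothetical names, the path is just a list of link MTUs, and every router is assumed to report its next-hop MTU as RFC 1191 describes.

```python
def forward(packet_size, link_mtus):
    """Simulate sending one DF-marked packet along the path.

    Returns None if the packet is delivered, or the MTU of the first link
    that is too small (the value a router would place in its ICMP
    Fragmentation Needed message after discarding the packet).
    """
    for mtu in link_mtus:
        if packet_size > mtu:
            return mtu
    return None


def discover_path_mtu(interface_mtu, link_mtus):
    """Shrink the packet size until a packet gets through.

    Returns (path_mtu, sizes_tried).
    """
    estimate = interface_mtu
    tried = []
    while True:
        tried.append(estimate)
        reported = forward(estimate, link_mtus)
        if reported is None:
            return estimate, tried
        estimate = reported  # adopt the MTU reported by the router


print(discover_path_mtu(1500, [1500, 800, 1200]))   # (800, [1500, 800])
print(discover_path_mtu(1500, [1500, 1400, 1200]))  # (1200, [1500, 1400, 1200])
```

The second call shows that when MTUs decrease along the path, the host may need more than one round before it settles on the true path MTU.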
Once the smaller MTU has been learned, the host can cache this value and packetize future data for the destination to the appropriate size. If a link farther along the path has a still smaller MTU, the process simply repeats. Note that path MTU discovery is an ongoing process; the host continues to set the DF flag so that it can detect further decreases in MTU should dynamic routing select a new path to the destination. RFC 1191 also allows for periodic testing for an increased path MTU, by occasionally attempting to pass a packet larger than the learned MTU. If the packet succeeds, the path MTU will be raised to this higher value.
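One simple way to picture the host-side bookkeeping is a per-destination cache whose entries age out, so that a larger packet is eventually tried again. The sketch below is illustrative only: the `PathMtuCache` class, its method names, and the 600-second default are assumptions for this example, not a description of how any particular operating system implements it.

```python
import time


class PathMtuCache:
    """Per-destination path MTU estimates that expire after a while."""

    def __init__(self, interface_mtu, probe_interval=600.0, clock=time.monotonic):
        self.interface_mtu = interface_mtu
        self.probe_interval = probe_interval  # seconds (illustrative default)
        self._clock = clock                   # injectable for testing
        self._entries = {}                    # destination -> (mtu, learned_at)

    def lookup(self, destination):
        """Return the packet size to use for `destination`."""
        entry = self._entries.get(destination)
        if entry is None:
            return self.interface_mtu
        mtu, learned_at = entry
        if self._clock() - learned_at >= self.probe_interval:
            # Entry is stale: forget it so the next packet probes for a
            # larger path MTU, starting again from the interface MTU.
            del self._entries[destination]
            return self.interface_mtu
        return mtu

    def record_frag_needed(self, destination, reported_mtu):
        """Handle an ICMP Fragmentation Needed message for `destination`."""
        current = self.lookup(destination)
        # Only ever lower the estimate in response to an ICMP message.
        self._entries[destination] = (min(current, reported_mtu), self._clock())


cache = PathMtuCache(interface_mtu=1500)
cache.record_frag_needed("192.168.1.2", 1400)
print(cache.lookup("192.168.1.2"))  # 1400
print(cache.lookup("192.168.0.1"))  # 1500 (nothing learned for this host)
```

If the larger probe packet is dropped again, the router's Fragmentation Needed message simply re-populates the cache with the smaller value.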
You can test path MTU discovery across a live network with a tool like tracepath (part of the Linux iputils package) or mturoute (Windows only). Here's a sample of tracepath output from the lab pictured above, with the MTU of F0/1 reduced to 1400 bytes using the ip mtu command:
Host$ tracepath -n 192.168.1.2
 1:  192.168.0.2       0.097ms pmtu 1500
 1:  192.168.0.1       0.535ms
 1:  192.168.0.1       0.355ms
 2:  192.168.0.1       0.430ms pmtu 1400
 2:  192.168.1.2       0.763ms reached
     Resume: pmtu 1400 hops 2 back 254