1. Weird 'connection reset' Problems with Google Sites
I recently bought a new Windows laptop for my Mum, and she immediately complained about not being able to reach either Google or GMail with any browser. Interestingly, all other sites (including competing search engines) were working fine and I could ping both google.com and gmail.com. However, trying to open them or any other Google-related site in Firefox, Opera, Chrome or Internet Explorer resulted in 'connection reset' errors, and deleting cookies did not help.
After having 'binged' for a while, I found out that we were not the first people to experience this problem. Those findings misled me into thinking that the problem was specific to Mum's laptop and possibly caused by malware redirecting browsers to some weird address or such. This misguided conjecture was reinforced by the fact that other computers on our home LAN (particularly Mum's Linux desktop) were not affected.
I tested her laptop for malware and reset the Windows sockets and IP registry settings (netsh winsock reset and netsh int ip reset), but this didn't help. Puzzled, I decided to dive deeper into this.
2. The Cause
While tweaking the wi-fi router settings, I noticed that its MTU was set to the default value (1500, the Ethernet MTU). MTU, short for Maximum Transmission Unit, is the maximum size of a single IP datagram that can be transmitted over a given link. This normally does not break TCP traffic: with IPv4, a datagram larger than the MTU of the next link is fragmented (split into parts) by the router that forwards it.
However, this is not the case when Path MTU Discovery is used. With PMTUD, all datagrams have the Don't Fragment flag set. A router that cannot forward such a datagram without fragmenting it drops the datagram instead and sends back an ICMP Fragmentation Needed notification carrying the (smaller) MTU of the next hop, so that the sender can resend in packets of that size.
My problems were caused by PMTUD failure, possibly due to a paranoid Windows firewall filtering out ICMP traffic. The router's WAN uplink (a FreeBSD server of mine) has a PPPoE connection with an MTU smaller than 1500. After examining the traffic coming from the router with tcpdump, I discovered that it sent 1500-byte datagrams with the Don't Fragment flag set, which were dropped. For some reason, the Windows box either never saw an ICMP Fragmentation Needed reply with the appropriate MTU or ignored it.
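For anyone who wants to repeat that kind of capture, here is a minimal sketch of a pcap filter that shows both halves of the conversation: large datagrams with Don't Fragment set, and ICMP Fragmentation Needed replies (destination unreachable, code 4). The helper name pmtud_filter, the 1400-byte threshold and the interface name em0 are my assumptions for illustration; this is not the exact command I ran.

```shell
#!/bin/sh
# Hypothetical helper: print a pcap filter for PMTUD debugging.
# $1 is the minimum IP total length of DF packets to show (default 1400).
pmtud_filter()
{
    min_len=${1:-1400}
    # ip[6] & 0x40  -> Don't Fragment bit in the IPv4 flags byte
    # ip[2:2]       -> IPv4 total length field
    # icmp code 4   -> "fragmentation needed and DF set"
    printf '(ip[6] & 0x40 != 0 and ip[2:2] > %d) or (icmp[icmptype] = icmp-unreach and icmp[icmpcode] = 4)\n' "$min_len"
}

# Usage (needs root; em0 is an assumed interface name):
#   tcpdump -n -i em0 "$(pmtud_filter 1400)"
```

If large DF packets show up but no matching ICMP replies ever come back, the notification is probably being lost or filtered somewhere along the way.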
3. The Solution
The solution for this particular problem was simple: I set the router's MTU to match the PPPoE link and things started to work. However, I wanted a tool that would tell me the largest MTU that is safe to use on a particular path. In other words, I wanted to be able to discover the path MTU manually.
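As a rough illustration of the arithmetic behind such a setting, the sketch below assumes a plain PPPoE link over Ethernet, where the PPPoE and PPP headers take 8 bytes out of each 1500-byte frame. The value for my own link is not stated above, so treat these numbers as the typical case rather than as my exact configuration.

```shell
#!/bin/sh
# Assumed values for a typical PPPoE-over-Ethernet link.
ETHERNET_MTU=1500
PPPOE_OVERHEAD=8      # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER=20        # IPv4 header without options
TCP_HEADER=20         # TCP header without options

PPPOE_MTU=$(( ETHERNET_MTU - PPPOE_OVERHEAD ))
TCP_MSS=$(( PPPOE_MTU - IPV4_HEADER - TCP_HEADER ))

echo "PPPoE link MTU: $PPPOE_MTU bytes"
echo "Largest TCP payload per segment (MSS): $TCP_MSS bytes"
```

Under these assumptions the link MTU comes out as 1492 bytes and the MSS as 1452 bytes.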
I tried to Google for such a tool, but didn't find anything that would do just this. Since basic ping can send packets of a specified size with the Don't Fragment flag set, there's no practical need for a specialized tool, and most advice goes along these lines: try pinging with a 1500-byte payload and decrement by 10 bytes until you get a reply. While this works in most cases, it assumes a priori knowledge of how large the MTU is likely to be, and it is tedious. (By the way, BSD systems, FreeBSD in particular, support 'size sweeping' pings, which make this approach less tedious.)
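To make the manual approach concrete, here is a small sketch that prints the ping command for a given MTU on a few platforms. One detail the usual advice glosses over: the size given to ping is the ICMP payload, so the IPv4 and ICMP headers (28 bytes together) have to be subtracted from the MTU being tested. The helper name df_ping_cmd and the platform labels are made up for this example; the switches themselves are the ones I know from Linux (iputils), BSD and Windows ping.

```shell
#!/bin/sh
PING_OVERHEAD=28    # 20-byte IPv4 header + 8-byte ICMP header

# Hypothetical helper: print a Don't Fragment ping command.
# $1 = platform (linux | bsd | windows), $2 = MTU to test, $3 = host
df_ping_cmd()
{
    size=$(( $2 - PING_OVERHEAD ))
    case $1 in
        linux)   echo "ping -c 3 -M do -s $size $3" ;;
        bsd)     echo "ping -c 3 -D -s $size $3" ;;
        windows) echo "ping -n 3 -f -l $size $3" ;;
        *)       echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# A FreeBSD-style sweep over payload sizes would look roughly like
#   ping -D -g 1400 -G 1472 -h 8 <host>
# (-g minimum size, -G maximum size, -h increment).
```

For example, df_ping_cmd linux 1500 example.org prints a command with -s 1472, which is the largest payload that fits into a 1500-byte datagram.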
4. The Automated Solution
Below is mtu-discovery.sh, a shell script I wrote to automate discovery of the largest possible MTU in the general case. It starts with a fairly small size and keeps doubling it until packets of that size are dropped. Then it bisects the range between the size that doesn't work and the last known good size to find the exact value.
This script should find the largest MTU faster (i.e., with fewer pings) than any size-sweeping solution, and it doesn't assume any upper bound for the MTU. I tried to make it portable: the script runs on Linux and FreeBSD, and probably on Mac OS X, but I haven't tested it on the latter.
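#!/bin/sh
# mtu-discovery.sh: find the largest MTU usable on the path to <addr>.

# Defaults (Linux ping); the platform check below may override them.
ABS_PATH_PING=/bin/ping
DONT_FRAGMENT_SWITCH="-M do"
NUM_PINGS=3
PING_OVERHEAD=28          # 20-byte IPv4 header + 8-byte ICMP header
SIZE_KNOWN_TO_WORK=100
SIZE_NOT_WORKING=100

# Ping ADDR with a SIZE-byte payload and the Don't Fragment flag set.
# The exit status is that of ping.
test_ping()
{
    local ADDR=$1
    local SIZE=$2
    # $DONT_FRAGMENT_SWITCH is deliberately unquoted so that "-M do"
    # splits into two arguments.
    $ABS_PATH_PING -c $NUM_PINGS -s $SIZE $DONT_FRAGMENT_SWITCH "$ADDR" >/dev/null 2>&1
}

# Keep doubling the payload size until a ping fails. On return,
# SIZE_KNOWN_TO_WORK and SIZE_NOT_WORKING bracket the largest working
# size. The initial 100-byte size is assumed to work and is not tested.
determine_range()
{
    local ADDR=$1
    SIZE_KNOWN_TO_WORK=$SIZE_NOT_WORKING
    SIZE_NOT_WORKING=$(( 2 * $SIZE_NOT_WORKING ))
    if test_ping "$ADDR" $SIZE_NOT_WORKING; then
        determine_range "$ADDR"
    fi
}

# Print the largest working payload size between MIN (known to work)
# and MAX (known to fail).
bisect()
{
    local ADDR=$1
    local MIN=$2
    local MAX=$3
    local MID=$(( ($MIN + $MAX) / 2 ))
    local RESULT=0
    if [ $MIN -eq $MID ] || [ $MAX -eq $MID ]; then
        echo $MIN
        return
    fi
    if test_ping "$ADDR" $MID; then
        RESULT=$(bisect "$ADDR" $MID $MAX)
    else
        RESULT=$(bisect "$ADDR" $MIN $MID)
    fi
    echo $RESULT
}

if [ $# -lt 1 ]; then
    echo "Usage: $(basename "$0") <addr>"
    exit 1
fi

# Pick the ping binary and its Don't Fragment switch.
if [ -e /bin/ping ]; then
    ABS_PATH_PING=/bin/ping
    DONT_FRAGMENT_SWITCH="-M do"
    echo "Using Linux ping."
elif [ -e /sbin/ping ]; then
    ABS_PATH_PING=/sbin/ping
    DONT_FRAGMENT_SWITCH=-D
    echo "Using BSD ping."
else
    echo "ping does not exist in either /bin or /sbin,"
    echo "assuming Linux and using whatever you have in PATH."
    ABS_PATH_PING=$(command -v ping)
fi

# Make sure the host answers at all before probing sizes.
if ! test_ping "$1" 0; then
    echo "Site is unreachable."
    exit 2
fi

determine_range "$1"
echo "Bisecting $(($SIZE_KNOWN_TO_WORK + $PING_OVERHEAD)) - $(($SIZE_NOT_WORKING + $PING_OVERHEAD)) bytes range."
MTU=$(bisect "$1" $SIZE_KNOWN_TO_WORK $SIZE_NOT_WORKING)
echo "Largest working MTU is $(($MTU + $PING_OVERHEAD)) bytes."
exit 0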
An obvious (and potentially troublesome) assumption here is that the path MTU doesn't change while the script is running. Since we cannot control the path, this is impossible to guarantee, which limits the usefulness of the script.
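The doubling-then-bisecting search can also be tried without touching the network. The sketch below replaces ping with a simulated path whose MTU is passed in as an argument. The names sim_ping and find_mtu are hypothetical and not part of mtu-discovery.sh, and the loops are an iterative rewrite of the recursive idea rather than the script's own code. Like the script, it assumes that a 100-byte payload always gets through.

```shell
#!/bin/sh
PING_OVERHEAD=28    # 20-byte IPv4 header + 8-byte ICMP header

# Stand-in for a Don't Fragment ping: succeeds if a payload of $1 bytes
# plus headers fits into the simulated path MTU $2.
sim_ping()
{
    [ $(( $1 + PING_OVERHEAD )) -le "$2" ]
}

# Print the MTU found for a simulated path MTU of $1 bytes.
find_mtu()
{
    good=100    # assumed to work, never tested
    bad=200
    # Phase 1: double the payload size until it stops working.
    while sim_ping "$bad" "$1"; do
        good=$bad
        bad=$(( bad * 2 ))
    done
    # Phase 2: bisect between the last good and the first bad size.
    while [ $(( bad - good )) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if sim_ping "$mid" "$1"; then
            good=$mid
        else
            bad=$mid
        fi
    done
    echo $(( good + PING_OVERHEAD ))
}
```

Calling find_mtu 1492 prints 1492: the doubling phase stops at a 1600-byte payload and the bisection then narrows the 800 to 1600 range down to 1464 bytes of payload.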