NAT traversal comes into play when you are connected to the Internet behind one or more routers. Your Internet provider gives you one (and only one) public IP address, which identifies your Internet connection; servers around the world reach you at that address. But behind your router there are many devices, including computers, phones and tablets, all connected to the Internet through the same router. So how does a single address serve so many devices? The *magic* is done by your router, which acts as a gateway for all Internet connections.
Believe it or not, each time you make a request to the Internet (requesting a web page, updating the system time from a global time server, or downloading patches or updates for your software) you are establishing a connection that has a port of origin and a port of destination, an address of origin and an address of destination, and a protocol, which most of the time is TCP but might be UDP. For example, when you use Google to search for some topic or load your favorite news site, you are using port 80 (HTTP) or 443 (HTTPS). If you are connecting to Google over HTTP, your request goes to server: www.google.com, port: 80, protocol: TCP.
But that's just half of the story; the other half is who is making that request and from which port. The *who* is the IP address your Internet provider has assigned to you, plus the port your device has picked for that request. Usually, even as a developer, you don't need to specify a port of origin for a TCP request; your operating system takes care of assigning one that just works. So a connection is established using two layers of information: the identification of origin and destination (the addresses belong to the IP protocol) and the language in which both ends will interact (in this case, the TCP protocol):
IP & Port <----- TCP -----> IP & Port
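To see those four values for yourself, here is a minimal sketch that opens a TCP connection on the loopback interface and reads both endpoints back from the socket. The function name `describe_connection` and the throwaway local listener are assumptions made for illustration; a real request would go to a remote server instead.

```python
import socket

def describe_connection():
    """Open a TCP connection on loopback and return its endpoints."""
    # A throwaway listener stands in for the remote server (illustration only).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0 lets the OS choose a free port
    listener.listen(1)
    server_addr = listener.getsockname()

    # The client never names a port of origin; the OS assigns one on connect.
    client = socket.create_connection(server_addr)
    accepted, seen_by_server = listener.accept()

    origin = client.getsockname()        # (address of origin, port of origin)
    destination = client.getpeername()   # (address of destination, port of destination)

    for s in (client, accepted, listener):
        s.close()
    return origin, destination, seen_by_server

if __name__ == "__main__":
    origin, destination, seen = describe_connection()
    print("origin:", origin, "destination:", destination, "protocol: TCP")
```

The port of origin changes from run to run, because the operating system picks whichever one is free at that moment.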
On top of this there may be other information and layers of data. For instance, if you are encrypting the connection using SSL/TLS (HTTPS), that goes on top of the TCP protocol. If we add your router to the picture, you'll see that it sits between you and Google's server. Something like this:
   YOU                    ROUTER                   SERVER
----------             ------------             ------------
IP & Port   <------->    Gateway    <-------->    IP & Port
Please note that you are not requesting Google's page from your router, as if your router were Google's server. Instead, you are asking your router to act on your behalf and fetch Google's page for you. To do that, your router has to do the same thing your computer did: open a port and make the request to Google's server.
On the other side, you are now in a private network, created for all the devices behind your router. Devices in private networks usually have addresses in the 10.x.x.x, 172.16.x.x to 172.31.x.x, or 192.168.x.x ranges, which are reserved for private use (RFC 1918). Routers usually take the first assignable address in the range, for example 192.168.0.1, and devices take the following ones; your computer might be 192.168.0.2. So, in this case, your request might go from 192.168.0.2:2784 to www.google.com:80 using the gateway at 192.168.0.1. The router receives this request and builds a table of the requests made by devices in its own network, so it can return each response to the correct device. The first routing table record will be:
FROM HOST : PORT - TO SERVER : PORT  - PROTOCOL - ROUTER PORT
192.168.0.2:2784 - www.google.com:80 - TCP      - 2784
Did you notice the added information? The router has to use a port of its own to establish the connection. Many routers use the same port used by the requester as long as it is available. But imagine that two devices in the same network want to establish connections from the same port and, in the worst case, to the same server. If the router used that same port for both, it could not figure out which answer belongs to which device. To solve this, the router never uses the same port for more than one device.
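The bookkeeping described above can be modelled in a few lines. This is a toy sketch, not how any particular router is implemented: the class name `NatTable`, its methods, and the fallback range starting at 50000 are all assumptions chosen for illustration, and the model ignores details such as TCP and UDP having separate port spaces.

```python
class NatTable:
    """Toy model of the router's table: one router port per private request."""

    def __init__(self, first_fallback_port=50000):
        # router port -> (host, host_port, server, server_port, protocol)
        self.by_router_port = {}
        self.next_fallback = first_fallback_port

    def outbound(self, host, host_port, server, server_port, proto="TCP"):
        """Record an outgoing request and return the router port chosen for it."""
        record = (host, host_port, server, server_port, proto)
        # Reuse the record if this exact connection is already known.
        for router_port, existing in self.by_router_port.items():
            if existing == record:
                return router_port
        # Prefer the requester's own port, as many routers do, if it is free.
        router_port = host_port
        while router_port in self.by_router_port:
            router_port = self.next_fallback
            self.next_fallback += 1
        self.by_router_port[router_port] = record
        return router_port

    def inbound(self, router_port):
        """Return the (host, port) that should receive an answer, or None."""
        record = self.by_router_port.get(router_port)
        return None if record is None else (record[0], record[1])


if __name__ == "__main__":
    table = NatTable()
    # Two devices pick the same port of origin for the same server.
    print(table.outbound("192.168.0.2", 2784, "www.google.com", 80))
    print(table.outbound("192.168.0.5", 2784, "www.google.com", 80))
```

In this model the first device keeps its own port on the router, while the second one is moved to a different router port, so each answer can still be matched to exactly one device.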
Looking at this picture from the other side, imagine a server wants to establish a connection with one of your devices behind the router by sending a request:
   YOU                    ROUTER                   SERVER
----------             ------------             ------------
IP & Port   ----?---     Gateway    <---------    IP & Port
The only thing outside servers see is your public IP address, which is managed by your router, so all requests land on your router. When it receives a packet, it checks its table for a connection to that server using that port; if it finds nothing, it rejects or drops the request. You can forward ports in your router, so that each request arriving at the router on a specific port is forwarded to a specific device and port in your private network. But doing this can be cumbersome and, worse, sometimes you don't even have access to the router configuration, for example if you are at a coffee shop or at the airport.
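The decision the router makes for each incoming packet can be sketched as a single lookup function. The function name `route_inbound`, the dictionary layouts, and the sample addresses (taken from the documentation range 203.0.113.0/24) are assumptions for illustration only.

```python
def route_inbound(packet, connections, forwards):
    """Decide what a router does with a packet arriving at its public address.

    packet      -- dict with 'src' as (ip, port) and 'dst_port' (router port)
    connections -- {(router_port, remote_ip, remote_port): (host, host_port)}
    forwards    -- {router_port: (host, host_port)}, static port forwarding
    Returns the private (host, port) to deliver to, or None to drop.
    """
    key = (packet["dst_port"], packet["src"][0], packet["src"][1])
    if key in connections:               # answer to a request made from inside
        return connections[key]
    if packet["dst_port"] in forwards:   # manually configured forwarding rule
        return forwards[packet["dst_port"]]
    return None                          # unsolicited: reject or drop


if __name__ == "__main__":
    connections = {(2784, "203.0.113.10", 80): ("192.168.0.2", 2784)}
    forwards = {8080: ("192.168.0.7", 80)}
    reply = {"src": ("203.0.113.10", 80), "dst_port": 2784}
    stranger = {"src": ("203.0.113.99", 4000), "dst_port": 2784}
    print(route_inbound(reply, connections, forwards))
    print(route_inbound(stranger, connections, forwards))
```

A packet from an unknown sender is delivered only if a forwarding rule exists for the port it arrives on; otherwise it has nowhere to go.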
You might wonder why a server would want to establish a connection to your computer. The reason is that the party trying to connect to you might not be a server at all, but another device like yours, connected to the Internet behind a router as well. For example, you have your workstation at the office, connected to the Internet through your office router and Internet provider, and you are trying to access it from your home computer. Your office router has no fixed IP address and no port forwarding, but both computers can use a server to negotiate a direct connection. Of course, that server, being reachable by both parties, could forward all the traffic, but that would be like traveling from California to New York by going first to Mexico City and from there up again to New York. It's simply inefficient and slower. Establishing a direct peer-to-peer connection is the best solution, and NAT traversal is the study of how to implement this when both parties are behind a router.
The solution is as easy as creating the proper record in the router table, so the router can route the incoming request to the computer waiting for it. The biggest problem is knowing beforehand which port the router will assign to that connection.
NAT traversal has focused on two main cases: the so-called "cone" NAT and "symmetric" NAT. A cone NAT is a router that implements that table without the destination information, or storing only part of it. There is some logic to this: imagine you have been asked to build a router. You know that every request coming to it from a private network device carries the external IP:port, and that your router will pick a port (most of the time a random one) to route that request, so any answer received on that port will certainly belong to the requesting device. If that's the case, the solution is quite straightforward: the server that is accessible to both devices asks them to connect to it at a certain port, so it learns the external port each router uses for those connections. After that, it tells both devices which ip:port to connect to, and from which ip:port, to establish the desired connection.
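The server's role in the cone NAT case can be sketched with plain UDP sockets. This is a minimal illustration running entirely on the loopback interface, so no router is involved and the "external" address is simply the local one; the function names `rendezvous_server`, `announce`, `learn_peers` and `demo`, the JSON message format, and the peer names are assumptions, not part of any standard.

```python
import json
import socket
import threading

def rendezvous_server(sock, expected=2):
    """Wait for `expected` peers, then tell each one the other's ip:port."""
    peers = {}
    while len(peers) < expected:
        data, addr = sock.recvfrom(1024)   # addr is the sender as seen from outside
        peers[data.decode()] = addr
    for name, addr in peers.items():
        others = {n: list(a) for n, a in peers.items() if n != name}
        sock.sendto(json.dumps(others).encode(), addr)

def announce(name, server_addr):
    """Send our name to the server; the OS picks the port of origin."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(5)
    sock.sendto(name.encode(), server_addr)
    return sock

def learn_peers(sock):
    """Receive the server's answer: {peer name: (ip, port)}."""
    data, _ = sock.recvfrom(1024)
    return {n: tuple(a) for n, a in json.loads(data.decode()).items()}

def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(5)
    worker = threading.Thread(target=rendezvous_server, args=(server,))
    worker.start()
    home = announce("home", server.getsockname())
    office = announce("office", server.getsockname())
    home_view, office_view = learn_peers(home), learn_peers(office)
    worker.join()
    result = (home_view, office_view,
              home.getsockname()[1], office.getsockname()[1])
    for s in (home, office, server):
        s.close()
    return result

if __name__ == "__main__":
    print(demo())
```

Behind real routers, the address the server sees in `recvfrom` would be the router's public IP and the port the router assigned, which is exactly the information each peer needs about the other.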
But if the router stores all the table information, this will fail, because the router knows that no request was made to that other host on the Internet. In that case we are facing a "symmetric" NAT. Here the only alternative is to guess which port the other device's router will use and try to create the correct record in the routers' tables. There are two possibilities: if the router tries to reuse the port used by the device in its private network, the server can generate random ports for both devices and ask each of them to connect to the IP:random port of the other. But there are routers that always change the port chosen by the device, and some of them follow a pattern. If the pattern can be identified, the port assignment can be predicted. Other routers follow no pattern and always assign a random port. In this latter case there is nothing we can do. A common pattern is choosing ports in sequence, regardless of the port used by the requesting device. Such routers end up with routing tables like this:
FROM HOST : PORT - TO SERVER : PORT     - PROTOCOL - ROUTER PORT
192.168.0.2:2784 - www.google.com   :80 - TCP      - 6000
192.168.0.5:1567 - www.microsoft.com:80 - TCP      - 6001
192.168.0.2:3021 - www.apple.com    :80 - TCP      - 6002
192.168.0.7:1120 - www.fsf.org      :80 - TCP      - 6003
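If the server sees a device connect several times in a row, it can look at the router ports used and try to guess the next one. The sketch below only handles the simple case of a constant step between ports; the function name `predict_next_port` and the minimum of three samples are assumptions for illustration.

```python
def predict_next_port(observed_ports):
    """Guess the next router port from ports seen on earlier connections.

    Returns the predicted port if the observations form a constant-step
    sequence, or None if no such pattern is visible.
    """
    if len(observed_ports) < 3:
        return None   # too few samples to tell a pattern from coincidence
    steps = {b - a for a, b in zip(observed_ports, observed_ports[1:])}
    if len(steps) != 1:
        return None   # ports look random: nothing to predict
    predicted = observed_ports[-1] + steps.pop()
    return predicted if 0 < predicted <= 65535 else None


if __name__ == "__main__":
    print(predict_next_port([6000, 6001, 6002, 6003]))
    print(predict_next_port([6000, 6013, 6002]))
```

The prediction is only a guess: any other device on the same network can open a connection in the meantime and take the predicted port first.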
One final distinction that might be relevant here is the way the router handles TCP connections. TCP is a *VERY* complex protocol. It implements an incredible amount of logic to make sure it provides a reliable connection. It is well done and handles a huge number of situations transparently, giving the developer what looks like a connected channel even when there is no direct connection (especially not a physical one) between the devices. On the other hand, this robustness and flexibility comes with some drawbacks, especially in how well it uses the available speed and resources. At the lowest level, TCP establishes a connection with a handshake of SYN and ACK packets. Some routers inspect these packets to track the connection status and reject unexpected ones, such as a SYN when the connection is already established. In these cases the router should be handled as a symmetric NAT, but that's not enough: the complexity of TCP demands that both devices initiate the TCP connection *AT THE SAME TIME*, so that both routers see the packets they expect.
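To make the packet checking more concrete, here is a deliberately simplified model of a router that follows the handshake for a single table record. It is a toy, not a description of any real router: the class name `TcpTracker`, the state names, and the `allow_simultaneous_open` switch are assumptions, and real connection tracking handles many more cases (resets, retransmissions, timeouts).

```python
class TcpTracker:
    """Toy model of a router following the TCP handshake for one record."""

    def __init__(self, allow_simultaneous_open=False):
        self.state = "CLOSED"
        self.allow_simultaneous_open = allow_simultaneous_open

    def outbound(self, flags):
        """A packet leaving the private network; always allowed in this model."""
        flags = set(flags)
        if flags == {"SYN"} and self.state == "CLOSED":
            self.state = "SYN_SENT"
        elif flags == {"ACK"} and self.state == "SYN_ACK_SEEN":
            self.state = "ESTABLISHED"
        return True

    def inbound(self, flags):
        """A packet arriving from the Internet; True if it is let through."""
        flags = set(flags)
        if self.state == "SYN_SENT" and flags == {"SYN", "ACK"}:
            self.state = "SYN_ACK_SEEN"
            return True
        if self.state == "SYN_SENT" and flags == {"SYN"}:
            # The other side is opening the connection at the same time.
            return self.allow_simultaneous_open
        if self.state == "ESTABLISHED":
            return "SYN" not in flags   # a fresh SYN is unexpected here
        return False                    # no matching outbound request


if __name__ == "__main__":
    router = TcpTracker()
    print(router.inbound({"SYN"}))          # unsolicited SYN
    router.outbound({"SYN"})
    print(router.inbound({"SYN", "ACK"}))   # expected answer
```

In this model, a SYN from the other peer is let through only if the local device has already sent its own SYN and the router tolerates both sides opening at once, which is why the timing matters so much for TCP.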
UDP is simpler: it has no connection-setup packets, just a single kind of packet, the data packet. This might be the reason why most software that uses some kind of direct peer-to-peer connection implements it over UDP. WebRTC, the new web protocol, seems to use SCTP, carried on top of UDP; the same goes for Skype, which uses UDP for its peer-to-peer connections; LogMeIn Hamachi uses UDP as well, and so do most other peer-to-peer solutions. In fact, the success rate for UDP peer-to-peer connections is higher than for TCP. And in case you need a reliable connection, you can put an SCTP layer on top of your UDP connection and get one.
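As a rough illustration of how little UDP needs (a sketch only, separate from the lab code), the example below has two peers each send one datagram to the other without any setup. It runs on the loopback interface, so there is no router to traverse; the function names `open_udp`, `punch` and `demo` and the message contents are assumptions.

```python
import socket

def open_udp(port=0):
    """Create a UDP socket on loopback; port 0 lets the OS choose one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.settimeout(5)
    return sock

def punch(sock, peer_addr, message):
    """Send one datagram to the peer; there is no handshake to complete first."""
    sock.sendto(message, peer_addr)

def demo():
    home, office = open_udp(), open_udp()
    home_addr, office_addr = home.getsockname(), office.getsockname()
    # Both sides send first, so each router would record an outgoing request
    # toward the other peer before that peer's packet arrives.
    punch(home, office_addr, b"hello from home")
    punch(office, home_addr, b"hello from office")
    at_office, sender_at_office = office.recvfrom(1024)
    at_home, sender_at_home = home.recvfrom(1024)
    home.close()
    office.close()
    return (at_office, sender_at_office == home_addr,
            at_home, sender_at_home == office_addr)

if __name__ == "__main__":
    print(demo())
```

Behind real routers the first datagram from each side may be dropped before the other side's record exists, so a practical version would resend a few times; that retry logic is left out here.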
I'm sharing with you the testing code I wrote, pretty much as a lab to make sure everything works as expected. It is Python code using asynchronous and synchronous sockets (depending on the socket) and threads for the client, the server and the connector (the client requesting a connection to the other party).
Hope you enjoy and learn. Happy coding!
Andy Galluzzi
Angall Corporation
Good references to learn more about this:
Disclaimer: You will find a couple of documents about "STUN", "ICE" and other methods, which are mostly about how to implement the server that both devices have to connect to in order to establish the peer-to-peer connection, or that acts as a gateway between them (the "Mexico City" of my previous example). They are too focused on a specific implementation, which most of the time will not work for you if you are implementing your own server. They take you too far away from the single important idea you have to remember here: all this business is about generating the right record in the routing table of your router.