1. Abstract
Most web filters work inline, meaning that all outgoing and incoming packets are passed through a filter driver. While this approach has its benefits, it has a big flaw: the filtering process affects the data transfer throughput. This project presents an experimental remedy to this issue by putting the filter engine in sniffer mode. This way, the filtering process and the data transfer act independently.
2. Requirements
- The article expects the reader to be familiar with C++ and TCP/IP concepts.
- The source code uses the following libraries:
- Winsock.
- Ethereal: A packet capture and network analyzer.
- The code was compiled and built with VC7 on Windows Server 2003.
3. Introduction
The main goal of this article is to explain the practical details of low level network programming. There are many commercial and open source firewalls and HTTP filters available on both Linux and Windows. But, internally, most of them follow the same approach to finding their targets. The References section of this article lists handy books related to this topic.
In particular, a web filter can inspect outgoing packets for blacklist keywords in one of two modes: Inline mode and Sniffer mode. We explain both modes of operation and compare them.
4. Background
This section explains the TCP/IP protocol stack, the HTTP protocol behavior, and the Boyer-Moore algorithm for fast pattern matching. If you think that you have enough experience and knowledge and want to get your hands dirty, please continue to section 7 (Implementation).
A mission critical system is totally different from an office application. Imagine that your team plans to develop a firewall, or web filter, or even a secure proxy server which processes tons of packets in a few seconds. Well, what are the main characteristics of such systems?
Being fail-safe, high-performing, full-featured, and Green are a few. By Green, I mean the system should not eat up the CPU and the memory of the host platform. Meanwhile, in such an expensive project, there is no room for logical mistakes due to lack of technical knowledge! In a typical environment, your final product must be deployed in the model below:
4.1 TCP/IP protocol stack
As the name suggests, TCP/IP refers to a number of protocols, each of which was developed for a purpose. In order to understand how an HTTP session is established, we should know at least the following protocols: Ethernet II, ARP, IP, TCP, UDP, HTTP, and DNS.
4.2 Ethernet II
The Ethernet II frame format was defined by the Ethernet specification created by Digital, Intel, and Xerox before the IEEE 802.3 Specification. The Ethernet II frame format is also known as the DIX frame format.
Ethernet II consists of the following fields (26 bytes in total, plus a payload of 46 to 1500 bytes):
- Preamble (8 bytes) consisting of 7 bytes of alternating 1s and 0s (each byte is the bit sequence 10101010) to synchronize a receiving station, and a 1-byte 10101011 sequence that indicates the start of a frame. The Preamble provides receiver synchronization and frame delimitation services.
- Destination Address (6 bytes) indicates the destination's address. The destination can be a unicast, a multicast, or Ethernet broadcast address. A unicast address is also known as an individual, physical, hardware, or MAC address. For the Ethernet broadcast address, all 48 bits are set to 1 to create the address 0xFF-FF-FF-FF-FF-FF.
- Source Address (6 bytes) indicates the sending node's unicast address.
- EtherType (2 bytes) indicates the upper layer protocol contained within the Ethernet frame. For an IP datagram, the field is set to 0x0800. For an ARP message, the EtherType field is set to 0x0806. For a complete list, see the References section.
- Payload consists of a protocol data unit (PDU) of an upper layer protocol. Ethernet II can send a maximum-sized payload of 1500 bytes. Because of the Ethernet's collision detection facility, Ethernet II frames must send a minimum payload size of 46 bytes.
- FCS (4 bytes) provides bit-level integrity verification on the bits in the Ethernet II frame using the CRC algorithm.
A typical capture shows you the fields below (FCS and Preamble are excluded):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| destination | source | protocol | |
| mac address | mac address | type | IP DATAGRAM |
| (6 bytes) | (6 bytes) | 0X0800 | |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
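The layout above can be decoded with a few lines of code. The following is only a minimal sketch: the names EthernetHeader, ParseEthernet, and IsBroadcast are hypothetical helpers made up for this illustration and are not part of the project's source code. It assumes the buffer starts at the Destination Address, as in a typical capture.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: the three Ethernet II header fields that a
// capture normally exposes (Preamble and FCS are already stripped).
struct EthernetHeader {
    std::uint8_t  dst[6];      // destination MAC address
    std::uint8_t  src[6];      // source MAC address
    std::uint16_t ether_type;  // host byte order, e.g. 0x0800 for IP
};

// Decodes the 14-byte header at the start of `frame`.
// Returns false when the buffer is too short to hold a header.
bool ParseEthernet(const std::uint8_t* frame, std::size_t len,
                   EthernetHeader* out)
{
    assert(frame != nullptr && out != nullptr);
    if (len < 14) return false;
    for (int i = 0; i < 6; ++i) {
        out->dst[i] = frame[i];
        out->src[i] = frame[6 + i];
    }
    // EtherType is transmitted most significant byte first.
    out->ether_type =
        static_cast<std::uint16_t>((frame[12] << 8) | frame[13]);
    return true;
}

// True when all 48 destination bits are 1 (FF-FF-FF-FF-FF-FF).
bool IsBroadcast(const EthernetHeader& h)
{
    for (int i = 0; i < 6; ++i)
        if (h.dst[i] != 0xFF) return false;
    return true;
}
```

A caller would branch on ether_type: 0x0800 means the payload is an IP datagram and 0x0806 means it is an ARP message.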
4.3 ARP
ARP is a broadcast-based, request-reply protocol that provides a dynamic address resolution facility to map the next-hop IP addresses to their corresponding MAC addresses.
There are two facts regarding the datalink layer that show we need ARP:
- When an Ethernet frame is sent from one host on a LAN to another, it is the 48-bit Ethernet address that determines for which interface the frame is destined. The device driver software never looks at the destination IP address in the IP datagram.
- The next-hop IP address is not necessarily the same as the destination IP address of the IP datagram. The result of the route determination process for every outgoing IP datagram is a next-hop interface and a next-hop IP address. For direct deliveries to destinations on the same subnet, the next-hop IP address is the datagram's destination IP address. For indirect deliveries to remote destinations, the next-hop IP address is the IP address of a router on the same subnet as the forwarding host. To hand a frame to that next hop, the sender needs its hardware address, and this is what ARP resolves.
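The next-hop decision described above can be sketched as follows. This is a simplified illustration that assumes a host with a single interface and a single default gateway; MakeIp and NextHop are hypothetical names, and a real routing table may contain more routes than this.

```cpp
#include <cassert>
#include <cstdint>

// Builds a host-order IPv4 address from its four dotted-decimal parts.
std::uint32_t MakeIp(std::uint8_t a, std::uint8_t b,
                     std::uint8_t c, std::uint8_t d)
{
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8)  |  std::uint32_t(d);
}

// Returns the IP address whose MAC address ARP has to resolve:
// the destination itself for a direct delivery, the gateway otherwise.
std::uint32_t NextHop(std::uint32_t local_ip, std::uint32_t netmask,
                      std::uint32_t gateway_ip, std::uint32_t dst_ip)
{
    const bool same_subnet = (local_ip & netmask) == (dst_ip & netmask);
    return same_subnet ? dst_ip : gateway_ip;
}
```

For a host 192.168.1.10/24 with gateway 192.168.1.1, a packet to 192.168.1.77 is delivered directly, while a packet to 8.8.8.8 is sent to the gateway's MAC address.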
4.4 IP
IP, the heart of the TCP/IP protocol suite, provides connectionless, unreliable delivery of data. By unreliable, we mean that there is no guarantee that a datagram successfully gets to its destination. By connectionless, we mean that IP doesn't maintain any information regarding successive datagrams. In other words, each datagram is handled independently. IP makes the best effort to deliver packets to the next hop or the final destination. End-to-end reliability is the responsibility of upper layer protocols such as TCP.
An IP header contains the following fields which we need to know for packet processing later:
- Version (4 bits): Indicates the format of the Internet header. For IPv4, the value is 4.
- IHL (4 bits): The length of the IP header in 32-bit words.
- Type of service or TOS (1 byte)
As the name indicates, it specifies how important the IP packet is. Some intermediate devices evaluate this field under high load and prioritize the datagram accordingly. In RFC 791 (Internet Protocol), this field is structured as follows:
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| | | | | | |
| PRECEDENCE | D | T | R | 0 | 0 |
| | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
D >> Delay
T >> Throughput
R >> Reliability
Bit6 and Bit7 >> reserved
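As a small illustration, the TOS byte in the figure can be split into its parts like this. Note that RFC 791 numbers bits from the most significant end, so bit 0 in the figure is the highest bit of the byte. TosFields and DecodeTos are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the decoded Type of Service byte.
struct TosFields {
    unsigned precedence;      // bits 0-2 of the figure (top three bits)
    bool low_delay;           // D, bit 3
    bool high_throughput;     // T, bit 4
    bool high_reliability;    // R, bit 5
};

TosFields DecodeTos(std::uint8_t tos)
{
    TosFields f;
    f.precedence       = tos >> 5;
    f.low_delay        = (tos & 0x10) != 0;
    f.high_throughput  = (tos & 0x08) != 0;
    f.high_reliability = (tos & 0x04) != 0;
    return f;   // the two lowest bits are reserved and ignored here
}
```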
- Total length (2 bytes): The total size of the header plus the payload. Conceptually, the total length field allows a datagram to be up to 65,535 bytes long, although such a long packet is impractical for most hosts and network devices. Below, we study what an MTU is and how it affects this in practice. Anyway, remember that the IP header is only 20 bytes, and if there are any options, the length can go as high as 60 bytes. No more!
- ID (2 bytes): Assigned by the sender so that the receiver can reassemble IP packets that were fragmented because of the MTU value.
For the second time, I have mentioned "MTU", so let us see what the MTU is. Most of the data you generate while, for example, surfing the web is bulk data, meaning that it is large. The protocol stack splits the bulk into smaller parts so that it can send them over the network infrastructure. In the case of Ethernet, the maximum size of a datagram is 1500 bytes. This number is called the MTU, which stands for Maximum Transmission Unit. If we set aside 20 bytes for the IP header, 1480 bytes remain in each packet for the transport layer header and the payload. So, in case you want to transmit 15000 bytes of data in a single datagram to your mate, the protocol stack splits it into 11 fragments (ten carrying 1480 bytes each and one carrying the remaining 200 bytes) and transmits them one by one. This is where the ID comes into the picture. Every fragment carries the ID of the original datagram in its IP header, so when the receiver gets all the pieces, it can put them back together and process the whole message.
- Flags (3 bits): Says whether the datagram is a part of fragmented data or not. DF (Don't Fragment) forbids fragmentation, and MF (More Fragments) is set on every fragment except the last one.
0 1 2
+---+---+---+
| | D | M |
| 0 | F | F |
+---+---+---+
- Fragment Offset (13 bits): An 8-byte chunk of data is called a fragment block. The number in the Fragment Offset field reports the size of the offset in fragment blocks. The field is 13 bits long, so offsets can range from 0 to 8191 fragment blocks, corresponding to offsets from 0 to 65,528 bytes.
- TTL (1 byte): Says how long a datagram may remain alive in the network. RFC 791 defines the unit as seconds, but in practice every router that forwards the datagram decrements the field by one, so it works as a hop limit.
- Protocol (1 byte): Indicates the upper layer protocol type. For example:
1 >> ICMP
2 >> IGMP
4 >> IP in IP encapsulation
6 >> TCP
17 >> UDP
41 >> IPv6
47 >> Generic routing encapsulation (GRE)
50 >> IP security encapsulating security payload (ESP)
51 >> IP security authentication header (AH)
89 >> OSPF
- Header checksum (2 bytes): To verify the integrity of the header, the protocol stack computes a 16-bit one's complement checksum over the header (this is not a CRC) and compares it with the value in this field. It is a kind of sanity check.
- Source IP Address and Destination IP Address (each 4 bytes).
- Options (variable length): Maintains a list of optional information for the datagram.
The complete IP header layout is shown below:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
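To make the fragmentation arithmetic concrete, here is a minimal sketch that plans the fragments for a given payload and MTU. Fragment and PlanFragments are hypothetical names for this illustration, and the sketch assumes an IP header without options unless told otherwise. Because every fragment carries its own IP header, a 15000-byte payload on a link with an MTU of 1500 ends up in 11 fragments, not 10.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one planned fragment.
struct Fragment {
    std::uint16_t offset_blocks;  // Fragment Offset field (8-byte blocks)
    std::uint16_t payload_len;    // bytes of payload carried
    bool more_fragments;          // value of the MF flag
};

std::vector<Fragment> PlanFragments(std::uint32_t payload_len,
                                    std::uint32_t mtu,
                                    std::uint32_t ip_header_len = 20)
{
    assert(mtu >= ip_header_len + 8);
    assert(payload_len + ip_header_len <= 65535);
    // Every fragment except the last must carry a multiple of 8 bytes,
    // because the offset is expressed in 8-byte fragment blocks.
    const std::uint32_t max_chunk = ((mtu - ip_header_len) / 8) * 8;

    std::vector<Fragment> plan;
    std::uint32_t offset = 0;
    while (offset < payload_len) {
        const std::uint32_t remaining = payload_len - offset;
        const std::uint32_t chunk =
            remaining < max_chunk ? remaining : max_chunk;
        Fragment f;
        f.offset_blocks  = static_cast<std::uint16_t>(offset / 8);
        f.payload_len    = static_cast<std::uint16_t>(chunk);
        f.more_fragments = (offset + chunk) < payload_len;
        plan.push_back(f);
        offset += chunk;
    }
    return plan;
}
```

With PlanFragments(15000, 1500), the first ten entries carry 1480 bytes each with MF set, and the last one carries 200 bytes at offset 1850 blocks (14800 bytes) with MF cleared.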
4.5 TCP
TCP is a fully formed Transport Layer protocol that provides a reliable data-transfer service and a method to pass TCP-encapsulated data to the Application Layer protocol. TCP has the following characteristics:
- Connection oriented: Before any send or receive, both sides of data transfer must negotiate a TCP connection using a TCP three-way handshake process. TCP connections are terminated through a graceful TCP 4-way termination process.
- Full duplex: From one pipe, data can be sent and received simultaneously. TCP does this job perfectly, gaining from the sequence number and acknowledgment number fields in its header.
- Reliable: Data sent on a TCP connection is sequenced, and a positive acknowledgment is expected from the receiver. If no acknowledgment is received, the segment is retransmitted.
- Flow control: To avoid sending too much data at one time and congesting the routers of the IP inter-network, TCP implements sender-side flow control that gradually scales the amount of data sent at one time. To avoid having the sender send data that the receiver cannot buffer, TCP implements a receiver-side flow control that indicates the amount of space left in the receiver's buffer.
- Segmentation of application layer data: TCP segments data obtained from the Application Layer process so that it will fit within an IP datagram sent on the Network Interface Layer link. TCP peers exchange the maximum-sized segment that each can receive, and adjust the TCP maximum segment size using a Path Maximum Transmission Unit (PMTU) discovery.
- One to one delivery of data: TCP connections are a logical point-to-point circuit between two Application Layer protocols. TCP does not provide a one-to-many delivery service.
The figure below shows a TCP segment encapsulated in an IP datagram:
<--------------- IP datagram ------------------>
++++++++++++++++++++++++++++++++++++++++++++++++
| IP | TCP | TCP |
| Header | Header | Data |
++++++++++++++++++++++++++++++++++++++++++++++++
20 bytes 20 bytes
At-least At-least
TCP header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |U|A|P|R|S|F| |
| Offset| Reserved |R|C|S|S|Y|I| Window |
| | |G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
To compute a TCP checksum, we need to consider a TCP pseudo header. The TCP pseudo header is used to associate the TCP segment with the IP header. The TCP pseudo header is added to the beginning of the TCP segment only during the checksum calculation, and is not sent as part of the TCP segment. The use of the TCP pseudo header assures the receiver that a routing or fragmentation process did not improperly modify the key fields in the IP header.
+--------+--------+--------+--------+
| Source Address |
+--------+--------+--------+--------+
| Destination Address |
+--------+--------+--------+--------+
| zero |PROTOCOL| TCP Length |
+--------+--------+--------+--------+
- Source Port (2 bytes): The source port number.
- Destination Port (2 bytes): The destination port number.
- Sequence Number (4 bytes): Indicates the outgoing byte-stream-based sequence number of the segment's first octet.
- Acknowledgment Number (4 bytes): If the ACK control bit is set, this field indicates the sequence number of the next octet in the incoming byte stream that the receiver expects to receive.
- Data Offset (4 bits): Indicates where the TCP segment data begins. The Data Offset field is also the TCP header's size. Just as in the IP header's Header Length field, the Data Offset field is the number of 32-bit words (4-byte blocks) in the TCP header. For the smallest TCP header (no options), the Data Offset field is set to 5 (0x5), indicating that the segment data begins in the twentieth octet offset starting from the beginning of the TCP segment (the offset starts its count at 0). With a Data Offset field set to its maximum value of 15 (0xF), the largest TCP header, including TCP options, can be 60 bytes long.
- Reserved (6 bits): Reserved for future use. Must be zero.
- Control Bits (6 bits from left to right):
- URG: Urgent pointer field significant
- ACK: Acknowledgment field significant
- PSH: Push function
- RST: Reset the connection
- SYN: Synchronize sequence numbers
- FIN: No more data from sender
- Window (2 bytes): The number of data octets beginning with the one indicated in the acknowledgment field which the sender of this segment is willing to accept.
- Checksum (2 bytes): The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the header and text. While computing the checksum, the checksum field itself is replaced with zeros.
- Urgent Pointer (2 bytes): This field communicates the current value of the urgent pointer as a positive offset from the sequence number in this segment. The Urgent Pointer points to the sequence number of the octet following the urgent data. This field is only interpreted in segments with the URG control bit set.
- Options (variable): Options may occupy space at the end of the TCP header, and are a multiple of 8 bits in length.
- Padding (variable): The TCP header padding is used to ensure that the TCP header ends and data begins on a 32 bit boundary. The padding is composed of zeros.
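The checksum rule and the pseudo header can be combined in a short sketch. SumWords, Fold, and TcpChecksum are hypothetical helper names, not functions from the project. The sketch assumes IPv4 addresses given as four bytes in network order and a segment whose Checksum field (bytes 16 and 17) has been set to zero before the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds `len` bytes to a running sum as big-endian 16-bit words.
// An odd trailing byte is padded on the right with a zero byte.
std::uint32_t SumWords(const std::uint8_t* data, std::size_t len,
                       std::uint32_t sum)
{
    for (std::size_t i = 0; i + 1 < len; i += 2)
        sum += (std::uint32_t(data[i]) << 8) | data[i + 1];
    if (len & 1)
        sum += std::uint32_t(data[len - 1]) << 8;
    return sum;
}

// Folds the carries back in (one's complement sum) and complements.
std::uint16_t Fold(std::uint32_t sum)
{
    while (sum >> 16) sum = (sum & 0xFFFF) + (sum >> 16);
    return static_cast<std::uint16_t>(~sum & 0xFFFF);
}

// `segment` is the TCP header plus data.
std::uint16_t TcpChecksum(const std::uint8_t src[4],
                          const std::uint8_t dst[4],
                          const std::vector<std::uint8_t>& segment)
{
    assert(segment.size() >= 20 && segment.size() <= 0xFFFF);
    std::uint8_t pseudo[12];
    for (int i = 0; i < 4; ++i) {
        pseudo[i]     = src[i];
        pseudo[4 + i] = dst[i];
    }
    pseudo[8]  = 0;                                   // zero
    pseudo[9]  = 6;                                   // PROTOCOL = TCP
    pseudo[10] = static_cast<std::uint8_t>(segment.size() >> 8);
    pseudo[11] = static_cast<std::uint8_t>(segment.size() & 0xFF);

    std::uint32_t sum = SumWords(pseudo, sizeof(pseudo), 0);
    sum = SumWords(segment.data(), segment.size(), sum);
    return Fold(sum);
}
```

A handy property for verification: if the computed value is written into the Checksum field and the function is run again over the same segment, the result is zero. The same SumWords and Fold pair also computes the IP header checksum when applied to the IP header alone.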
4.6 UDP
UDP is a simple, datagram-oriented, transport layer protocol: each output operation by a process produces exactly one UDP datagram, which causes one IP datagram to be sent. This is different from a stream-oriented protocol such as TCP, where the amount of data written by an application may have little relationship to what actually gets sent in a single IP datagram.
+--------+--------+--------+--------+ ----
|   Source Port   | Destination Port|   |
|    (2 bytes)    |    (2 bytes)    |   |
+--------+--------+--------+--------+ 8 bytes
|     Length      |    Checksum     |   |
|    (2 bytes)    |    (2 bytes)    |   |
+--------+--------+--------+--------+ ----
|                                   |
|            data octets            |
+-----------------------------------+
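Since the UDP header is a fixed 8 bytes, decoding it is straightforward. The sketch below uses the hypothetical names UdpHeader and ParseUdp and assumes the buffer starts at the first byte of the UDP header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical decoded form of the 8-byte UDP header (host byte order).
struct UdpHeader {
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint16_t length;     // header + data, in bytes
    std::uint16_t checksum;
};

// Reads a 16-bit value stored most significant byte first.
static std::uint16_t ReadBe16(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Returns false if the buffer is shorter than the header or if the
// Length field is inconsistent with the captured size.
bool ParseUdp(const std::uint8_t* dgram, std::size_t len, UdpHeader* out)
{
    assert(dgram != nullptr && out != nullptr);
    if (len < 8) return false;
    out->src_port = ReadBe16(dgram);
    out->dst_port = ReadBe16(dgram + 2);
    out->length   = ReadBe16(dgram + 4);
    out->checksum = ReadBe16(dgram + 6);
    return out->length >= 8 && out->length <= len;
}
```

The number of data octets is simply length minus 8.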
4.7 DNS
The Domain Name System, or DNS, is a distributed database that is used by TCP/IP applications to map between hostnames and IP addresses, and to provide electronic mail routing information. We use the term distributed because no single site on the Internet knows all the information. Each site (university department, campus, company, or department within a company, for example) maintains its own database of information, and runs a server program that other systems across the Internet (clients) can query. The DNS provides the protocol that allows clients and servers to communicate with each other.
From an application's point of view, access to the DNS is through a resolver. On Windows and Unix hosts, the resolver is accessed primarily through two library functions, gethostbyname() and gethostbyaddr(), which are linked with the application when the application is built. The first takes a hostname and returns an IP address, and the second takes an IP address and looks up a hostname. The resolver contacts one or more name servers to do the mapping.
Note that DNS is an application layer protocol which is normally carried over the UDP transport protocol (port 53). It falls back to TCP when a response is too large for a single UDP datagram.
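The resolver hides the wire format from the application, but it is useful to know what the name looks like inside a DNS query when you meet one in a capture. Per RFC 1035, each label of the name is preceded by its length, and the name ends with a zero byte. The sketch below shows only this name encoding, not a complete query; EncodeDnsName is a hypothetical helper name, and it expects a name without a trailing dot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes a host name such as "www.google.com" into DNS wire format.
// Returns an empty vector when a label is empty or longer than 63 bytes.
std::vector<std::uint8_t> EncodeDnsName(const std::string& host)
{
    std::vector<std::uint8_t> out;
    std::size_t start = 0;
    while (start <= host.size()) {
        std::size_t dot = host.find('.', start);
        if (dot == std::string::npos) dot = host.size();
        const std::size_t label_len = dot - start;
        if (label_len == 0 || label_len > 63)
            return std::vector<std::uint8_t>();
        out.push_back(static_cast<std::uint8_t>(label_len));
        out.insert(out.end(), host.begin() + start, host.begin() + dot);
        start = dot + 1;
    }
    out.push_back(0);   // root label terminates the name
    return out;
}
```

For "www.google.com" the result is 16 bytes long: 3 "www" 6 "google" 3 "com" 0.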
4.8 HTTP
Simply put, HTTP is the protocol behind the World Wide Web. With every web transaction, HTTP is invoked.
HTTP headers
There are four types of header for HTTP:
- General headers indicate general information such as the date, or whether the connection should be maintained. They are used by both clients and servers.
- Request headers are used only for client requests. They convey the client's configuration and desired document format to the server.
- Response headers are used only in server responses. They describe the server's configuration and information about the requested URL.
- Entity headers describe the document format of the data being sent between a client and a server.
Although Entity headers are most commonly used by the server when returning a requested document, they are also used by clients when using the POST or PUT methods.
Client requests
- GET is used to retrieve a resource on the server.
- HEAD is used to retrieve some information about the document when the client doesn't need the document itself.
- POST says that you are providing some information of your own (i.e., in forms). This may change the state of the server in some way, such as creating a record in a database.
- PUT is used to replace or create a new document on the server.
- DELETE is used to remove a document on the server.
- TRACE is used for protocol debugging purposes.
- OPTIONS is used when the client looks for other methods which can be used on the document.
- CONNECT is used when a client needs to talk to an HTTPS server through a proxy server.
Other HTTP methods that you may see (LINK, UNLINK, and PATCH) are less clearly defined.
For the reader of this article, only an understanding of the GET method is needed. This is the main method used for document retrieval. The response to a GET request can be generated by the server in many ways.
After the client uses the GET method in its request, the server responds with a status line, headers, and data requested by the client. If the server cannot process the request, due to an error or lack of authorization, the server usually sends an explanation in the entity-body of the response.
For example:
A client request
GET / HTTP/1.1\r\n
Accept: */*\r\n
Referer: http://www.google.com\r\n
Accept-Language: en-us\r\n
UA-CPU: x86\r\n
Accept-Encoding: gzip, deflate\r\n
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1;
.NET CLR 1.1.4322)\r\n
Host: www.google.com\r\n
Connection: Keep-Alive\r\n
Cache-Control: no-cache\r\n
Cookie: PREF=ID=76325601198c7def:LD=en:CR=2:TM=1184565890:
LM=1186242386:S=ofiC-IBOMCVySJCP\r\n
\r\n
The server response
HTTP/1.0 200 OK\r\n
Cache-Control: private\r\n
Content-Type: text/html; charset=UTF-8\r\n
Content-Encoding: gzip\r\n
Server: GWS/2.1\r\n
Content-Length: 2976\r\n
Date: Sat, 04 Aug 2007 15:46:45 GMT\r\n
X-Cache: MISS from netcache-1\r\n
Connection: keep-alive\r\n
\r\n
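To show how such a request can be taken apart, here is a minimal sketch that extracts the request line and a single header value. HttpRequestLine, ParseRequestLine, and HeaderValue are hypothetical names for this illustration. For brevity, the header name is matched case-sensitively, although real HTTP header names are case-insensitive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical holder for the three parts of "GET / HTTP/1.1".
struct HttpRequestLine {
    std::string method;
    std::string target;
    std::string version;
};

// Splits the first line of the request at its two spaces.
bool ParseRequestLine(const std::string& request, HttpRequestLine* out)
{
    assert(out != nullptr);
    const std::size_t eol = request.find("\r\n");
    if (eol == std::string::npos) return false;
    const std::string line = request.substr(0, eol);
    const std::size_t sp1 = line.find(' ');
    if (sp1 == std::string::npos) return false;
    const std::size_t sp2 = line.find(' ', sp1 + 1);
    if (sp2 == std::string::npos) return false;
    out->method  = line.substr(0, sp1);
    out->target  = line.substr(sp1 + 1, sp2 - sp1 - 1);
    out->version = line.substr(sp2 + 1);
    return true;
}

// Returns the value of the named header, or "" when it is absent.
std::string HeaderValue(const std::string& request, const std::string& name)
{
    const std::string key = "\r\n" + name + ":";
    const std::size_t pos = request.find(key);
    if (pos == std::string::npos) return "";
    std::size_t begin = pos + key.size();
    while (begin < request.size() && request[begin] == ' ') ++begin;
    std::size_t end = request.find("\r\n", begin);
    if (end == std::string::npos) end = request.size();
    return request.substr(begin, end - begin);
}
```

Applied to the client request above, the method is "GET", the target is "/", and HeaderValue(request, "Host") returns "www.google.com".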
4.9 Performing pattern search by using the Boyer-Moore algorithm
The Boyer-Moore approach is very interesting. It doesn't need to actually check every character of the string to be searched, but rather skips over some of them. Let's say we want to find a word "ROTTEN" in a string like HTTP://WWW.WEBSERVER.COM/ROTTEN HTTP/1.1. Using the Boyer-Moore algorithm, we have the following steps:
HTTP://WWW.WEBSERVER.COM/ROTTEN HTTP/1.1
ROTTEN
- First, we fetch "/" because it stands in the same position as the last character of the pattern.
- There is no "/" in the pattern. So the search fails.
- Move the pattern right along 6 characters. So 'N' stands under 'W'. Match fails because there's no 'W' either.
- Move the pattern right along 6 characters. So 'N' stands under 'V'. Match fails because there's no 'V' either. The same happens for 'M' after moving another 6 characters.
- Move the pattern right along 6 characters. So 'N' stands under 'E'. This time, the character does occur in the pattern, so move the pattern right along a single place so that both 'E's line up. Then compare right to left until the exact match is found.
This is exactly what we want. I compared Boyer-Moore with other case-insensitive pattern matching methods, and Boyer-Moore turned out to be much more efficient. For more information about the Boyer-Moore pattern matching algorithm, please Google the topic.
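The walk-through above uses only the "bad character" rule, which is the Horspool simplification of Boyer-Moore. The following sketch implements exactly that simplified variant, not the full algorithm with its good-suffix rule, and it is not the CheckPattern routine from the source code. HorspoolFind is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index of the first occurrence of `pattern` in `text`,
// or std::string::npos when there is none (case-sensitive).
std::size_t HorspoolFind(const std::string& text, const std::string& pattern)
{
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0 || m > n) return std::string::npos;

    // shift[c] = how far the pattern may move when the text character
    // aligned with the last pattern position is c.
    std::size_t shift[256];
    for (std::size_t c = 0; c < 256; ++c) shift[c] = m;
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= n) {
        // Compare right to left.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) --j;
        if (j == 0) return pos;
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return std::string::npos;
}
```

For the example string, the search inspects '/', 'W', 'V', 'M', and 'E' at the last pattern position, shifting by 6, 6, 6, 6, and 1, and then finds "ROTTEN" at index 25.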
5. An HTTP request life-time
In this section, we are going to describe what happens until a page is downloaded from the Internet. Say you open up your favorite Web browser and type http://www.google.com in your address bar. Again, remember our network model.
Please run Ethereal, select your outgoing interface, and start capturing. You can check for all the information below, in Ethereal.
The "three-way handshake" is the procedure used to establish a connection. This procedure, normally, is initiated by one TCP and responded to by another TCP.
The figure below shows the exact process in our example:
You web server
1. CLOSED LISTEN
2. SYN-SENT --> <SEQ=ISN1><CTL=SYN><ACK=0> --> SYN-RECEIVED
3. ESTABLISHED <-- <SEQ=ISN2><ACK=ISN1+1><CTL=SYN,ACK> <-- SYN-RECEIVED
4. ESTABLISHED --> <SEQ=ISN1+1><ACK=ISN2+1><CTL=ACK> --> ESTABLISHED
5. ESTABLISHED --> <SEQ=ISN1+1><ACK=ISN2+1><CTL=ACK><DATA> --> ESTABLISHED
In line 2 of the figure, you begin by sending a SYN segment indicating that you will use sequence numbers starting with the sequence number ISN1. In line 3, the server sends a SYN and acknowledges the SYN it received from you. Note that the acknowledgment field indicates that the server is now expecting to hear the sequence ISN1+1, acknowledging the SYN which occupied the sequence ISN1. At line 4, you respond with an empty segment containing an ACK for the server's SYN.
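The sequence number bookkeeping of the figure above follows one simple rule: SYN and FIN each occupy one sequence number, every data byte occupies one, and a bare ACK occupies none. A minimal sketch, with the hypothetical names Segment and ExpectedAck:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the fields that matter for the bookkeeping.
struct Segment {
    std::uint32_t seq;       // Sequence Number field
    std::uint32_t ack;       // Acknowledgment Number field
    bool syn;
    bool fin;
    std::uint32_t data_len;  // bytes of TCP data carried
};

// Returns the acknowledgment number the peer sends after receiving `s`.
// Unsigned arithmetic wraps modulo 2^32, just like TCP sequence numbers.
std::uint32_t ExpectedAck(const Segment& s)
{
    return s.seq + s.data_len + (s.syn ? 1u : 0u) + (s.fin ? 1u : 0u);
}
```

With ISN1 = 1000, the SYN of line 2 is acknowledged with 1001, while the empty ACK of line 4 (SEQ = 1001) leaves the next sequence number at 1001, which is why the data segment of line 5 reuses it.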
In line 5, you send some data. Note that the sequence number of the segment in line 5 is the same as in line 4 because the ACK does not occupy the sequence number space. In our example, <DATA> contains the following request:
GET / HTTP/1.1\r\n
Accept: */*\r\n
Referer: http://www.google.com\r\n
Accept-Language: en-us\r\n
UA-CPU: x86\r\n
Accept-Encoding: gzip, deflate\r\n
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1;
.NET CLR 1.1.4322)\r\n
Host: www.google.com\r\n
Connection: Keep-Alive\r\n
Cache-Control: no-cache\r\n
Cookie: PREF=ID=76325601198c7def:LD=en:CR=2:
TM=1184565890:LM=1186242386:S=ofiC-IBOMCVySJCP\r\n
\r\n
TCP resets are usually sent when something goes wrong either for the client or the server. To see a real TCP reset, simply type http://www.codeproject.com in your address bar and press the Esc button after a few moments. A TCP reset packet is a 40 byte datagram (a 20 byte IP header plus a 20 byte TCP header, with no data) with the TCP flags set to 0x04 (RST). On the other hand, we have a graceful connection termination, which is a 4-way sequence.
You web server
1. ESTABLISHED ESTABLISHED
2. (Close)
FIN-WAIT-1 --> <SEQ=ISN1><ACK=ISN2><CTL=FIN,ACK> --> CLOSE-WAIT
3. FIN-WAIT-2 <-- <SEQ=ISN2><ACK=ISN1+1><CTL=ACK> <-- CLOSE-WAIT
4. (Close)
TIME-WAIT <-- <SEQ=ISN2><ACK=ISN1+1><CTL=FIN,ACK> <-- LAST-ACK
5. TIME-WAIT --> <SEQ=ISN1+1><ACK=ISN2+1><CTL=ACK> --> CLOSED
6. (2 MSL)
CLOSED
Note that TCP connection terminations do not have to use four segments. In some cases, segments 2 and 3 are combined. The result is a FIN-ACK/FIN-ACK/ACK sequence.
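When you look at a capture, the two ways of ending a session can be told apart from the flags byte alone. The sketch below is only an illustration with hypothetical names (CloseKind, ClassifyClose, FlagsToString); the bit values are the standard TCP control bits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standard TCP control-bit values.
const std::uint8_t kFin = 0x01, kSyn = 0x02, kRst = 0x04,
                   kPsh = 0x08, kAck = 0x10, kUrg = 0x20;

enum CloseKind { kStillOpen, kGracefulClose, kAbortiveReset };

// A segment with RST aborts the session, one with FIN starts (or
// continues) the graceful termination, anything else closes nothing.
CloseKind ClassifyClose(std::uint8_t flags)
{
    if (flags & kRst) return kAbortiveReset;
    if (flags & kFin) return kGracefulClose;
    return kStillOpen;
}

// Renders the flags in bit order, e.g. 0x11 becomes "FIN,ACK".
std::string FlagsToString(std::uint8_t flags)
{
    static const char* const names[6] =
        {"FIN", "SYN", "RST", "PSH", "ACK", "URG"};
    std::string out;
    for (int bit = 0; bit < 6; ++bit) {
        if (flags & (1u << bit)) {
            if (!out.empty()) out += ",";
            out += names[bit];
        }
    }
    return out;
}
```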
To summarize, the whole life-time consists of the following steps:
- The browser tries to resolve the address to its corresponding IP address. To do this, it makes a single function call to gethostbyname(). This function generates a DNS query. A DNS query, as described in section 4.7, is a UDP datagram destined to the DNS server that is set in the TCP/IP properties. The TCP/IP handler takes a look at the destination IP address, realizes that the client IP address and the server IP address are in different subnets, and decides to forward the packet to the next hop, which is our gateway. To do this, the driver looks for the gateway's MAC address in the local ARP table. If, for any reason, the gateway's MAC address is not among the resolved addresses, TCP/IP broadcasts an ARP request in the subnet to get the corresponding MAC address. Upon receiving the gateway's MAC address, the client builds the final packet and puts it on the wire. Then, the packet is received by the gateway. The gateway looks at the destination IP address, realizes that this packet is not addressed to itself, and decides to forward it to the next hop. From now on, the packet traverses a global route until it successfully gets to the destination. The destination protocol handler then hands the packet over to the proper application layer service, which is the DNS in this case. Finally, the DNS server fetches the related IP address and makes a DNS response destined to the sender.
- Destination IP address in hand, our client protocol stack can now establish a TCP session with the peer.
- The application layer transaction begins.
- The server receives the client's request and responds with some data.
- Finishing data transfer. To do this, TCP provides you with two choices: a TCP reset and a graceful TCP termination.
6. Our solution for implementing a web filter
The solution is fairly simple. The following sequence is used to filter a matched web request:
- Capture packets.
- Distinguish TCP packets which are destined to port 80 or 8080.
- Look for a 'GET /' in the first five bytes of the data. If not found, continue with the next captured packet.
- Perform a Boyer-Moore pattern match over the data against the blacklist keywords. If not found, continue with the next captured packet.
- Send a TCP reset to the server and a block page to the client.
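Steps 2 to 4 of this sequence boil down to one decision per captured packet. The sketch below shows that decision in isolation. MatchBlacklist is a hypothetical name, and std::search stands in for the Boyer-Moore matcher to keep the example short. Capturing the packet and sending the reset and the block page are left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Returns the first blacklisted keyword found in an HTTP GET request,
// or an empty string when the packet should be left alone.
// dst_port is expected in host byte order.
std::string MatchBlacklist(std::uint16_t dst_port,
                           const char* payload, std::size_t len,
                           const std::vector<std::string>& blacklist)
{
    assert(payload != nullptr);
    // Only web traffic is of interest.
    if (dst_port != 80 && dst_port != 8080) return "";
    // The request must start with "GET /" (five bytes).
    if (len < 5 || std::memcmp(payload, "GET /", 5) != 0) return "";
    // Search the payload for each keyword (case-sensitive here).
    for (std::size_t i = 0; i < blacklist.size(); ++i) {
        const std::string& kw = blacklist[i];
        if (kw.empty()) continue;
        if (std::search(payload, payload + len, kw.begin(), kw.end())
                != payload + len)
            return kw;
    }
    return "";
}
```

A non-empty return value is the signal to carry out the last step of the sequence.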
7. Implementation
This section explains the core functionality of our HTTP filter in detail. All the functionality is available in the source code in the CProcessPacket class.
7.1 Capturing packets with a raw socket
The fastest way to capture incoming traffic without using an NDIS low level driver is to use a Winsock raw socket. Note, however, that a raw socket cannot capture in promiscuous mode. This means that the socket captures only traffic that is destined to its own address.
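// Create a raw IP socket. socket() reports failure with INVALID_SOCKET.
if ((m_sniffSocket = socket(AF_INET, SOCK_RAW, IPPROTO_IP)) == INVALID_SOCKET)
{
    return 0;
}
// Walk the adapter list until we find the adapter selected by the user.
PIP_ADAPTER_INFO pAdapterInfo = m_AdapterInfo;
while (pAdapterInfo != NULL &&
       strcmp(in_szSourceDevice, pAdapterInfo->AdapterName) != 0)
{
    pAdapterInfo = pAdapterInfo->Next;
}
if (pAdapterInfo == NULL)
{
    return 0;   // the requested adapter was not found
}
// Bind the socket to the first IP address of that adapter.
struct sockaddr_in src;
memset(&src, 0, sizeof(src));
src.sin_addr.S_un.S_addr =
    inet_addr(pAdapterInfo->IpAddressList.IpAddress.String);
src.sin_family = AF_INET;
src.sin_port = 0;
if (bind(m_sniffSocket, (struct sockaddr *)&src, sizeof(src)) == SOCKET_ERROR)
{
    return 0;
}
// SIO_RCVALL asks Winsock to hand every received IP packet to this socket.
DWORD optval = 1;           // 1 = RCVALL_ON
DWORD bytesReturned = 0;
if (WSAIoctl(m_sniffSocket, SIO_RCVALL, &optval, sizeof(optval),
             NULL, 0, &bytesReturned, NULL, NULL) == SOCKET_ERROR)
{
    return 0;
}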
And, then, in a separate thread, we process the captured packets:
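int res = 0;
char *pkt_data = (char *)malloc(65536);    // large enough for any IP datagram
char m_pLogString[256];
Packet p;
if (pkt_data == NULL)
{
    return 0;
}
SetEvent(_this->m_hThrdReadyEvent);         // signal that the thread is ready
do
{
    // Block until the next IP packet arrives on the raw socket.
    res = recvfrom(_this->m_sniffSocket, pkt_data, 65536, 0, 0, 0);
    if (res > 0)
    {
        ZeroMemory(&p, sizeof(Packet));
        DecodeIP((u_int8_t*)pkt_data, res, &p);
        if (p.banned == 1)
        {
            FilterHttpRequest(&p);
            // inet_ntoa() reuses one static buffer, so copy each result
            // before calling it again.
            char ip_string_src[17];
            char ip_string_dst[17];
            strncpy(ip_string_src, inet_ntoa(p.iph->ip_src), sizeof(ip_string_src) - 1);
            ip_string_src[sizeof(ip_string_src) - 1] = '\0';
            strncpy(ip_string_dst, inet_ntoa(p.iph->ip_dst), sizeof(ip_string_dst) - 1);
            ip_string_dst[sizeof(ip_string_dst) - 1] = '\0';
            // Bounded formatting, so a long keyword cannot overflow the buffer.
            _snprintf(m_pLogString, sizeof(m_pLogString) - 1,
                "Keyword \'%s\' is detected in a request"
                " from %s to %s. Session Dropped!",
                p.matched, ip_string_src, ip_string_dst);
            m_pLogString[sizeof(m_pLogString) - 1] = '\0';
            m_pFilterLog->AddLog(m_pLogString);
        }
    }
}
while (res > 0);
free(pkt_data);                             // release the capture buffer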
7.2 Distinguishing HTTP packets
In this section, we process the portions of the captured packet to find the IP and TCP header fields. (A raw socket delivers packets starting at the IP header, so there is no Ethernet header to skip.) To perform this, we need a couple of handy structs to lay over the data.
struct IPHdr
{
u_int8_t ip_verhl;
u_int8_t ip_tos;
u_int16_t ip_len;
u_int16_t ip_id;
u_int16_t ip_off;
u_int8_t ip_ttl;
u_int8_t ip_proto;
u_int16_t ip_csum;
struct in_addr ip_src;
struct in_addr ip_dst;
};
struct TCPHdr
{
u_int16_t th_sport;
u_int16_t th_dport;
u_int32_t th_seq;
u_int32_t th_ack;
u_int8_t th_offx2;
u_int8_t th_flags;
#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_RST 0x04
#define TH_PSH 0x08
#define TH_ACK 0x10
#define TH_URG 0x20
u_int16_t th_win;
u_int16_t th_sum;
u_int16_t th_urp;
};
struct Packet
{
u_int8_t *pkt;
IPHdr *iph;
u_int32_t ip_options_len;
u_int8_t *ip_options_data;
TCPHdr *tcph;
u_int32_t tcp_options_len;
u_int8_t *tcp_options_data;
u_int8_t *data;
u_int16_t dsize;
u_int8_t http_state;
u_int8_t banned;
unsigned char matched[128];
#define CLIENT_REQUEST 0x01
#define SERVER_RESPONSE 0x02
#define NOT_HTTP 0x04
u_int8_t frag_flag;
u_int16_t frag_offset;
u_int8_t mf;
u_int8_t df;
u_int8_t rf;
};
Now, we can retrieve all the header information stored in the captured packet. To perform this step, you should know the related protocol specifications which we explained in section 4.
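// Inside DecodeIP(), p points to the Packet supplied by the caller.
p->iph = (IPHdr *) (pkt_data);
if (p->iph->ip_proto == 6)                  // 6 = TCP
{
    // IP_HEADER_LEN is a fixed size, so this assumes that the IP
    // header carries no options.
    p->tcph = (TCPHdr *) (pkt_data + IP_HEADER_LEN);
    // A segment that carries a request typically has ACK and PSH set.
    if ((p->tcph->th_flags & TH_ACK) &&
        (p->tcph->th_flags & TH_PSH))
    {
        // Ports are stored in network byte order, hence htons().
        if (p->tcph->th_dport != htons(80) &&
            p->tcph->th_dport != htons(8080))
            return;
    }
}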
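The snippet above relies on the fixed IP_HEADER_LEN constant. If a packet carries IP or TCP options, the payload starts later. As a hedged sketch, independent of the project's structs, the real offsets can be read from the IHL and Data Offset fields explained in section 4. FindTcpPayload is a hypothetical name, and the buffer is assumed to start at the IP header, as delivered by the raw socket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Locates the TCP payload inside a raw IPv4 packet.
// On success, *offset is where the payload starts and *size its length.
bool FindTcpPayload(const std::uint8_t* pkt, std::size_t len,
                    std::size_t* offset, std::size_t* size)
{
    assert(pkt != nullptr && offset != nullptr && size != nullptr);
    if (len < 20) return false;
    if ((pkt[0] >> 4) != 4) return false;                 // Version
    const std::size_t ip_hlen = (pkt[0] & 0x0F) * 4;      // IHL in bytes
    if (ip_hlen < 20 || len < ip_hlen + 20) return false;
    if (pkt[9] != 6) return false;                        // Protocol = TCP
    const std::size_t total_len =
        (std::size_t(pkt[2]) << 8) | pkt[3];              // Total Length
    if (total_len > len || total_len < ip_hlen + 20) return false;
    const std::size_t tcp_hlen =
        (pkt[ip_hlen + 12] >> 4) * 4;                     // Data Offset
    if (tcp_hlen < 20 || ip_hlen + tcp_hlen > total_len) return false;
    *offset = ip_hlen + tcp_hlen;
    *size   = total_len - *offset;
    return true;
}
```

For a packet without options the payload starts at byte 40. With one 4-byte IP option (IHL = 6) it starts at byte 44.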
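7.3 Looking for 'GET /' in the first 5 bytes
// The raw socket delivers packets starting at the IP header, so the
// payload follows the IP and TCP headers directly.
p->data = (u_int8_t *)(pkt_data + IP_HEADER_LEN + TCP_HEADER_LEN);
if (p->data[0] == 'G' &&
    p->data[1] == 'E' &&
    p->data[2] == 'T' &&
    p->data[3] == ' ' &&
    p->data[4] == '/')
{
    // The payload starts an HTTP GET request: continue with the keyword search.
}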
7.4 Performing a Boyer-Moore pattern matching algorithm on the payload
As I explained earlier, Boyer-Moore is a fast pattern matching algorithm that skips over parts of the input instead of examining it byte by byte.
if (CheckPattern(p->data, len))
{
}
7.5 Finishing the data transaction by sending a bunch of packets to both directions
Perhaps this section is the trickiest part of the article. Here, I remind you again that our HTTP filter works in Sniffer mode. As a result, we should do our best to prevent the reliable TCP engines on both sides from reviving the session by resynchronizing.
- Let's say we grasp a target HTTP GET packet with the following TCP information:
SEQ = ISN1 ACK = ISN2
Direction = To Server
- Again, due to the fact that we are operating in a monitor mode, we should get rid of server resynchronization attempts. For this purpose, I send a TCP reset to the server to stop its TCP state machine for a while:
SEQ = ISN1 ACK = ISN2
FLAGS = RST
Direction = To Server
- Sending a block page to the client:
SEQ = ISN2 ACK = ISN1 + Len(GET)
FLAGS = FIN | ACK
Direction = To Client
Note that the implemented routine is available in CProcessPacket::SensorHttpRequest.
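The numbers of the two forged segments follow directly from the captured GET packet. The sketch below only computes the header values listed above. It does not build or send packets, and it is not the SensorHttpRequest routine itself. CapturedGet, ForgedSegment, MakeServerReset, and MakeClientBlockPage are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

const std::uint8_t kFlagFin = 0x01, kFlagRst = 0x04, kFlagAck = 0x10;

// TCP information taken from the captured GET segment.
struct CapturedGet {
    std::uint32_t seq;          // ISN1 in the listing above
    std::uint32_t ack;          // ISN2 in the listing above
    std::uint32_t payload_len;  // Len(GET)
};

// Header values of a segment the filter is about to inject.
struct ForgedSegment {
    std::uint32_t seq;
    std::uint32_t ack;
    std::uint8_t  flags;
    bool to_server;
};

// Reset sent towards the server, reusing the numbers of the GET.
ForgedSegment MakeServerReset(const CapturedGet& g)
{
    ForgedSegment s = { g.seq, g.ack, kFlagRst, true };
    return s;
}

// Block page sent towards the client, acknowledging the whole GET
// and closing the session with FIN.
ForgedSegment MakeClientBlockPage(const CapturedGet& g)
{
    ForgedSegment s = { g.ack, g.seq + g.payload_len,
                        static_cast<std::uint8_t>(kFlagFin | kFlagAck),
                        false };
    return s;
}
```

For a GET captured with SEQ = 1001, ACK = 5001 and 300 bytes of data, the block page goes to the client with SEQ = 5001 and ACK = 1301.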
8. Special consideration
While it's a choice for the SOHO environment, our HTTP filter is a less common type of filtering solution for enterprise networks. This comes from the fact that the sniffing mode doesn't guarantee a synchronized reaction against TCP sessions because it doesn't stand against packet flows. Compare it with a football stadium before and after a game. Before the game, guards can inspect fans one by one and check their tickets. After the game, fans just rush outdoors, and it is fairly easy for an insurgent to get out of sight. At least, in my city, it's this way! ;)
It's a matter of one-by-one inspection! Along with its reliability, a one-by-one inspection has a big flaw: "a slow inspector could slow down the whole movement".
9. Future work
- Make the concept more reliable by performing asynchronous analysis in background threads
- Replace the core capturing functionalities with a network driver hook available in both Windows and Unix platforms
- Replace Boyer-Moore with a multi-pattern search so that dozens of patterns are evaluated in a single search
- I look forward to your ideas and requests
10. Revision history
- Initial release - 2007-08-06
- Code comments improved - 2007-08-07
- Total review; Excluding Winpcap and Libnet from the project; Performing both capture and send raw packets by using the raw socket functionality available in WinSock - 2007-08-26
11. My test environments
- Ethernet LAN
- WLAN 802.11b/g standard sub-network
Anyway, if you experience any malfunction, please report it. Thanks in advance!
12. References
- TCP/IP Illustrated, Volume 1 The Protocols W. Richard Stevens
- RFC 793, RFC 791, RFC 768, RFC 826, RFC 1034, RFC 1035. Refer to IETF
- Network programming and network security programming forum
- Ethereal
- Winpcap
- Windows Server 2003 TCP/IP Protocols and Services Technical Reference by Joseph Davies and Thomas Lee.
- HTTP pocket reference by Clinton Wong, O'Reilly May 2000.