Introduction
A few days ago, I was given an assignment: find an unconventional way to identify a user's machine without using cookies or the IP address. On the first visit to the web application, the user must receive the message "No, you have not connected before and are a new user"; on any later visit, the user must receive "Yes, you have connected before". I studied the HTTP/1.0 protocol for clues and came up with the idea of identifying the machine from cache information, as explained below.
Background
Communication between a web browser (the HTTP client) and a web server (the HTTP server) takes place as requests and responses. The browser sends a request to the server, and the server returns a response. Each request or response is composed of two parts, headers and data. Headers carry extra information and settings about the request or response; the data part contains the actual content.
Most browsers cache the data they receive from a web server. When the user revisits content, the browser asks the server whether that content has changed since it was last fetched by sending a time stamp along with the request; this is called a conditional GET. The server returns the content only if it has been modified since that time stamp; otherwise it answers with a 304 Not Modified status and no body. Browsing is faster this way, because unmodified cached data is not downloaded again. When delivering content, the server may tell the browser the content's last-modified date; if it does, popular browsers such as MSIE and Mozilla Firefox send the same time stamp back on the next visit to ask whether the content has been modified since then.
The last-modified date is sent from the server to the browser in the Last-Modified header. On the second visit, the browser requests the content and specifies that time stamp in the If-Modified-Since header.
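The server's side of a conditional GET can be sketched as a single decision. The helper below is hypothetical: the name ChooseStatus and its parameters are not part of the article's program, and it assumes the server knows when the resource last changed. It only chooses between a full 200 response and a 304 Not Modified.

```csharp
using System;
using System.Globalization;

static class ConditionalGetSketch
{
    // Hypothetical helper: returns the status line a server could send for a
    // resource that last changed at lastModifiedUtc (assumed to be in UTC).
    public static string ChooseStatus(string ifModifiedSince, DateTime lastModifiedUtc)
    {
        DateTime since;
        // "r" is the RFC 1123 pattern, e.g. "Sun, 02 Sep 2007 10:00:00 GMT".
        bool parsed = DateTime.TryParseExact(
            ifModifiedSince ?? "",
            "r",
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out since);

        // HTTP dates have one-second resolution, so drop the sub-second part
        // before comparing.
        DateTime truncated = new DateTime(
            lastModifiedUtc.Ticks - (lastModifiedUtc.Ticks % TimeSpan.TicksPerSecond),
            DateTimeKind.Utc);

        if (parsed && truncated <= since)
        {
            // The browser's cached copy is still current: no body is needed.
            return "HTTP/1.0 304 Not Modified";
        }
        // No usable time stamp, or the content is newer: send it in full.
        return "HTTP/1.0 200 OK";
    }
}
```

A request without If-Modified-Since, or with a value that cannot be parsed, falls through to the full response. The program in this article never answers 304; it always sends the page, because it uses the time stamp as an identifier rather than as a cache validator.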
Using the Code
This is a small program consisting of static functions. The code is explained briefly below. The following system namespaces are used:
using System;
using System.Collections.Generic;
using System.Net.Sockets;
using System.Net;
using System.IO;
The System.Collections.Generic namespace is used for Dictionary objects. System.Net and System.Net.Sockets are used to implement the TCP/IP networking features. System.IO is used for the TextReader and TextWriter features.
The main procedure is described below:
public static Dictionary<string, string> headers = null;

static void Main(string[] args)
{
    TcpListener listener = null;
    try
    {
        // Listen on the first address registered for this host name.
        IPAddress address = Dns.GetHostEntry(Dns.GetHostName()).AddressList[0];
        Console.WriteLine("Info: server start at IP: " + address + " Port: 80");
        listener = new TcpListener(address, 80);
        listener.Start();
        while (true)
        {
            try
            {
                // Block until a browser connects, then serve it.
                Socket conn = listener.AcceptSocket();
                Console.WriteLine("*************************************");
                Console.WriteLine("Info: Connection established, Connected to IP: " +
                    ((IPEndPoint)conn.RemoteEndPoint).Address +
                    " Port: " + ((IPEndPoint)conn.RemoteEndPoint).Port);
                Console.WriteLine("*************************************");
                NetworkStream stream = new NetworkStream(conn);
                TextReader reader = new StreamReader(stream);
                TextWriter writer = new StreamWriter(stream);
                if (ParseRequest(reader))
                {
                    if ("GET" == method)
                    {
                        // Read headers until the empty line that ends them.
                        headers = new Dictionary<string, string>();
                        while (ReadNParseHeader(reader)) ;
                        if ("/" == resource)
                        {
                            SendHTMLIdentifyUser(writer);
                        }
                        else
                        {
                            Console.WriteLine("Warning: Invalid resource: \""
                                + resource + "\" requested");
                        }
                    }
                    else
                    {
                        Console.WriteLine(
                            "Warning: Only GET method supported, Closing connection");
                    }
                }
                Console.WriteLine("Info: Closing Connection Successfully");
                Console.WriteLine("-------------------------------------");
                // Closing the writer also flushes the buffered response.
                writer.Close();
                reader.Close();
                stream.Close();
                conn.Close();
            }
            catch (Exception exception)
            {
                Console.WriteLine("Warning: " + exception.Message);
            }
        }
    }
    catch (Exception exception)
    {
        Console.WriteLine("ERROR: " + exception.Message);
    }
    finally
    {
        if (null != listener)
        {
            Console.WriteLine("Info: Stopping listener");
            listener.Stop();
            listener = null;
        }
    }
    Console.WriteLine("Program Ended, Press ENTER to exit");
    Console.ReadLine();
}
The headers object is a Dictionary with string keys and string values; it is used later in the program.
First, the IP address is obtained and printed on the console together with the port, so the user knows which address the server is listening on (this avoids confusion when the machine has more than one IP address assigned). A listener is then created on that address and port and started. For each incoming connection, the remote socket is accepted, and TextReader and TextWriter objects are created to communicate over the network as streams of text, since most HTTP communication is plain text.
The first element to arrive is the basic request line from the HTTP client (web browser). It contains the method, the resource identifier and location, and the HTTP version. The ParseRequest procedure parses this line and stores the method in method, the resource in resource and the protocol in httpProtocol, all of which are static string objects.
HTTP/1.0 defines three basic request methods: GET, HEAD and POST. This simple program supports only GET requests.
While serving the remote socket, the request headers are first read and parsed with ReadNParseHeader, which places each header's title and value in the headers dictionary. The "/" resource denotes the default content, which is the only resource supported here. The HTML is then sent to the client by the SendHTMLIdentifyUser procedure.
The ParseRequest procedure is described below:
public static string method,
                     resourceLoc,
                     resource,
                     queryString,
                     httpProtocol;

private static bool ParseRequest(TextReader reader)
{
    string request = ReadUntilCRLF(reader);
    Console.WriteLine("Info: Request received: \"" + request + "\"");
    // A request line has the form: METHOD SP request-URI SP HTTP-version
    string[] tokens = request.Split(new string[] { " " },
        StringSplitOptions.RemoveEmptyEntries);
    if (3 != tokens.Length)
    {
        Console.WriteLine("Warning: Request must split in 3 tokens");
        return false;
    }
    method = tokens[0].ToUpper();
    // Everything from the first '?' onwards is the query string.
    queryString = "";
    int indexEnd = tokens[1].IndexOf('?');
    if (indexEnd < 0)
    {
        indexEnd = tokens[1].Length;
    }
    else
    {
        queryString = tokens[1].Substring(indexEnd, tokens[1].Length - indexEnd);
    }
    // Look for separators only in the part before the query string, so a '/'
    // inside the query cannot be mistaken for a path separator.
    string path = tokens[1].Substring(0, indexEnd);
    int indexLastSeperator = path.LastIndexOf('/');
    if (indexLastSeperator < 0)
    {
        Console.WriteLine("Warning: Request URI contains no '/'");
        return false;
    }
    resource = path.Substring(indexLastSeperator);
    if (0 == path.ToLower().IndexOf("http://"))
    {
        // Absolute URI: the location starts at the first '/' after the host.
        int indexSeperator = path.IndexOf('/', 7);
        resourceLoc = indexSeperator < 0
            ? ""
            : path.Substring(indexSeperator, indexLastSeperator - indexSeperator);
    }
    else
    {
        // Relative URI: the location is everything before the last '/'.
        resourceLoc = path.Substring(0, indexLastSeperator);
    }
    httpProtocol = tokens[2].ToUpper();
    Console.WriteLine("Info: Method: " + method);
    Console.WriteLine("Info: Resource Location: " + resourceLoc);
    Console.WriteLine("Info: Resource: " + resource);
    Console.WriteLine("Info: Query String: " + queryString);
    Console.WriteLine("Info: Protocol: " + httpProtocol);
    return true;
}
As mentioned earlier, the first thing the web browser sends to the web server is a basic request line consisting of the request method, the resource identifier and the HTTP version. The resource identifier may also carry a resource location and a query string, and the location can be relative or absolute. The ParseRequest function simply separates this information and stores it in the respective static string objects, i.e. method, resourceLoc, resource, queryString and httpProtocol.
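To make the split concrete, here is a minimal sketch of the same idea as a function with no shared state. The class and method names are hypothetical and not part of the article's program. It handles only the relative form of the request URI and, like the article's code, keeps the leading '?' in the query string.

```csharp
using System;

static class RequestLineSketch
{
    // Hypothetical helper: splits a line such as "GET /dir/page?x=1 HTTP/1.0"
    // into method, path, query string and protocol version.
    public static bool TryParse(string line, out string method,
                                out string path, out string query,
                                out string version)
    {
        method = path = query = version = "";
        string[] tokens = (line ?? "").Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length != 3)
        {
            return false;   // not METHOD SP URI SP VERSION
        }
        method = tokens[0].ToUpperInvariant();
        version = tokens[2].ToUpperInvariant();

        // Everything from the first '?' onwards is the query string.
        string target = tokens[1];
        int q = target.IndexOf('?');
        path = q < 0 ? target : target.Substring(0, q);
        query = q < 0 ? "" : target.Substring(q);
        return true;
    }
}
```

Returning the parts through out parameters instead of static fields makes the parser easy to exercise without a network connection.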
The ReadNParseHeader procedure is described below:
private static bool ReadNParseHeader(TextReader reader)
{
    string header = ReadUntilCRLF(reader);
    if (header.Length > 0)
    {
        Console.WriteLine("Info: Header received: \"" + header + "\"");
        // Split at the first ": " only, so a value that itself contains
        // ": " is kept in one piece.
        string[] tokens = header.Split(new string[] { ": " }, 2,
            StringSplitOptions.None);
        if (tokens.Length == 2)
        {
            // The indexer overwrites a repeated header; Add would throw.
            headers[tokens[0].ToUpper()] = tokens[1];
        }
        else
        {
            Console.WriteLine("Warning: Cannot Parse header");
        }
        return true;
    }
    else
    {
        // An empty line marks the end of the header section.
        return false;
    }
}
HTTP request and response headers follow a simple line format. Each header consists of the header title, a colon ':', a space, and the header's value, and is terminated by a carriage return and line feed, i.e. '\r\n'. The end of the header section is marked by an extra carriage return and line feed after the last header. ReadNParseHeader parses one header and stores its title and value in the dictionary. It returns true if more headers may follow and false once it reads the empty line that ends the header section.
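As an illustration of the header format, the sketch below parses one header line into a dictionary. It is a variation, not the article's code: the names are hypothetical, it splits at the first colon and trims the surrounding white space instead of requiring exactly ": ", and it uses a case-insensitive dictionary so names need not be upper-cased.

```csharp
using System;
using System.Collections.Generic;

static class HeaderSketch
{
    // Hypothetical helper: a header table whose keys ignore case, so
    // "If-Modified-Since" and "IF-MODIFIED-SINCE" find the same entry.
    public static Dictionary<string, string> NewHeaderTable()
    {
        return new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    }

    // Hypothetical helper: stores "Name: value" in the table and reports
    // whether the line looked like a header at all.
    public static bool TryAdd(Dictionary<string, string> table, string line)
    {
        int colon = (line ?? "").IndexOf(':');
        if (colon <= 0)
        {
            return false;   // no colon, or an empty header name
        }
        string name = line.Substring(0, colon).Trim();
        string value = line.Substring(colon + 1).Trim();
        table[name] = value;   // a repeated header overwrites the earlier one
        return true;
    }
}
```

Splitting at the first colon matters for date headers, whose values contain colons of their own.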
The ParseRequest and ReadNParseHeader procedures use the ReadUntilCRLF procedure described below:
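private static string ReadUntilCRLF(TextReader reader)
{
    string strLine = "";
    char prevChar = '\0';
    int next = reader.Read();
    // Read() returns -1 at the end of the stream; stop there instead of
    // looping forever when the client closes the connection early.
    while (next != -1 && !('\r' == prevChar && '\n' == (char)next))
    {
        strLine += (char)next;
        prevChar = (char)next;
        next = reader.Read();
    }
    // Drop the '\r' that was appended just before the terminating '\n'.
    if (strLine.Length > 0 && '\r' == strLine[strLine.Length - 1])
    {
        strLine = strLine.Substring(0, strLine.Length - 1);
    }
    return strLine;
}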
This function reads the text stream character by character until it finds a carriage return followed by a line feed. It returns the string read up to, but not including, the sentinel values ('\r\n').
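Because the line reader works on any TextReader, it can be tried without a socket by feeding it a canned request from a StringReader. The sketch below is a variant under stated assumptions: the names are hypothetical, it collects characters in a StringBuilder, and it returns null when the stream ends with nothing left to read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LineReaderSketch
{
    // Hypothetical helper: returns the next line without its CRLF, or null
    // once the stream is exhausted and nothing was read.
    public static string ReadLine(TextReader reader)
    {
        StringBuilder line = new StringBuilder();
        int next;
        while ((next = reader.Read()) != -1)
        {
            if (next == '\n' && line.Length > 0 && line[line.Length - 1] == '\r')
            {
                line.Length--;   // drop the '\r' collected just before
                return line.ToString();
            }
            line.Append((char)next);
        }
        return line.Length > 0 ? line.ToString() : null;
    }

    // Hypothetical helper: reads the request line and headers, stopping at
    // the empty line that ends the header section.
    public static List<string> ReadHead(TextReader reader)
    {
        List<string> lines = new List<string>();
        string line;
        while ((line = ReadLine(reader)) != null && line.Length > 0)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

Calling ReadHead on new StringReader("GET / HTTP/1.0\r\nHost: example\r\n\r\n") yields the request line and one header, and leaves anything after the empty line unread.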
The function that performs the actual trick of identifying the machine is SendHTMLIdentifyUser; it is described below:
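// Assumed declarations: the listing uses these two members without showing
// them. The types follow from how they are used; the start value is a guess.
private static int machineId = 0;
private static Dictionary<string, int> machineIdentification =
    new Dictionary<string, int>();

private static void SendHTMLIdentifyUser(TextWriter writer)
{
    string html = "<HTML><BODY>Hello! How are you? ";
    string keyIfModifiedSince = "IF-MODIFIED-SINCE";
    // Header names were upper-cased by ReadNParseHeader.
    bool userNew = !headers.ContainsKey(keyIfModifiedSince);
    int currentId = -1;
    // The "R" format always prints "GMT" but does not convert the time,
    // so start from UTC.
    string lastModified = DateTime.UtcNow.ToString("R");
    writer.Write("HTTP/1.0 200 OK\r\n");
    writer.Write("Content-Type: text/HTML\r\n");
    writer.Write("Last-Modified: " + lastModified + "\r\n");
    string strIden = "";
    if (userNew)
    {
        currentId = machineId++;
        strIden = "No, you have not connected before and are a new user";
    }
    else
    {
        string lastDate = headers[keyIfModifiedSince];
        // Some browsers append extra data after a ';'; keep only the date.
        int indexSep = lastDate.IndexOf(';');
        if (indexSep < 0)
        {
            indexSep = lastDate.Length;
        }
        lastDate = lastDate.Substring(0, indexSep);
        if (machineIdentification.TryGetValue(lastDate, out currentId))
        {
            // Known time stamp: reuse the ID; a fresh stamp is stored below.
            machineIdentification.Remove(lastDate);
            strIden = "Yes, you have connected before";
        }
        else
        {
            // Time stamp not issued by this server run: treat as new.
            currentId = machineId++;
            strIden = "No, you have not connected before and are a new user";
        }
    }
    html += strIden + "</BODY></HTML>";
    // Time stamps have one-second resolution. The indexer overwrites an entry
    // issued in the same second (Add would throw), so two visitors served
    // within one second cannot be told apart.
    machineIdentification[lastModified] = currentId;
    writer.Write("Content-Length: " + html.Length + "\r\n");
    writer.Write("\r\n");
    writer.Write(html);
    Console.WriteLine("Info: Machine Id: " + currentId);
}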
The trick in the above function works as follows. First, the IF-MODIFIED-SINCE header is searched for among the request headers. Its absence means the user is a first-time visitor, so a new profile (in our case, a new user ID) is created. Its presence means the browser holds a cached copy from an earlier visit, and the time stamp it carries is used to look up the user's profile (in our case, the user ID); if that time stamp is unknown to the server, the user is treated as new. In every case, the current time stamp is obtained, mapped to the user's profile, and sent to the user in the Last-Modified header along with the requested resource. Thus the user can be identified with the help of the IF-MODIFIED-SINCE request header and the Last-Modified response header.
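The bookkeeping behind this trick can be separated from the socket code. The sketch below is one possible arrangement, not the article's implementation: the class and member names are hypothetical, and the clock is passed in so the logic is repeatable. It also adds one assumption of its own: when two stamps would fall in the same second, the later one is moved forward by a second so each stamp maps to exactly one ID. Whether every browser echoes back a stamp that lies slightly in the future has not been verified here.

```csharp
using System;
using System.Collections.Generic;

class VisitorRegistrySketch
{
    private readonly Dictionary<string, int> idsByStamp =
        new Dictionary<string, int>();
    private int nextId = 0;
    private DateTime lastIssuedUtc = DateTime.MinValue;

    // Hypothetical API: returns the visitor's ID. isNew reports whether the
    // ID was freshly assigned; newStamp is the value for Last-Modified.
    public int Identify(string ifModifiedSince, DateTime nowUtc,
                        out bool isNew, out string newStamp)
    {
        int id;
        string key = StripParameters(ifModifiedSince);
        if (key != null && idsByStamp.TryGetValue(key, out id))
        {
            idsByStamp.Remove(key);   // the old stamp is replaced below
            isNew = false;
        }
        else
        {
            id = nextId++;            // no header, or an unknown stamp
            isNew = true;
        }

        // Assumption of this sketch: keep stamps unique by never issuing
        // the same second twice.
        DateTime stamp = new DateTime(
            nowUtc.Ticks - (nowUtc.Ticks % TimeSpan.TicksPerSecond),
            DateTimeKind.Utc);
        if (stamp <= lastIssuedUtc)
        {
            stamp = lastIssuedUtc.AddSeconds(1);
        }
        lastIssuedUtc = stamp;

        newStamp = stamp.ToString("R");
        idsByStamp[newStamp] = id;
        return id;
    }

    // Keeps only the date part of a value that has extra data after a ';'.
    private static string StripParameters(string value)
    {
        if (value == null)
        {
            return null;
        }
        int semi = value.IndexOf(';');
        return (semi < 0 ? value : value.Substring(0, semi)).Trim();
    }
}
```

A server loop would call Identify with the received If-Modified-Since value (or null when the header is absent) and DateTime.UtcNow, then write newStamp into the Last-Modified header of the response.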
Points of Interest
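It is worth noting that MSIE also sends the length of the content it last received, together with the time stamp, to the web server. This is why SendHTMLIdentifyUser keeps only the part of the header value before the first ';'.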
History
- 2nd September, 2007: Initial post