How to build a simple website analysis service (like Google Analytics)

16 Jan 2014 · CPOL · 5 min read
What is behind website analysis services (like Google Analytics), and how to build one.
Introduction   

Hello everybody!   

This is my second article on CodeProject, and it is the natural sequel to the first, which was dedicated to the development of a visitor counter with an auto-refresh feature. This time I want to explain how an analysis service like Google Analytics works to provide real-time information on what happens on a website. I will also explain how to build an (extremely simplified) one.

As I said before, my English is poor, but I hope you will help me improve the quality of the text. As usual, I use .NET and VB to speed up development, but all the methods described in this article can be implemented in any programming language.

If you use Google Analytics you can see some spectacular features: the number of active visitors, the pages currently being viewed, the behavior of users and much more, all in real time.

But someone may ask: how does this work? How is it made? Let us try to answer these questions.

First, we can observe that many analysis services (Google's is not the only one) require you to add JavaScript code to all your pages. Essentially, in order to work properly, every page must follow this HTML template:

HTML
 <html>
    <head>
      <script type="text/javascript" src="SERVICE_URL/file.js"></script>
    </head>
    <body>
      ... page contents ...
    </body>
</html>
 
So, the first question could be: what does file.js do?   

When you add the <script> tag to your page header, the browser downloads third-party code and executes it. What this code does is described below.

Note that to download a file, your browser must send an HTTP request to the server. On the server side, it is possible to learn a lot about the source of the request: for example, the server can read your IP address, your remote host name and your browser's name. This alone provides enough information to build a small tracking system.

A simplified scheme of an analysis server is shown in the following figure.
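
To make this concrete, here is a minimal sketch of what a server can extract from a single request. It is not part of the prototype in this article (which uses ASP.NET): it assumes a request object shaped like the one Node.js passes to an HTTP handler, with lower-cased header names in `headers` and the peer address in `socket.remoteAddress`. The helper name `describeRequest` and all the sample values are made up for illustration.

```javascript
// Hypothetical helper: summarize what the server learns from one request.
// 'req' is assumed to look like a Node.js http.IncomingMessage.
function describeRequest(req) {
    var headers = req.headers || {};
    return {
        ip: req.socket ? req.socket.remoteAddress : undefined,
        userAgent: headers['user-agent'] || '',      // browser name and version
        referer: headers['referer'] || '',           // the page that embedded file.js
        language: headers['accept-language'] || ''
    };
}

// Example with a hand-made request object (all values are invented).
var info = describeRequest({
    headers: {
        'user-agent': 'ExampleBrowser/1.0',
        'referer': 'http://example.com/index.html'
    },
    socket: { remoteAddress: '203.0.113.7' }
});
console.log(info.ip, info.userAgent, info.referer);
```

Even without any script running in the page, these few fields are enough to tell which page was loaded, from which address and with which browser.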

Image 1 

As you can see, for each page loaded in your browser, a copy of file.js is downloaded from the analysis server. This means that you allow the server to do some things for you (nothing hazardous: JavaScript runs in a sandbox).

So, the second question could be: what kind of information does the server read from my browser, and how can it read it? In other words, how does my browser send information to the server?

The following image answers these questions.

Image 2

First, in order to identify the user uniquely, a UUID is generated. This UUID is stored in a cookie and is used as the user identifier (the IP address is not suitable, because it can change between two connections from the same user, for example when the user is behind NAT). Then the current location (the URL), the browser name, the OS name and version, and other information can be read by the code in file.js and sent to the server.

Each 'track' packet has the form [UUID],{User_DATA}, and the server only needs to manage a set of [UUID, User_DATA] pairs in order to supply the analysis features: the set is a hashtable with the UUID as key and an ArrayList as value.

So, our next goal is to build a prototype that supplies the following features (which are also available in Google Analytics):
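
The "reuse the identifier if the cookie exists, otherwise create one" step can be sketched as follows. This is only an illustration under stated assumptions: the helpers `readCookie`, `makeVisitorId` and `getOrCreateVisitorId` are hypothetical names, the cookie string is passed in as a parameter (in a browser it would be `document.cookie`), and the generated value is a simple timestamp-plus-random string, not a standards-compliant UUID.

```javascript
// Hypothetical helper: find a cookie value in a "name=value; other=value" string.
function readCookie(cookieString, name) {
    var parts = cookieString ? cookieString.split(';') : [];
    for (var i = 0; i < parts.length; i++) {
        var pair = parts[i].trim();
        var eq = pair.indexOf('=');
        // Compare the whole name, so "xsite_uuid" never matches "site_uuid".
        if (eq !== -1 && pair.substring(0, eq) === name) {
            return decodeURIComponent(pair.substring(eq + 1));
        }
    }
    return '';
}

// Hypothetical generator: timestamp plus random digits, both in base 36.
function makeVisitorId(now, random) {
    return now.getTime().toString(36) + '_' + Math.floor(random() * 1e9).toString(36);
}

// Return the stored identifier, or a fresh one that the caller must save in a cookie.
function getOrCreateVisitorId(cookieString, now, random) {
    var existing = readCookie(cookieString, 'site_uuid');
    if (existing !== '') {
        return { id: existing, isNew: false };
    }
    return { id: makeVisitorId(now, random), isNew: true };
}

// A returning visitor keeps the identifier; a new visitor gets a fresh one.
var returning = getOrCreateVisitorId('lang=en; site_uuid=abc123', new Date(), Math.random);
var firstVisit = getOrCreateVisitorId('lang=en', new Date(), Math.random);
console.log(returning.id, returning.isNew, firstVisit.isNew);
```

Passing the clock and the random source as parameters is just a convenience that makes the function easy to test; in a page you would call it with `new Date()` and `Math.random`.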
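
The data structure described above can be sketched in a few lines. This is an illustration in JavaScript rather than the VB used by the prototype: a `Map` plays the role of the hashtable and a plain array plays the role of the ArrayList. The function name `recordPing` and the field names `lastSeen`, `location` and `history` are assumptions made for this example.

```javascript
// users: Map from UUID to { lastSeen, location, history }
function recordPing(users, uuid, location, now) {
    var entry = users.get(uuid);
    if (!entry) {
        // First packet from this visitor: create the entry with an empty history.
        entry = { lastSeen: null, location: '', history: [] };
        users.set(uuid, entry);
    }
    entry.lastSeen = now;                                 // time of the last action
    entry.location = location;                            // page currently viewed
    entry.history.push({ at: now, location: location });  // page history
    return entry;
}

var users = new Map();
recordPing(users, 'u1', '/index.html', new Date(2014, 0, 16, 10, 0, 0));
recordPing(users, 'u1', '/cart.html', new Date(2014, 0, 16, 10, 1, 0));
recordPing(users, 'u2', '/index.html', new Date(2014, 0, 16, 10, 1, 30));
console.log(users.size, users.get('u1').history.length); // prints "2 2"
```

Two visitors produce two entries, and the history of each visitor grows by one item per page view.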

  1. Number of active visitors 
  2. Number of connected visitors  
  3. Current page viewed by connected visitors  
  4. Page History for each connected visitor   

The following figure shows an example of the 'console', accessible on the server side, that shows the current website status. (I use my homemade analysis service daily on some of my e-commerce websites. Obviously, the image has been altered to hide the users' IP addresses.)

Image 3 

Background              

For the purpose of this article you need to know what we mean by UUID, Hashtable (named Dictionary in .NET) and ArrayList. We only need a copy of Microsoft Web Developer Express (which can be downloaded for free from the Microsoft website).

Since we use Microsoft IIS, we exploit the Application object, which lives within the application pool until the pool is recycled. This simple implementation does not save data in a DBMS, so all data is lost on every IIS reset. If you want, however, you can use the Application_End and Application_Start events to save and restore data between the DBMS and application memory.
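
The save-and-restore idea is not implemented in the prototype, but its shape can be sketched independently of IIS. The example below is an assumption-laden illustration in JavaScript: `snapshot` turns the in-memory table into a JSON string (what you would write to storage when the application stops) and `restore` rebuilds the table from that string (what you would do when it starts again). Both function names and the entry fields are hypothetical.

```javascript
// Turn the in-memory table into text that can be stored anywhere.
function snapshot(users) {
    var rows = [];
    users.forEach(function (entry, uuid) {
        rows.push({
            uuid: uuid,
            lastSeen: entry.lastSeen.toISOString(), // dates are saved as ISO strings
            location: entry.location
        });
    });
    return JSON.stringify(rows);
}

// Rebuild the in-memory table from a previously saved snapshot.
function restore(text) {
    var users = new Map();
    JSON.parse(text).forEach(function (row) {
        users.set(row.uuid, {
            lastSeen: new Date(row.lastSeen),
            location: row.location
        });
    });
    return users;
}

var beforeReset = new Map();
beforeReset.set('u1', { lastSeen: new Date(Date.UTC(2014, 0, 16, 10, 0, 0)), location: '/index.html' });
var afterReset = restore(snapshot(beforeReset));
console.log(afterReset.get('u1').location);
```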

Using the code      

Now let us look at how to structure the project in order to track user activity on a website. In my previous article I talked about a simple hashtable used to manage visitors, in order to build a real-time counter. Now I want to extend that code to handle the other information we want.

Client side 

First, we need to implement the logic inside file.js, that is, creating the UUID, reading the information and sending it to the server. In particular, we need to:

  1. Read and set cookies (functions getCookie and setCookie)
  2. Create an asynchronous Ajax call (function getXMLReq)
  3. Generate a UUID (body of the file)
  4. Read the browser location and send it to the server (function __as_ping)

My simple implementation of file.js follows:

JavaScript
// Address of the tracking server. The server sets this address when the browser downloads this file.
var NETSELL_STAT = 'http://localhost:82';


function getCookie(c_name, remote) {
    // get normal cookies (the 'remote' parameter is not used in this prototype)
    if (document.cookie.length > 0) {
        // 'var' keeps c_start and c_end local instead of leaking them as globals
        var c_start = document.cookie.indexOf(c_name + "=");
        if (c_start != -1) {
            c_start = c_start + c_name.length + 1;
            var c_end = document.cookie.indexOf(";", c_start);
            if (c_end == -1) c_end = document.cookie.length;
            return decodeURIComponent(document.cookie.substring(c_start, c_end));
        }
    }
    return "";
}

function setCookie(c_name, value, expireseconds, remote) {
    // The lifetime is expressed in seconds; when it is omitted, a session cookie is created.
    // (Use exdate.setDate(exdate.getDate() + n) instead for a lifetime in days.)
    var expires = "";
    if (expireseconds != null) {
        var exdate = new Date();
        exdate.setSeconds(exdate.getSeconds() + expireseconds);
        expires = ";expires=" + exdate.toUTCString();
    }

    var cookiebody = c_name + "=" + encodeURIComponent(value) + expires;

    if (remote != null) {
        // remote cookie: send cookies to LogonServ (not implemented in this prototype)
    }
    else // normal cookie
        document.cookie = cookiebody;
}

function getXMLReq() {
    var xmlhttp;
    if (window.XMLHttpRequest) {// code for IE7+, Firefox, Chrome, Opera, Safari
        xmlhttp = new XMLHttpRequest();
    }
    else {// code for IE6, IE5
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
    return xmlhttp;
}

// Check for the UUID of this user (create one if it does not exist)
var uuid = getCookie("site_uuid");
if (uuid == "") {
    var d = new Date();
    var rnd = Math.floor((Math.random() * 100000) + 1);
    // getDate() is the day of the month and getFullYear() the four-digit year
    uuid = d.getDate() + '_' + d.getMonth() + '_' + d.getFullYear() + '_' + rnd + '_' + d.getSeconds() + '_' + d.getMilliseconds() + '_' + d.getMinutes() + '_' + d.getHours();
    setCookie("site_uuid", uuid);
}

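// Send the UUID and the current location to the server (the ping)
function __as_ping() {
    var ping = getXMLReq();
    // encodeURIComponent escapes every '&', '?' and '#' in the page URL,
    // so the whole URL arrives on the server as the single parameter L
    ping.open("GET", NETSELL_STAT + "/srv/serverside.aspx?TYPE=PING&UUID=" + encodeURIComponent(uuid) + '&L=' + encodeURIComponent(location.href), true);
    ping.send();
}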

__as_ping();

Once all the data has been read and sent, the client has nothing else to do.
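
One detail of the ping deserves a closer look: the L parameter carries a full URL inside the query string of another URL, so characters such as '&', '?' and '#' must be escaped or the server will see a truncated value. The following is a minimal sketch of building the ping address with the standard `encodeURIComponent` function; the helper name `buildPingUrl` and the sample values are made up for this example.

```javascript
// Hypothetical helper: build the address requested by the ping.
function buildPingUrl(base, uuid, href) {
    return base + '/srv/serverside.aspx?TYPE=PING' +
        '&UUID=' + encodeURIComponent(uuid) +
        '&L=' + encodeURIComponent(href);
}

var url = buildPingUrl(
    'http://localhost:82',
    '16_0_2014_42',
    'http://example.com/p.aspx?a=1&b=2#top'
);
console.log(url);
```

The page address ends up as `http%3A%2F%2Fexample.com%2Fp.aspx%3Fa%3D1%26b%3D2%23top`, so its own '&' and '#' can no longer be confused with the separators of the ping request.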

Server side 

The server, on the other hand, must manage all the information about the users. Earlier I talked about the Hashtable (the Dictionary); below you can see a simple implementation of it.

First, we need to initialize the memory space where we want to keep the data.

In the global.asax file we write:

VB
Sub Application_Start(ByVal sender As Object, ByVal e As EventArgs)
    ' When the application starts

    Application.Add("LastReset", Date.Now)
    ' Make sure that the 'memory' (the dictionary of users) is available
    SyncLock Application
        Dim ActiveUser As Dictionary(Of String, decorablePosition)
        ActiveUser = CType(Application("ActiveUser"), Dictionary(Of String, decorablePosition))
        If IsNothing(ActiveUser) Then
            ' Create the dictionary and store it only once
            ActiveUser = New Dictionary(Of String, decorablePosition)
            Application.Add("ActiveUser", ActiveUser)
        End If
    End SyncLock
End Sub

Next, we only need to store the 'track' packets coming from the client side. We can create an aspx page (the page to which file.js sends its data) named serverside.aspx with the following content:

<%@ Page Language="VB" AutoEventWireup="false" CodeFile="serverside.aspx.vb" Inherits="srv_serverside" %>
<%@ Import Namespace="System.Collections.Generic" %>
<%@ Import Namespace="DIBIASI.CALCE_Min.ABSTRACT.TDA.UTILS" %>
<%
    ' When a PING arrives, check whether the UUID is already known,
    ' then save the date and time of the last action (plus location and IP address).
    ' Requests without a UUID are ignored, because a Dictionary key cannot be Nothing.
    If Request("TYPE") = "PING" AndAlso Not String.IsNullOrEmpty(Request("UUID")) Then
        Dim UUID As String = Request("UUID")
        Dim users As Dictionary(Of String, decorablePosition) = CType(Application("ActiveUser"), Dictionary(Of String, decorablePosition))
        SyncLock users
            If Not users.ContainsKey(UUID) Then
                ' First ping from this visitor: create the entry and an empty page history
                users.Add(UUID, New decorablePosition)
                users(UUID).setValueOf("LOCATION_STORY", New ArrayList)
            End If
            users(UUID).setValueOf("DATE", Date.Now)
            users(UUID).setValueOf("LOCATION", Request("L"))
            CType(users(UUID).getValueOf("LOCATION_STORY"), ArrayList).Add(Date.Now & "|" & Request("L"))
            users(UUID).setValueOf("IPADDR", Request.UserHostAddress)
        End SyncLock
    End If
 %>

Finally, we only need to use the stored data, proceeding (for example) as follows.

First, we need to compute the total number of users: it is the number of entries in the dictionary, because there is one UUID for each user. Second, we want to compute the connected users: we can iterate over all the entries of the dictionary and count only those whose last action happened no more than 240 seconds ago.

The number of active users can be determined in the same way (last action no more than 60 seconds ago). Finally, we can get the page currently viewed by a user by reading the "LOCATION" field.

Below you can read an example of a page that uses the stored data.
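
The three counters can be sketched in isolation as follows. This is an illustration in JavaScript, not the code of the prototype: it assumes a `Map` from UUID to an entry with a `lastSeen` date, and the function name `summarize` is made up. The thresholds are the ones used in this article, 240 seconds for connected users and 60 seconds for active users.

```javascript
// Count total, connected (idle <= 240 s) and active (idle <= 60 s) visitors.
function summarize(users, now) {
    var total = 0, connected = 0, active = 0;
    users.forEach(function (entry) {
        total += 1;
        // Subtracting two dates gives milliseconds.
        var idleSeconds = Math.abs(now - entry.lastSeen) / 1000;
        if (idleSeconds <= 240) {
            connected += 1;
            if (idleSeconds <= 60) {
                active += 1;
            }
        }
    });
    return { total: total, connected: connected, active: active };
}

var now = new Date(2014, 0, 16, 12, 0, 0);
var sample = new Map([
    ['u1', { lastSeen: new Date(now.getTime() - 30 * 1000) }],   // active
    ['u2', { lastSeen: new Date(now.getTime() - 120 * 1000) }],  // connected only
    ['u3', { lastSeen: new Date(now.getTime() - 600 * 1000) }]   // gone
]);
var stats = summarize(sample, now);
console.log(stats.total, stats.connected, stats.active); // prints "3 2 1"
```

Note that every active user is also counted as connected, which is why the second check is nested inside the first.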

 

ASPX
<%@ Page Language="vb" AutoEventWireup="false" CodeBehind="stats.aspx.vb" Inherits="Analysis.stats" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    
    
<div style="font-family:Tahoma;background-color:#f6f6f6;float:left;border:1px solid #e6e6e6;width:20%;height:180px;text-align:center;vertical-align:middle;">
<%
    Dim ConnectedUser As Integer = 0
    Dim actu As Integer = 0
    Dim visitorFromLastReset As Integer = 0
    Dim visitorToday As Integer = 0
    Dim ActiveKart As Integer = 0
    dim euroinKart as double=0

    For Each it As KeyValuePair(Of String, decorablePosition) In CType(Application("ActiveUser"), Dictionary(Of String, decorablePosition))
        '# count visit from last reset
        visitorFromLastReset += 1
        '# count visit today
        If Format(CDate(it.Value.getValueOf("DATE")), "yyyyMMdd") = Format(Date.Now, "yyyyMMdd") Then
            visitorToday += 1
        End If
        '# count connected users
        If Math.Abs(DateDiff(DateInterval.Second, CDate(it.Value.getValueOf("DATE")), Date.Now)) <= 240 Then
            ConnectedUser += 1
            
            '# count active users
            If Math.Abs(DateDiff(DateInterval.Second, CDate(it.Value.getValueOf("DATE")), Date.Now)) <= 60 Then
                actu += 1
            End If
        End If
    Next it
 %>        
    <table width="100%">
    <tr>
        <td><%=Format(Application("LastReset"),"dd/MM/yy HH.mm") %></td>
        <td>Today Visitors</td>
    </tr>
    <tr>
        <td><span style="font-size:1.3em;"><%=visitorFromLastReset%></span></td>
        <td><span style="font-size:1.3em;color:Blue;"><%=visitorToday%></span></td>
    </tr>
    </table>

    <table width="100%">
    <tr>
        <td>Connected Now</td>
        <td></td>
    </tr>
    <tr>
    <td><span style="font-size:1.3em;"><%=ConnectedUser%></span></td>
    <td></td>
    </tr>
    </table>
    Active Now
    <br />
    <span style="font-size:2em;color:blue;"><%=actu%></span>
    
</div>   
 

 <!-- show active page for each user -->
<br />
<div style="font-family:tahoma;font-size:0.8em;display:block;float:left;border:1px solid #e6e6e6;width:99%;height:200px;overflow:auto;text-align:center;vertical-align:middle;">
<table border="0" cellspacing="0" cellpadding="0">
<%      
    Dim foreColor As String = "#000"
    Dim LOCATION As String = ""
    Dim RASCL As String = ""
    For Each it As KeyValuePair(Of String, decorablePosition) In CType(Application("ActiveUser"), Dictionary(Of String, decorablePosition))
        If Math.Abs(DateDiff(DateInterval.Second, CDate(it.Value.getValueOf("DATE")), Date.Now)) <= 240 Then
            foreColor="#000"
            If Math.Abs(DateDiff(DateInterval.Second, CDate(it.Value.getValueOf("DATE")), Date.Now)) <= 60 Then
                foreColor = "#33CC33"
            End If
            
            LOCATION = it.Value.getValueOf("LOCATION").ToString.Split("/")(it.Value.getValueOf("LOCATION").ToString.Split("/").Length - 1)
            RASCL = " <strong>" & mid(it.Value.getValueOf("RASCL"),1,20) & "</strong>"
%>
        <tr style="color:<%=foreColor%>">            
            <td style="width:35%;padding:1px;" align="left"><span><a href="followUserAction.aspx?IPADDR=<%=it.Value.getValueOf("IPADDR") %>" target="_blank"><%=it.Value.getValueOf("IPADDR") %> <%=RASCL %></a></span></td>
            <td align="left"><span><%=LOCATION%></span></td>
        </tr>
<%           
        End If
    Next it
 %>    
 </table>
</div> 
</body>
</html>

In the zip attached to this article you can find a complete prototype that runs in Web Developer Express. (Remember to start debugging on port 82, or change the path in file.js.)

History   

14/01/2014: draft release 

16/01/2014: first release

16/01/2014: English revision made by Bruno Interlandi

17/01/2014: zip file re-uploaded




License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)