Introduction
I wanted to keep track of visitors and know the usual web analytics data: visitors, traffic sources, nationality, behaviour and so on.
Client-side analytics, however, are not very reliable:
- Ad blockers interfere with them
- Using a third-party service requires annoying users with those huge cookie consent banners
- They can significantly increase the loading time of the web application
- They don't record API calls or any other non-HTML requests, such as Web API endpoints
So I developed a very simple server-side analytics system for .NET Core, which is running on my website.
Live demo: https://matteofabbri.org/stat
GitHub repo: https://github.com/matteofabbri/ServerSideAnalytics
NuGet: https://www.nuget.org/packages/ServerSideAnalytics
The middleware
The idea is to implement a middleware that is invoked on every request, whether or not a route matches it.
This middleware is added to the request pipeline and configured using only fluent methods.
After a request has been processed, the middleware writes it into a generic store.
The middleware is inserted into the request pipeline by calling the UseServerSideAnalytics extension method at application startup.
This method requires an implementation of the IAnalyticStore interface, which is where the received requests will be stored.
public void Configure(IApplicationBuilder app)
{
    app.UseServerSideAnalytics(new MongoAnalyticStore("mongodb://192.168.0.11/matteo"));
}
Inside, the extension method creates a FluidAnalyticBuilder and binds it to the request pipeline via the Use method.
public static FluidAnalyticBuilder UseServerSideAnalytics(this IApplicationBuilder app, IAnalyticStore repository)
{
    var builder = new FluidAnalyticBuilder(repository);
    app.Use(builder.Run);
    return builder;
}
FluidAnalyticBuilder is a fluent class that handles the configuration of what we want to collect (such as filtering out unwanted URLs, IP addresses and so on) and implements the core of the system in its Run method.
In this method, ServerSideAnalytics uses two methods of the store:
- ResolveCountryCodeAsync: retrieves the country code of the remote IP address, if known; otherwise it is expected to return CountryCode.World
- StoreWebRequestAsync: stores the received request in the database
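internal async Task Run(HttpContext context, Func<Task> next)
{
    // Resolve the identity before the rest of the pipeline runs: UserIdentity() may append
    // a cookie, and response headers are read-only once the response has started.
    var identity = context.UserIdentity();

    // Let the rest of the pipeline process the request first
    await next.Invoke();

    // Skip requests matched by any exclusion rule
    if (_exclude?.Any(x => x(context)) ?? false)
    {
        return;
    }

    var req = new WebRequest
    {
        Timestamp = DateTime.Now,
        Identity = identity,
        RemoteIpAddress = context.Connection.RemoteIpAddress,
        Method = context.Request.Method,
        UserAgent = context.Request.Headers["User-Agent"],
        Path = context.Request.Path.Value,
        IsWebSocket = context.WebSockets.IsWebSocketRequest,
        CountryCode = await _store.ResolveCountryCodeAsync(context.Connection.RemoteIpAddress)
    };

    await _store.StoreWebRequestAsync(req);
}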
(Maybe I should add other fields to the collected requests? Let me know 😊)
Through the List<Func<HttpContext, bool>> _exclude field, the builder also provides easy methods to filter out requests we don't care about:
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    app.UseDeveloperExceptionPage();
    app.UseBrowserLink();
    app.UseDatabaseErrorPage();
    app.UseAuthentication();

    app.UseServerSideAnalytics(new MongoAnalyticStore("mongodb://localhost/matteo"))
        .ExcludePath("/js", "/lib", "/css")
        .ExcludeExtension(".jpg", ".ico", "robots.txt", "sitemap.xml")
        .Exclude(x => x.UserIdentity() == "matteo")
        .ExcludeIp(IPAddress.Parse("192.168.0.1"))
        .ExcludeLoopBack();

    app.UseStaticFiles();
}
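Conceptually, each of these helpers adds one more predicate to the _exclude list and returns the builder, which is what makes the calls chainable. The sketch below illustrates that idea in isolation and is not the library's code: to stay free of ASP.NET Core dependencies it assumes predicates over the request path only (a plain string instead of an HttpContext), and the RequestFilter class and its IsExcluded method are hypothetical names. Only ExcludePath, ExcludeExtension and Exclude mirror the calls above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the builder's exclusion logic:
// predicates receive only the request path instead of a full HttpContext.
public class RequestFilter
{
    private readonly List<Func<string, bool>> _exclude = new List<Func<string, bool>>();

    // Generic escape hatch: exclude whatever the predicate matches.
    public RequestFilter Exclude(Func<string, bool> predicate)
    {
        _exclude.Add(predicate);
        return this; // returning 'this' is what makes the calls chainable
    }

    // Exclude every path that starts with one of the given prefixes.
    public RequestFilter ExcludePath(params string[] prefixes) =>
        Exclude(path => prefixes.Any(p => path.StartsWith(p, StringComparison.OrdinalIgnoreCase)));

    // Exclude every path that ends with one of the given suffixes; in this sketch,
    // suffix matching lets "robots.txt" sit next to ".jpg".
    public RequestFilter ExcludeExtension(params string[] suffixes) =>
        Exclude(path => suffixes.Any(s => path.EndsWith(s, StringComparison.OrdinalIgnoreCase)));

    // True when at least one predicate matches, i.e. the request must not be stored.
    public bool IsExcluded(string path) => _exclude.Any(x => x(path));
}
```

With plain prefix matching, as in this sketch, "/js" would also exclude a page such as "/json"; matching whole path segments is a possible refinement.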
And that's all there is to the middleware 😀
The Store
As you saw above, the middleware writes the collected data into a generic store, expressed by the IAnalyticStore interface, which handles all the dirty work.
I wrote three stores:
- https://www.nuget.org/packages/ServerSideAnalytics.Mongo for MongoDB
- https://www.nuget.org/packages/ServerSideAnalytics.SqlServer for Microsoft SQL Server
- https://www.nuget.org/packages/ServerSideAnalytics.Sqlite for SQLite
In the attached code you will find a sample site using SQLite, so no external process is needed to run the example.
The store has to implement an interface with the two methods invoked by ServerSideAnalytics, plus several methods to query the stored requests.
The query methods live in the store itself because isolating the database types is nice, but it also means that you cannot cast an Expression<Func<MyType,bool>> to an Expression<Func<WebRequest,bool>>, no matter how similar MyType and WebRequest may seem.
We will see how those methods are used in the last part of the article, which covers exposing our data inside the web application.
public interface IAnalyticStore
{
    Task StoreWebRequestAsync(WebRequest request);
    Task<long> CountUniqueIndentitiesAsync(DateTime day);
    Task<long> CountUniqueIndentitiesAsync(DateTime from, DateTime to);
    Task<long> CountAsync(DateTime from, DateTime to);
    Task<IEnumerable<IPAddress>> IpAddressesAsync(DateTime day);
    Task<IEnumerable<IPAddress>> IpAddressesAsync(DateTime from, DateTime to);
    Task<IEnumerable<WebRequest>> InTimeRange(DateTime from, DateTime to);
    Task<IEnumerable<WebRequest>> RequestByIdentityAsync(string identity);
    Task StoreGeoIpRangeAsync(IPAddress from, IPAddress to, CountryCode countryCode);
    Task<CountryCode> ResolveCountryCodeAsync(IPAddress address);
    Task PurgeRequestAsync();
    Task PurgeGeoIpAsync();
}
Identities
As you may have noticed, every WebRequest has a field named Identity.
This is because the most important thing to know is who did what.
But how is it evaluated?
- If the request comes from a registered user, we use their username
- If not, we use the Application Insights user cookie (ai_user), if present
- If that is not available either, we use the connection ID of the current context
- Finally, we save the result in our own cookie so we don't have to compute it again
In code:
public static string UserIdentity(this HttpContext context)
{
    var user = context.User?.Identity?.Name;
    const string identityString = "identity";
    string identity;

    if (!context.Request.Cookies.ContainsKey(identityString))
    {
        if (string.IsNullOrWhiteSpace(user))
        {
            // Anonymous visitor: fall back to the Application Insights cookie, then to the connection id
            identity = context.Request.Cookies.ContainsKey("ai_user")
                ? context.Request.Cookies["ai_user"]
                : context.Connection.Id;
        }
        else
        {
            identity = user;
        }

        // Cookies are headers, which can no longer be modified once the response has started
        if (!context.Response.HasStarted)
        {
            context.Response.Cookies.Append(identityString, identity);
        }
    }
    else
    {
        identity = context.Request.Cookies[identityString];
    }

    return identity;
}
IP Geocoding
One of the most interesting pieces of data in any analytics system is where your users come from.
So the IAnalyticStore of ServerSideAnalytics implements methods to geocode the IP address of incoming requests.
Sadly, in 2018 IPv6 is a well-established protocol, but Int128 is not a well-established data type, especially in databases.
So we need a workaround to query our database efficiently.
Or at least this is the strategy I used in my three stores; if you have a better idea, you can implement your own analytic store or, even better, contribute to the project.
We save every IP address range as a pair of strings.
Algorithm:
- If the IP address is IPv4, map it to IPv6 so both kinds can be stored together
- Take the bytes of the resulting IPv6 address
- Reverse them: GetAddressBytes returns them in network (big-endian) order, while the BigInteger constructor expects the least significant byte first
- Now we have a sequence of bytes representing a 128-bit unsigned integer
- Print this number zero-padded to the full 39 digits (from 000000000000000000000000000000000000000 to 340282366920938463463374607431768211455), so that plain string comparison in the database orders addresses numerically
Or in code:
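// 2^128 - 1 has 39 decimal digits, so every address gets exactly 39 of them
private static readonly string StrFormat = new string('0', 39);
public static string ToFullDecimalString(this IPAddress ip)
{
    var bytes = ip.MapToIPv6().GetAddressBytes(); // 16 bytes, big-endian (network order)
    Array.Reverse(bytes);        // BigInteger expects the least significant byte first
    Array.Resize(ref bytes, 17); // extra 0x00 byte keeps e.g. fe80:: from being read as negative
    return new BigInteger(bytes).ToString(StrFormat);
}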
I implemented this function as ServerSideAnalytics.ServerSideExtensions.ToFullDecimalString, so if you want to reuse it you don't have to go mad like I did.
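To check that the fixed-width strings really sort like the addresses they encode, here is a small self-contained sketch of the same normalization. The IpKey class name is hypothetical; it pads with PadLeft instead of a custom format string (same 39-digit result) and appends a zero byte before building the BigInteger, because the byte-array constructor reads the value as signed two's complement: without that byte, any IPv6 address whose first byte is 0x80 or higher, such as fe80::1, would come out negative.

```csharp
using System;
using System.Net;
using System.Numerics;

public static class IpKey
{
    // 2^128 - 1 = 340282366920938463463374607431768211455 has 39 decimal digits.
    private const int Width = 39;

    public static string ToFullDecimalString(IPAddress ip)
    {
        byte[] bytes = ip.MapToIPv6().GetAddressBytes(); // 16 bytes, network (big-endian) order
        Array.Reverse(bytes);        // BigInteger wants the least significant byte first
        Array.Resize(ref bytes, 17); // trailing 0x00 (most significant) keeps the value non-negative
        return new BigInteger(bytes).ToString().PadLeft(Width, '0');
    }
}
```

For example, 10.0.0.0 is mapped to ::ffff:10.0.0.0, i.e. 0xFFFF0A000000 = 281470849515520, then padded with leading zeros to 39 digits; 9.255.255.255 yields a smaller key, so string order and numeric order agree.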
Now that our IP address is normalized into a well-defined string format, finding the matching country saved in our database is really simple:
public async Task<CountryCode> ResolveCountryCodeAsync(IPAddress address)
{
    var addressString = address.ToFullDecimalString();
    using (var db = GetContext())
    {
        var found = await db.GeoIpRange.FirstOrDefaultAsync(x => x.From.CompareTo(addressString) <= 0 &&
                                                                 x.To.CompareTo(addressString) >= 0);
        return found?.CountryCode ?? CountryCode.World;
    }
}
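The same lookup can be sketched without a database, which makes the logic easy to test. This is a hypothetical in-memory stand-in: it assumes the ranges fit in a list and that country codes are plain strings (the real stores use the CountryCode enum and a database query), and it repeats the zero-padded normalization so the block is self-contained. string.CompareOrdinal compares character codes only, which for equal-length digit strings matches numeric order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Numerics;

// Hypothetical in-memory row of the GeoIpRange table.
public sealed class GeoRange
{
    public GeoRange(string from, string to, string country) { From = from; To = to; Country = country; }
    public string From { get; }
    public string To { get; }
    public string Country { get; }
}

public sealed class RangeLookup
{
    private readonly List<GeoRange> _ranges = new List<GeoRange>();

    // Same normalization as ToFullDecimalString: 39 digits, zero-padded, never negative.
    public static string Key(IPAddress ip)
    {
        byte[] bytes = ip.MapToIPv6().GetAddressBytes();
        Array.Reverse(bytes);
        Array.Resize(ref bytes, 17);
        return new BigInteger(bytes).ToString().PadLeft(39, '0');
    }

    public void Add(IPAddress from, IPAddress to, string country) =>
        _ranges.Add(new GeoRange(Key(from), Key(to), country));

    // Country of the first range whose bounds enclose the address, or "World" as the fallback.
    public string Resolve(IPAddress address)
    {
        string key = Key(address);
        GeoRange found = _ranges.FirstOrDefault(r =>
            string.CompareOrdinal(r.From, key) <= 0 && string.CompareOrdinal(r.To, key) >= 0);
        return found?.Country ?? "World";
    }
}
```

A linear scan is fine for a sketch; in a database, an index on the From column can let the engine answer the same predicate with a range scan instead.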
But to query the database, we need to fill it first.
Finding a reliable and cheap database of countries and their IP address ranges can be quite difficult.
For this reason, I wrote three more analytic stores that act as wrappers around an existing one to provide fallback geo-IP resolution.
If the first store doesn't contain a valid IP range for the client, it asks the second one, and so on.
If a valid geo-IP is found along the chain, it is saved into the main store.
If you want to add more of these wrappers, please contribute on GitHub.
You can find these analytic stores in ServerSideAnalytics.Extensions:
- IpApiAnalyticStore: adds IP geocoding using IP-API (ip-api.com)
- IpInfoAnalyticStore: adds IP geocoding using IPinfo (ipinfo.io)
- IpStackAnalyticStore: adds IP geocoding using ipstack (ipstack.com)
Personally, I'm using a preloaded IP range database with all three failovers enabled:
public IAnalyticStore GetAnalyticStore()
{
    var store = (new MongoAnalyticStore("mongodb://localhost/"))
        .UseIpStackFailOver("IpStackAPIKey")
        .UseIpApiFailOver()
        .UseIpInfoFailOver();
    return store;
}
Let's see how one of them works inside, as an example:
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace ServerSideAnalytics.Extensions
{
    class IpApiAnalyticStore : IAnalyticStore
    {
        // One shared HttpClient: creating a new one per lookup can exhaust sockets under load
        private static readonly HttpClient Http = new HttpClient();

        readonly IAnalyticStore _store;

        public IpApiAnalyticStore(IAnalyticStore store)
        {
            _store = store;
        }

        // Query methods are simply forwarded to the wrapped store
        public Task<long> CountAsync(DateTime from, DateTime to) => _store.CountAsync(from, to);
        public Task<long> CountUniqueIndentitiesAsync(DateTime day) => _store.CountUniqueIndentitiesAsync(day);
        public Task<long> CountUniqueIndentitiesAsync(DateTime from, DateTime to) => _store.CountUniqueIndentitiesAsync(from, to);
        public Task<IEnumerable<WebRequest>> InTimeRange(DateTime from, DateTime to) => _store.InTimeRange(from, to);
        public Task<IEnumerable<IPAddress>> IpAddressesAsync(DateTime day) => _store.IpAddressesAsync(day);
        public Task<IEnumerable<IPAddress>> IpAddressesAsync(DateTime from, DateTime to) => _store.IpAddressesAsync(from, to);
        public Task PurgeGeoIpAsync() => _store.PurgeGeoIpAsync();
        public Task PurgeRequestAsync() => _store.PurgeRequestAsync();
        public Task<IEnumerable<WebRequest>> RequestByIdentityAsync(string identity) => _store.RequestByIdentityAsync(identity);

        public async Task<CountryCode> ResolveCountryCodeAsync(IPAddress address)
        {
            try
            {
                // Ask the wrapped store first
                var resolved = await _store.ResolveCountryCodeAsync(address);
                if (resolved != CountryCode.World)
                {
                    return resolved;
                }

                // Fall back to ip-api.com, which returns the ISO code in the "countryCode" field
                var response = await Http.GetStringAsync($"http://ip-api.com/json/{address}");
                var obj = JsonConvert.DeserializeObject(response) as JObject;
                resolved = (CountryCode)Enum.Parse(typeof(CountryCode), obj["countryCode"].ToString());

                // Save the single address in the main store so the next lookup stays local
                await _store.StoreGeoIpRangeAsync(address, address, resolved);
                return resolved;
            }
            catch (Exception)
            {
                // Any network, parsing or lookup failure degrades to "unknown country"
                return CountryCode.World;
            }
        }

        public Task StoreGeoIpRangeAsync(IPAddress from, IPAddress to, CountryCode countryCode)
        {
            return _store.StoreGeoIpRangeAsync(from, to, countryCode);
        }

        public Task StoreWebRequestAsync(WebRequest request)
        {
            return _store.StoreWebRequestAsync(request);
        }
    }
}
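Because every wrapper asks the store it wraps before calling its own web service, the failover stores form a decorator chain. Assuming each UseXxxFailOver call wraps the store it is invoked on, the configuration in GetAnalyticStore above is consulted innermost first: MongoDB, then ipstack, then ip-api, then ipinfo. The sketch below reproduces that ordering with a hypothetical, trimmed-down ICountryResolver interface and fake lookups in place of HTTP calls; like the wrapper above, it stores a successful answer through the wrapped store, which forwards it down to the innermost one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, trimmed-down version of the geo-IP part of IAnalyticStore.
public interface ICountryResolver
{
    Task<string> ResolveAsync(string ip);
    Task StoreAsync(string ip, string country);
}

// Innermost store: a dictionary playing the role of the database.
public sealed class MemoryResolver : ICountryResolver
{
    public Dictionary<string, string> Known { get; } = new Dictionary<string, string>();

    public Task<string> ResolveAsync(string ip) =>
        Task.FromResult(Known.TryGetValue(ip, out var country) ? country : "World");

    public Task StoreAsync(string ip, string country)
    {
        Known[ip] = country;
        return Task.CompletedTask;
    }
}

// Decorator: asks the wrapped resolver first and uses its own lookup only on "World".
public sealed class FallbackResolver : ICountryResolver
{
    private readonly ICountryResolver _inner;
    private readonly string _name;
    private readonly Func<string, string> _lookup; // stands in for the HTTP call to a geo-IP service
    private readonly List<string> _log;            // records which services were consulted, in order

    public FallbackResolver(ICountryResolver inner, string name, Func<string, string> lookup, List<string> log)
    {
        _inner = inner; _name = name; _lookup = lookup; _log = log;
    }

    public async Task<string> ResolveAsync(string ip)
    {
        string resolved = await _inner.ResolveAsync(ip);
        if (resolved != "World") return resolved;

        _log.Add(_name);
        resolved = _lookup(ip) ?? "World";
        if (resolved != "World") await _inner.StoreAsync(ip, resolved); // save down the chain
        return resolved;
    }

    public Task StoreAsync(string ip, string country) => _inner.StoreAsync(ip, country);
}
```

Built in the same order as GetAnalyticStore, a first request for an unknown address walks the chain until one service answers; any later request for that address is served by the innermost store without calling an external service.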
And that's all folks :)