Introduction
I wanted to keep track of visitors and know the usual web analytics stuff: visitors, source, nationality, behaviour and so on.
And client side analytics are not so reliable:
- Ad blockers interfere with them
- Using a third party service requires annoying the user with those huge cookie consent banners
- They drastically increase the loading time of the web application
- They don't register API calls or any other non-HTML requests
So I developed a very simple server side analytics system for .NET Core, which is running on my website.
The Middleware
The idea is to implement a middleware that will be invoked on every request, no matter if a route was specified or not.
This middleware will be put into the request pipeline and set up using only fluent methods.
The middleware will write each incoming request into a generic store after the processing of the request is completed.
The middleware will be inserted into the pipeline by using the UseServerSideAnalytics extension method in app startup.
This method requires an IAnalyticStore, the interface that is going to be the place where our received requests will be stored.
public void Configure(IApplicationBuilder app)
{
    app.UseServerSideAnalytics(new MongoAnalyticStore("mongodb://192.168.0.11/matteo"));
}
Inside the extension, I will create a FluidAnalyticBuilder and bind it to the request pipeline via the method Use.
public static FluidAnalyticBuilder UseServerSideAnalytics(this IApplicationBuilder app, IAnalyticStore repository)
{
    var builder = new FluidAnalyticBuilder(repository);
    app.Use(builder.Run);
    return builder;
}
The FluidAnalyticBuilder is a fluent class that will handle the configuration of the analytics that we want to collect (like filtering unwanted URLs, IP addresses and so on) and practically implement the core of the system via the method Run.
In this method, ServerSideAnalytics will use two methods of the store:
- ResolveCountryCodeAsync: Retrieve (if existing) the country code of the remote IP address. If not existing, CountryCode.World is expected.
- StoreWebRequestAsync: Store the received request into the database
internal async Task Run(HttpContext context, Func<Task> next)
{
    // Let the rest of the pipeline process the request first
    await next.Invoke();

    // Skip the request if at least one exclusion filter matches
    if (_exclude?.Any(x => x(context)) ?? false)
    {
        return;
    }

    var req = new WebRequest
    {
        Timestamp = DateTime.Now,
        Identity = context.UserIdentity(),
        RemoteIpAddress = context.Connection.RemoteIpAddress,
        Method = context.Request.Method,
        UserAgent = context.Request.Headers["User-Agent"],
        Path = context.Request.Path.Value,
        IsWebSocket = context.WebSockets.IsWebSocketRequest,
        CountryCode = await _store.ResolveCountryCodeAsync(context.Connection.RemoteIpAddress)
    };

    await _store.StoreWebRequestAsync(req);
}
(Maybe, I should add other fields to collected requests? Let me know. 😊)
Via the List<Func<HttpContext, bool>> _exclude, it also provides easy methods to filter out requests that we don't care about.
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    app.UseDeveloperExceptionPage();
    app.UseBrowserLink();
    app.UseDatabaseErrorPage();
    app.UseAuthentication();
    app.UseServerSideAnalytics(new MongoAnalyticStore("mongodb://localhost/matteo"))
        .ExcludePath("/js", "/lib", "/css")
        .ExcludeExtension(".jpg", ".ico", "robots.txt", "sitemap.xml")
        .Exclude(x => x.UserIdentity() == "matteo")
        .ExcludeIp(IPAddress.Parse("192.168.0.1"))
        .ExcludeLoopBack();
    app.UseStaticFiles();
}
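To give an idea of how such fluent exclusion methods can sit on top of a list of predicates, here is a minimal sketch. It is not the library's actual code: RequestInfo is a hypothetical stand-in for HttpContext (so the sample compiles without ASP.NET Core), and FilterBuilder and ShouldSkip are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HttpContext, so the sketch has no ASP.NET Core dependency.
public class RequestInfo
{
    public string Path { get; set; }
    public string Identity { get; set; }
}

// Hypothetical builder: every Exclude* method adds a predicate to the list
// and returns "this", which is what makes the calls chainable.
public class FilterBuilder
{
    private readonly List<Func<RequestInfo, bool>> _exclude =
        new List<Func<RequestInfo, bool>>();

    public FilterBuilder Exclude(Func<RequestInfo, bool> filter)
    {
        _exclude.Add(filter);
        return this;
    }

    // Excludes every request whose path starts with one of the given prefixes.
    public FilterBuilder ExcludePath(params string[] prefixes)
    {
        return Exclude(r => r.Path != null &&
            prefixes.Any(p => r.Path.StartsWith(p, StringComparison.OrdinalIgnoreCase)));
    }

    // Excludes every request whose path ends with one of the given suffixes.
    public FilterBuilder ExcludeExtension(params string[] suffixes)
    {
        return Exclude(r => r.Path != null &&
            suffixes.Any(s => r.Path.EndsWith(s, StringComparison.OrdinalIgnoreCase)));
    }

    // A request is skipped as soon as one predicate matches.
    public bool ShouldSkip(RequestInfo request)
    {
        return _exclude.Any(f => f(request));
    }
}
```

With this shape, adding a new kind of filter only means adding one more method that wraps a predicate, and the check performed on each request stays a single Any call over the list.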
And that is all the middleware. 😀
The Store
As you saw above, the middleware writes collected data into a generic store expressed by the interface IAnalyticStore, the component that will handle all the dirty work of this job.
I wrote three stores.
In the attached code, you will find a sample site using SQLite, so no external process is needed to run the example.
The store has to implement an interface with two methods invoked by Server Side Analytics and some methods to query stored requests.
The query methods are part of the interface because isolating database types is cool, but it also means that you cannot cast an Expression<Func<MyType,bool>> to an Expression<Func<WebRequest,bool>>, no matter how similar MyType and WebRequest are.
We will see the use of those methods in the last part of the article, which covers how to expose our data inside the web application.
public interface IAnalyticStore
{
    Task StoreWebRequestAsync(WebRequest request);
    Task<long> CountUniqueIndentitiesAsync(DateTime day);
    Task<long> CountUniqueIndentitiesAsync(DateTime from, DateTime to);
    Task<long> CountAsync(DateTime from, DateTime to);
    Task<IEnumerable<IPAddress>> IpAddressesAsync(DateTime day);
    Task<IEnumerable<IPAddress>> IpAddressesAsync(DateTime from, DateTime to);
    Task<IEnumerable<WebRequest>> InTimeRange(DateTime from, DateTime to);
    Task<IEnumerable<WebRequest>> RequestByIdentityAsync(string identity);
    Task StoreGeoIpRangeAsync(IPAddress from, IPAddress to, CountryCode countryCode);
    Task<CountryCode> ResolveCountryCodeAsync(IPAddress address);
    Task PurgeRequestAsync();
    Task PurgeGeoIpAsync();
}
Identities
You may have noticed that every WebRequest has a field named Identity. This is because the most important thing is to know who did what.
But how is it evaluated?
- If the request comes from a registered user, we are going to use the username
- If not, we are going to use the default AspNetCore cookie
- If that is not available, we use the connection id of the current context
- Then we are going to try to save the result in our own cookie, so we don't have to do it again
In code:
public static string UserIdentity(this HttpContext context)
{
    var user = context.User?.Identity?.Name;
    const string identityString = "identity";
    string identity;

    if (!context.Request.Cookies.ContainsKey(identityString))
    {
        if (string.IsNullOrWhiteSpace(user))
        {
            // Anonymous user: fall back to an existing cookie, then to the connection id
            identity = context.Request.Cookies.ContainsKey("ai_user")
                ? context.Request.Cookies["ai_user"]
                : context.Connection.Id;
        }
        else
        {
            identity = user;
        }

        // Remember the result in our own cookie for the next requests
        context.Response.Cookies.Append(identityString, identity);
    }
    else
    {
        identity = context.Request.Cookies[identityString];
    }

    return identity;
}
IP Geocoding
One of the most interesting pieces of data in every analytics system is where your users come from.
So the IAnalyticStore of SSA implements methods for IP address geocoding of incoming requests.
Sadly, in 2018, IPv6 is a well established protocol, although Int128 is not a well established data type, especially in databases.
So we need to implement a cool workaround to have an efficient query to our database.
Or at least this is the strategy that I used in my three stores. If you have a better idea, you can implement your own analytic store or, even better, contribute to the project.
We are going to save every IP address range as a pair of strings.
Algorithm:
- If the IP address is an IPv4 one, it should be mapped to IPv6 so they can be stored together
- Then we are going to take the bytes of our new IP address
- We are going to reverse them, so "10.0.0.0" will keep being "10.0.0.0" instead of "10"
- Now we have a string of bytes that represents a very big number
- Let's print this number using every digit so the values can be correctly compared by the database (from 000000000000000000000000000000000000000 to 340282366920938463463374607431768211455)
Or in code:
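private const string StrFormat = "000000000000000000000000000000000000000";

public static string ToFullDecimalString(this IPAddress ip)
{
    // BigInteger reads the bytes as a little-endian, signed number:
    // the trailing zero byte keeps addresses with the highest bit set from becoming negative
    var bytes = ip.MapToIPv6().GetAddressBytes().Reverse().Concat(new byte[] { 0 }).ToArray();
    return new BigInteger(bytes).ToString(StrFormat);
}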
I implemented this function in ServerSideAnalytics.ServerSideExtensions.ToFullDecimalString, so if you want to reuse it, you don't have to go mad like I did.
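To see why the zero padding matters, here is a small self-contained sketch of the same idea. The class and method names (IpSortDemo, ToSortableString) are made up for this example. It also makes two choices that are assumptions of the sketch rather than the library's code: it uses the standard "D39" format specifier to pad to 39 digits, and it appends a zero byte so that BigInteger always reads the value as non-negative.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Numerics;

// Hypothetical helper class, written only for this illustration.
public static class IpSortDemo
{
    public static string ToSortableString(IPAddress ip)
    {
        var littleEndian = ip.MapToIPv6()
            .GetAddressBytes()          // 16 bytes, most significant byte first
            .Reverse()                  // BigInteger expects the least significant byte first
            .Concat(new byte[] { 0 })   // extra zero byte: the value is never read as negative
            .ToArray();

        // "D39" pads with leading zeros up to 39 digits, the length of 2^128 - 1
        return new BigInteger(littleEndian).ToString("D39");
    }
}
```

Since every string has the same length, plain ordinal string comparison gives the same order as comparing the numbers. For example, the string for 9.255.255.255 sorts before the one for 10.0.0.0, which in turn sorts before the one for 10.0.0.5.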
Now that we have our IP address normalized into a well defined string format, finding the corresponding country saved in our database is really simple.
public async Task<CountryCode> ResolveCountryCodeAsync(IPAddress address)
{
    var addressString = address.ToFullDecimalString();

    using (var db = GetContext())
    {
        var found = await db.GeoIpRange.FirstOrDefaultAsync
            (x => x.From.CompareTo(addressString) <= 0 &&
                  x.To.CompareTo(addressString) >= 0);

        return found?.CountryCode ?? CountryCode.World;
    }
}
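The same range check can be tried without a database. The following is only an in-memory sketch under a few assumptions: GeoRange, GeoLookup and Pad are hypothetical names, the country is a plain string instead of the CountryCode enum, and the addresses are already given as decimal digits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory counterpart of a stored geo-IP range.
public class GeoRange
{
    public string From { get; set; }     // zero-padded decimal string
    public string To { get; set; }       // zero-padded decimal string
    public string Country { get; set; }  // plain string here, only for the example
}

public static class GeoLookup
{
    // Pads a decimal number to 39 digits, like the normalized addresses.
    public static string Pad(string digits)
    {
        return digits.PadLeft(39, '0');
    }

    // Returns the country of the first range containing the address,
    // or "World" when no range matches.
    public static string Resolve(IEnumerable<GeoRange> ranges, string address)
    {
        var found = ranges.FirstOrDefault(r =>
            string.CompareOrdinal(r.From, address) <= 0 &&
            string.CompareOrdinal(r.To, address) >= 0);

        return found?.Country ?? "World";
    }
}
```

The padding is what makes this work: compared as unpadded strings, "99" sorts after "100", so a range from 100 to 200 would wrongly appear to contain 99. With both sides padded to the same length, the string order and the numeric order agree.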
But to query the database, we need to fill it first.
Finding a reliable and cheap database of countries and their corresponding IP address ranges can be quite difficult.
For this reason, I wrote three other analytic stores that act as wrappers around an existing one to provide fallback geo-IP resolution.
If the first repository doesn't contain a valid IP range for the client, it will ask the second one and so on.
If a valid geo-IP is found at the end of the chain, it is saved into the main store.
I wrote three of them. If you want to add more, please contribute on GitHub.
You can find those analytic stores in ServerSideAnalytics.Extensions.
- IpApiAnalyticStore: Add IP geocoding using Ip Api (ip-api.com)
- IpInfoAnalyticStore: Add IP geocoding using Ip Info (ipinfo.io)
- IpStackAnalyticStore: Add IP geocoding using Ip Stack (ipstack.com)
Personally, I'm using a pre-loaded IP range database with all three failovers enabled:
public IAnalyticStore GetAnalyticStore()
{
    var store = (new MongoAnalyticStore("mongodb://localhost/"))
        .UseIpStackFailOver("IpStackAPIKey")
        .UseIpApiFailOver()
        .UseIpInfoFailOver();
    return store;
}
Let's see how one of them works inside, as an example:
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
namespace ServerSideAnalytics.Extensions
{
    class IpApiAnalyticStore : IAnalyticStore
    {
        readonly IAnalyticStore _store;
        public IpApiAnalyticStore(IAnalyticStore store)
        {
            _store = store;
        }
        public Task<long> CountAsync(DateTime from, DateTime to) => _store.CountAsync(from, to);
        public Task<long> CountUniqueIndentitiesAsync(DateTime day) =>
            _store.CountUniqueIndentitiesAsync(day);
        public Task<long> CountUniqueIndentitiesAsync(DateTime from, DateTime to) =>
            _store.CountUniqueIndentitiesAsync(from, to);
        public Task<IEnumerable<WebRequest>> InTimeRange(DateTime from, DateTime to) =>
            _store.InTimeRange(from, to);
        public Task<IEnumerable<IPAddress>> IpAddressesAsync(DateTime day) =>
            _store.IpAddressesAsync(day);
        public Task<IEnumerable<IPAddress>> IpAddressesAsync(DateTime from, DateTime to) =>
            _store.IpAddressesAsync(from, to);
        public Task PurgeGeoIpAsync() => _store.PurgeGeoIpAsync();
        public Task PurgeRequestAsync() => _store.PurgeRequestAsync();
        public Task<IEnumerable<WebRequest>> RequestByIdentityAsync(string identity) =>
            _store.RequestByIdentityAsync(identity);
        public async Task<CountryCode> ResolveCountryCodeAsync(IPAddress address)
        {
            try
            {
                var resolved = await _store.ResolveCountryCodeAsync(address);
                if (resolved == CountryCode.World)
                {
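                    // The wrapped store doesn't know this address: ask ip-api.com
                    var ipstr = address.ToString();
                    var response = await (new HttpClient()).GetStringAsync
                        ($"http://ip-api.com/json/{ipstr}");
                    var obj = JsonConvert.DeserializeObject(response) as JObject;
                    // ip-api.com returns the two-letter code in the "countryCode" field
                    resolved = (CountryCode)Enum.Parse(typeof(CountryCode),
                        obj["countryCode"].ToString());
                    // Cache the answer in the wrapped store as a single-address range
                    await _store.StoreGeoIpRangeAsync(address, address, resolved);
                    return resolved;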
                }
                return resolved;
            }
            catch (Exception)
            {
                return CountryCode.World;
            }
        }
        public Task StoreGeoIpRangeAsync(IPAddress from, IPAddress to, CountryCode countryCode)
        {
            return _store.StoreGeoIpRangeAsync(from, to, countryCode);
        }
        public Task StoreWebRequestAsync(WebRequest request)
        {
            return _store.StoreWebRequestAsync(request);
        }
    }
}
And that's all, folks! :)