Background
A few months ago, I found an interesting website: http://ipinfodb.com/. It provides an API that can "translate" any IP address into a geographic location, including city/region/country as well as latitude/longitude and time zone information. Invoking the API requires a registered API key (which is free). Since I was already storing visitors' IP addresses in my own database, I decided to use the InfoDB API to store visitors' geolocations as well.
Just a few days ago, an idea came to me: summarize those geolocation records and display them on Google Maps. Hmm, it is feasible :)
So the process is: track visitors' IP addresses -> "translate" them into geographic locations -> show them on Google Maps!
(P.S.: I've been using Google Analytics for my Geek Place, http://WayneYe.com, for more than two years. It is no doubt extremely powerful and already has a "Map Overlay" feature; however, due to its privacy policy, Google Analytics does NOT display visitors' IP addresses, see http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=86214.)
Implementation
The first task is tracking visitors' IP addresses. Most of the time, a user visiting a website in a browser submits an HTTP GET request, which is usually carried over the Transmission Control Protocol (TCP). The browser first asks a DNS server to resolve the host name to an IP address and then sends the request to its destination, the web host server. Along the way, the original HTTP request may pass through a number of routers, proxies and other intermediaries, and its headers may have been updated: Via (a standard HTTP header) or X-Forwarded-For (a non-standard but widely used header) could carry the original ISP's information/IP address, OR possibly one of the proxies' IP addresses.
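For readers less familiar with the header, here is a minimal JavaScript sketch (JavaScript being the other language used in this post) of how an X-Forwarded-For value is conventionally read. By convention it is a comma-separated list to which each proxy appends the address it received the request from, so the leftmost entry is only what the first hop claims the client was, while the TCP peer address is the one value that does not come from the request headers. The function name candidateClientIps and the sample addresses are hypothetical.

```javascript
// Hypothetical helper: split an X-Forwarded-For value into the addresses it names.
// forwardedFor: raw header value, or undefined when the header is absent.
// remoteAddress: address of the TCP peer that actually connected to the server.
function candidateClientIps(forwardedFor, remoteAddress) {
  const hops = (forwardedFor || "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  return {
    // Leftmost entry = the client address claimed by the first hop.
    // Untrusted: any client can send its own X-Forwarded-For header.
    claimedClient: hops.length > 0 ? hops[0] : remoteAddress,
    // Full path as reported, ending with the peer the server really saw.
    path: hops.concat([remoteAddress]),
  };
}

const result = candidateClientIps("117.136.8.14, 10.0.0.5", "203.0.113.7");
console.log(result.claimedClient); // 117.136.8.14
console.log(result.path.join(" -> ")); // 117.136.8.14 -> 10.0.0.5 -> 203.0.113.7
```

Whichever entry you pick, it still has to be validated before it goes anywhere near a database.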
So when the server receives the request, it learns the visitor's IP address from the connection or from the Via/X-Forwarded-For headers (though NOT always the visitor's own address; sometimes it is the ISP's or a proxy's). In ASP.NET, you can simply call Request.UserHostAddress; however, we can never simply trust this value, for two major reasons:
- A malicious application can forge an HTTP request with a modified X-Forwarded-For header (for example, X-Forwarded-For: dangerous code). If you are unlucky enough to trust it and insert it into your database, the malicious application can exploit an SQL injection hole.
- Not all visitors are human beings; some of them are search engine spiders. I must distinguish human visitors from spiders, otherwise, for example, I will "happily" see a lot of "visitors" coming from "Mountain View, CA" ^_^.
For #1, I use a regular expression to validate the string I get from Request.UserHostAddress:
using System.Text.RegularExpressions;

public static bool IsValidIP(string ip)
{
    // ^ and \z anchor the pattern to the whole string, so input such as
    // "1.2.3.4; DROP TABLE ..." is rejected instead of partially matching
    if (string.IsNullOrEmpty(ip) ||
        !Regex.IsMatch(ip, @"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\z"))
        return false;

    // The pattern guarantees exactly four numeric parts of 1-3 digits,
    // so int.Parse cannot throw; each octet must still be at most 255
    foreach (string octet in ip.Split('.'))
    {
        if (int.Parse(octet) > 255)
            return false;
    }
    return true;
}
If the address is "0.0.0.0", I simply ignore it.
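For comparison, here is the rule the IsValidIP method above is meant to enforce, written as a small JavaScript sketch together with the "0.0.0.0" exclusion. The names isValidIPv4 and shouldRecordIp are hypothetical. The point it illustrates is the anchoring: an unanchored pattern would happily find "1.2.3.4" inside a longer, malicious string.

```javascript
// Same rule as IsValidIP: exactly four dot-separated parts,
// each 1-3 digits and at most 255. The ^...$ anchors make the pattern cover
// the whole string, so "1.2.3.4'; DROP TABLE Visits; --" is rejected.
const IPV4_PATTERN = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

function isValidIPv4(ip) {
  const match = IPV4_PATTERN.exec(ip);
  if (match === null) return false;
  // match[1..4] hold the four octets as digit strings
  return match.slice(1).every((octet) => Number(octet) <= 255);
}

// Hypothetical recording filter: valid IPv4 only, and "0.0.0.0" is ignored.
function shouldRecordIp(ip) {
  return typeof ip === "string" && ip !== "0.0.0.0" && isValidIPv4(ip);
}

console.log(shouldRecordIp("117.136.8.14")); // true
console.log(shouldRecordIp("1.2.3.4'; DROP TABLE Visits; --")); // false
```

Validation keeps garbage out of the table, but parameterized queries remain the real protection against SQL injection.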
For #2, so far I haven't found a "perfect way" to solve this issue (and I suspect there is no perfect solution that identifies every search engine in the world; please correct me if I am wrong). However, I've defined two rules that do their best to identify crawlers in general, normal situations:
Rule #1
If a request contains a "Cookie" header with "ASP.NET_SessionId" AND its value equals the server-side value, it should come from a normal user who has already visited my website within the current session.
Note: there are two possible exceptions to rule #1.
- If the user's browser has cookies disabled, this rule will NOT be effective, since the request will never contain a Cookie header :).
- If a crawler crawls my website and accepts cookies, rule #1 will not be effective either. However, I don't think a crawler will first request a SessionID and then request again with that SessionID.
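Rule #1 compares the session cookie's value with the server side, not just its presence. A minimal JavaScript sketch of that comparison, assuming both the raw Cookie header string and the server's current session ID are available to the caller; parseCookieHeader and looksLikeReturningVisitor are hypothetical names.

```javascript
// Hypothetical sketch of Rule #1. Parses a raw header such as
// "foo=1; ASP.NET_SessionId=abc123" into name/value pairs.
function parseCookieHeader(header) {
  const cookies = new Map();
  for (const pair of (header || "").split(";")) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip empty or malformed segments
    cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  return cookies;
}

// True only if the browser sent ASP.NET_SessionId AND its value equals the
// session ID the server currently holds for this visitor.
function looksLikeReturningVisitor(cookieHeader, serverSessionId) {
  const value = parseCookieHeader(cookieHeader).get("ASP.NET_SessionId");
  return value !== undefined && value === serverSessionId;
}

console.log(looksLikeReturningVisitor("foo=1; ASP.NET_SessionId=abc123", "abc123")); // true
console.log(looksLikeReturningVisitor("foo=1", "abc123")); // false
```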
Rule #2
Define a list of crawler names and check whether the "User-Agent" header contains any of them; this list should be configurable. For more crawler examples, refer to http://en.wikipedia.org/wiki/Web_crawler#Examples_of_Web_crawlers.
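Rule #2 says the list should be configurable. A minimal JavaScript sketch of that idea, assuming the crawler names arrive as one comma-separated setting string (where the setting lives is up to you); makeCrawlerDetector is a hypothetical name and the list mirrors the one used in this post.

```javascript
// Hypothetical sketch of Rule #2 with a configurable list: crawler names are
// read from a comma-separated setting instead of being hard-coded.
function makeCrawlerDetector(configValue) {
  const names = configValue
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0);
  return function isCrawler(userAgent) {
    if (!userAgent) return false; // missing User-Agent: no evidence either way
    const ua = userAgent.toLowerCase();
    return names.some((name) => ua.includes(name));
  };
}

// Example setting value; in practice it would come from configuration.
const isCrawler = makeCrawlerDetector("google, bing, msn, yahoo, baidu, sosospider, sogou, youdao");
console.log(isCrawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")); // true
console.log(isCrawler("Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0")); // false
```

Plain substring matching is cheap but coarse: any user agent that merely mentions one of the names counts as a crawler, which fits the "best effort" spirit of the rule.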
Talk is cheap, show me the code: I wrote a method that identifies crawlers by applying the two rules above.
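using System;
using System.Globalization;
using System.Web;

public static bool IsCrawlerRequest()
{
    HttpRequest request = HttpContext.Current.Request;

    // Rule #1: a raw "Cookie" header carrying ASP.NET_SessionId means the
    // browser stored our session cookie, so treat it as a normal visitor.
    // (Only the cookie's presence is checked here, not its value.)
    // HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null
    string cookieHeader = request.Headers["Cookie"];
    if (cookieHeader != null && cookieHeader.Contains("ASP.NET_SessionId"))
        return false;

    // Rule #2: look for known crawler names in the User-Agent header
    var crawlerList = new[] { "google", "bing", "msn", "yahoo",
                              "baidu", "sosospider", "sogou", "youdao" };
    if (String.IsNullOrEmpty(request.UserAgent))
        return false;

    string userAgent = request.UserAgent.ToLower(CultureInfo.InvariantCulture);
    foreach (string bot in crawlerList)
    {
        if (userAgent.Contains(bot))
            return true;
    }
    return false;
}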
Please be aware that I commented out HttpContext.Current.Request.Cookies["ASP.NET_SessionId"] != null, because I found that Request.Cookies ALWAYS contains "ASP.NET_SessionId" EVEN IF the browser has disabled cookies. I will investigate further and double-check later!
OK, now we have normal users' IP addresses with search engine crawlers filtered out. The next step is invoking the InfoDB API to "translate" an IP address into a geolocation: you need to register an API key at http://ipinfodb.com/ and then submit an HTTP GET request to:
http://api.ipinfodb.com/v2/ip_query.php?key=[API KEY]&ip=[IP Address]&timezone=false
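Building that URL by string concatenation is easy to get wrong once a value contains characters that need escaping. A small JavaScript sketch using the standard URL/URLSearchParams API; the endpoint and the key/ip/timezone parameters are copied from the URL above, buildIpQueryUrl is a hypothetical name, and whether this v2 endpoint is still served today is not verified here.

```javascript
// Endpoint copied from the article; availability today is not verified.
const IPINFODB_V2_ENDPOINT = "http://api.ipinfodb.com/v2/ip_query.php";

// Hypothetical helper: URLSearchParams percent-encodes each value,
// so the key and IP cannot break the query string.
function buildIpQueryUrl(apiKey, ip) {
  const url = new URL(IPINFODB_V2_ENDPOINT);
  url.searchParams.set("key", apiKey);
  url.searchParams.set("ip", ip);
  url.searchParams.set("timezone", "false"); // same value as in the article
  return url.toString();
}

console.log(buildIpQueryUrl("MY_KEY", "117.136.8.14"));
// http://api.ipinfodb.com/v2/ip_query.php?key=MY_KEY&ip=117.136.8.14&timezone=false
```

The resulting URL can then be requested with any HTTP client, for example fetch in modern browsers and Node.js 18+.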
For example, for IP "117.136.8.14" it returns the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Status>OK</Status>
  <CountryCode>CN</CountryCode>
  <CountryName>China</CountryName>
  <RegionCode>23</RegionCode>
  <RegionName>Shanghai</RegionName>
  <City>Shanghai</City>
  <ZipPostalCode></ZipPostalCode>
  <Latitude>31.005</Latitude>
  <Longitude>121.409</Longitude>
  <Timezone>0</Timezone>
  <Gmtoffset>0</Gmtoffset>
  <Dstoffset>0</Dstoffset>
  <TimezoneName></TimezoneName>
  <Isdst></Isdst>
  <Ip>117.136.8.14</Ip>
</Response>
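In a browser this response could be handed to DOMParser, but a tiny dependency-free JavaScript sketch is enough to show which fields matter for the map. It assumes the flat, single-level layout of the sample above; readTag and parseIpQueryResponse are hypothetical names, and a real XML parser is the safer choice for anything less predictable.

```javascript
// Minimal field extraction for the flat <Response> document above.
// NOT a general XML parser: it assumes each element appears once, has no
// attributes or children, and contains no entities or CDATA.
function readTag(xml, tag) {
  const match = new RegExp("<" + tag + ">([^<]*)</" + tag + ">").exec(xml);
  return match === null ? null : match[1].trim();
}

// Hypothetical converter; returns null unless <Status> is "OK".
function parseIpQueryResponse(xml) {
  if (readTag(xml, "Status") !== "OK") return null;
  return {
    ip: readTag(xml, "Ip"),
    countryName: readTag(xml, "CountryName"),
    regionName: readTag(xml, "RegionName"),
    city: readTag(xml, "City"),
    latitude: parseFloat(readTag(xml, "Latitude")),
    longitude: parseFloat(readTag(xml, "Longitude")),
  };
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Status>OK</Status>
  <CountryName>China</CountryName>
  <RegionName>Shanghai</RegionName>
  <City>Shanghai</City>
  <Latitude>31.005</Latitude>
  <Longitude>121.409</Longitude>
  <Ip>117.136.8.14</Ip>
</Response>`;

const geo = parseIpQueryResponse(sample);
console.log(geo.city, geo.latitude, geo.longitude); // Shanghai 31.005 121.409
```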
Wow, looks precise :). Next, I am going to show visitors' geolocations on Google Maps (I know this compromises visitors' privacy, but my personal blog http://WayneYe.com is not a company and I will NEVER earn a cent by doing this :)).
Anyway, I use the latest Google Maps JavaScript API V3 for two major features:
- Display the visitor's geolocation when the browser supports the "navigator.geolocation" property (Google Chrome and Mozilla Firefox support it; Internet Explorer did not at the time of writing, so for it I fall back to New York City as the default location).
- Display the geolocations of a specified blog post's visitors on Google Maps, for example the visitors of my post "My new Dev box - HP Z800 Workstation"; clicking each geolocation shows it on the map.
The JavaScript code is shown below:
<script type="text/javascript">
    // Fallback center used when the browser cannot report the visitor's location
    var newyork = new google.maps.LatLng(40.69847032728747, -73.9514422416687);
    var browserSupportFlag = false;
    var map;
    var infowindow = new google.maps.InfoWindow();

    // initialize() is expected to run once the page and the Maps API have loaded
    function initialize() {
        var myOptions = {
            zoom: 6,
            mapTypeId: google.maps.MapTypeId.ROADMAP
        };
        map = new google.maps.Map(document.getElementById("googleMapContainer"), myOptions);

        if (navigator.geolocation) {
            browserSupportFlag = true;
            // Ask the browser for the visitor's position and center the map on it
            navigator.geolocation.getCurrentPosition(function (position) {
                var here = new google.maps.LatLng(position.coords.latitude, position.coords.longitude);
                map.setCenter(here);
                infowindow.setContent('Hi, dear WayneYe.com visitor! You are here:)');
                infowindow.setPosition(here);
                infowindow.open(map);
            }, function () {
                // Supported, but the user declined or the lookup failed
                handleNoGeolocation(browserSupportFlag);
            });
        } else {
            // No navigator.geolocation support at all
            browserSupportFlag = false;
            handleNoGeolocation(browserSupportFlag);
        }
    }

    // errorFlag is true when geolocation is supported but failed,
    // false when the browser does not support it
    function handleNoGeolocation(errorFlag) {
        map.setCenter(newyork);
        infowindow.setContent(errorFlag
            ? 'Sorry, your location could not be determined.'
            : 'Your browser does not support geolocation.');
        // An InfoWindow opened without an anchor needs an explicit position
        infowindow.setPosition(newyork);
        infowindow.open(map);
    }

    // Center the map on a stored visitor location and label it
    function setGoogleMapLocation(geoLocation, latitude, longitude) {
        var visitorLocation = new google.maps.LatLng(latitude, longitude);
        map.setCenter(visitorLocation);
        infowindow.setContent(geoLocation);
        infowindow.setPosition(visitorLocation);
        infowindow.open(map);
    }
</script>
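To summarize the stored records, as planned at the start, many visits from the same place can be collapsed into one entry before they are shown. A minimal JavaScript sketch, assuming each stored record has city, regionName, countryName, latitude and longitude fields (hypothetical names mirroring the XML elements above); summarizeVisits is also hypothetical. Each summary's label, latitude and longitude line up with the arguments of setGoogleMapLocation(geoLocation, latitude, longitude) in the script above.

```javascript
// Hypothetical sketch: collapse individual visit records into one entry per
// coordinate pair, counting visits, busiest locations first.
function summarizeVisits(records) {
  const byLocation = new Map();
  for (const r of records) {
    const key = r.latitude + "," + r.longitude;
    const entry = byLocation.get(key);
    if (entry) {
      entry.visits += 1;
      continue;
    }
    // Drop empty and repeated parts, e.g. city and region both "Shanghai"
    const parts = [r.city, r.regionName, r.countryName].filter(Boolean);
    byLocation.set(key, {
      label: Array.from(new Set(parts)).join(", "),
      latitude: r.latitude,
      longitude: r.longitude,
      visits: 1,
    });
  }
  return Array.from(byLocation.values()).sort((a, b) => b.visits - a.visits);
}

const shanghai = { city: "Shanghai", regionName: "Shanghai", countryName: "China", latitude: 31.005, longitude: 121.409 };
const summary = summarizeVisits([shanghai, shanghai]);
console.log(summary[0].label + " (" + summary[0].visits + " visits)"); // Shanghai, China (2 visits)
```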
OK, all done! Eventually I built a visit record page that shows the geolocations of every blog post's visitors: http://wayneye.com/VisitRecord.