Introduction
Google Sitemaps are an important tool for website developers, webmasters, and pretty much anyone with a website. If you don't know what a Google Sitemap is, take a look at http://www.google.com/webmasters. A Google Sitemap is an XML file that tells the Google crawler which URLs in your site to visit, how often each page is updated, and what relative priority each page has within the site. Google's instructions describe coding the XML file by hand, but of course, with a dynamic website, you don't want to do that.
What you need is a dynamic Google Sitemap generator that gives real-time data about your website. There are many of these around. My problem was that I had several websites, each with its own requirements, but about 90% of those requirements were exactly the same. So, I decided to utilise the Provider model and develop a base provider for delivering Sitemaps, which could then be extended with new providers for each specific situation that arose in subsequent websites. As I hadn't done any work with either Google Sitemaps or Providers, it was a good learning opportunity as well.
What does a Google Sitemap look like?
A Google Sitemap is just an XML file telling the Google crawler which URLs to look up in the website.
For each URL in the site to be indexed, there should be one <url> entry in the XML file. There is a limit of 50,000 URLs per file, but several sitemaps per site can be submitted. The <lastmod> element tells Google when the page was last modified, the <changefreq> element tells Google how often the page changes, and the <priority> element is a relative measure for the page against all other pages in your site. As Google clearly explains, there isn't anything you can do with Sitemaps to increase your site ranking; they are only a tool for helping Google crawl all the parts of your site that you want crawled. In this way, it's kind of a super-robots.txt file.
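As a rough illustration, a minimal Sitemap with two entries might look like the following. The URLs, dates, and values are made up for the example, and the namespace shown is the one from the current sitemaps.org protocol; the schema version your site needs to declare may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: example.com and all values below are placeholders. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- loc is the full URL of the page to be crawled -->
    <loc>http://www.example.com/default.aspx</loc>
    <lastmod>2006-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
    <lastmod>2005-11-02</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

Only the location is required for each entry; the other three values are hints to the crawler.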
Background
I have worked with the ASP.NET Provider model in the past, but I had never actually developed my own custom provider for anything. I utilised the resources of MSDN and downloaded the source code for a Custom Provider. I even read the instructions!
The Provider model allows the plugging in of different code to do the same job without having to recompile. Changing the provider for a specific task in an application is done in the web/app config file. The config file tells the application at run-time which code to execute for a specific task.
The most common use of Providers is for data access. For example, an application can switch between a SQL Server provider and an Oracle provider just by changing the provider named in the web/app.config file.
The relevant entry in the web.config file for the Google Site Map Provider looks like this:
<googlesitemaps defaultProvider="BaseGoogleSiteMapProvider">
  <providers>
    <add name="BaseGoogleSiteMapProvider"
         type="GoogleSiteMap.GoogleSiteMapProvider" />
    <add name="SpecialisedGoogleSiteMapProvider"
         type="Specialised.GoogleSiteMapProvider" />
  </providers>
</googlesitemaps>
This example shows the use of two providers: the default provider and, if necessary, a Specialised Provider. (The Specialised.GoogleSiteMapProvider isn't in the demo project; I have just shown it as an example here.)
Requirements
The requirements I generated for my own project were to:
- Be instantly useable with the majority of ASP.NET applications.
- Be a full 'binary' solution - no integration of code or compiling - just drop in a binary, modify the web.config, and go.
- Be extendable so that more complicated ASP.NET applications could redefine the provider without restriction.
Solution
The solution was to have a single assembly with three main types:
- An HTTP Handler which would return the XML on request (called GoogleSiteMapHandler)
- A Provider type (called GoogleSiteMapProvider)
- A Controller class to glue the Handler and Provider together
Why do it this way?
I could have had a separate Handler file (.ashx) to be dropped into the destination ASP.NET website. But to keep to the 'binary' drop-in requirement, I wanted the whole project to be a simple drop-in to the \bin directory. This is why the Handler and the Provider are in the same assembly.
By doing it this way, I can also create new assemblies which inherit from the base provider, controller, and handler classes and create whole new Providers for specific types of websites which use HTTP redirection and URLs that don't actually map to physical files on the server.
Using the code
To install and try out the demo project, simply download the zip file and unpack it. The file 'iFinity.GoogleSiteMapProvider.dll' should be copied into the \bin directory of your target website.
Then, open up your web.config (remember to take a backup first) and insert the following lines:
In the <configuration> section, under <configSections>, put in the following entries:
<configuration>
  <configSections>
    <section name="googlesitemaps"
             type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapSection,
                   iFinity.GoogleSiteMapProvider" />
  </configSections>
</configuration>
Remember that you will probably already have the <configuration> and <configSections> entries in the web.config, but create them if you do not.
The entry in <configSections> tells ASP.NET to look for a section in the app/web.config file called 'googlesitemaps'. The type attribute is in the format type="typeName, assemblyName", and tells ASP.NET that there is a type called 'GoogleSiteMapSection' in the assembly 'iFinity.GoogleSiteMapProvider'. The GoogleSiteMapSection type derives from System.Configuration.ConfigurationSection and provides the run-time type that represents the Providers section in the config file. This is all handled at runtime by the .NET configuration system.
The next entry to make in the web.config file is the actual 'googlesitemaps' section that was named in the <configSections> entry. This should be placed after the closing tag of the <system.web> section, but before the closing </configuration> tag.
<googlesitemaps defaultProvider="BaseGoogleSiteMapProvider">
  <providers>
    <add name="BaseGoogleSiteMapProvider"
         type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapProvider"
         defaultPagePriority="0.5" defaultPageUpdateFrequency="daily"
         sitePageTypes="aspx,html,htm" />
  </providers>
</googlesitemaps>
This entry tells ASP.NET which providers are available to use at runtime. If anything other than the default provider is to be used, the calling code would have to be modified to do so. However, to change the default provider, the defaultProvider attribute just needs to match the name of a provider in the list.
The final change to make to the web.config is the addition of the HTTP Handler that actually produces the Sitemap. This goes within the system.web section, under the httpHandlers section.
<httpHandlers>
  <add verb="*" path="GoogleSiteMapHandler.axd"
       type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapHandler,
             iFinity.GoogleSiteMapProvider" />
</httpHandlers>
This entry tells ASP.NET to handle any incoming request for 'GoogleSiteMapHandler.axd' by loading the iFinity.GoogleSiteMapProvider assembly and using the type 'iFinity.Providers.GoogleSiteMap.GoogleSiteMapHandler'. This is done automatically by ASP.NET for you, as long as the specified type implements the IHttpHandler interface (which this does).
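Putting the three entries together, a stripped-down web.config might be laid out roughly as follows. This is only a sketch of where each piece goes, assuming the file contains nothing else; a real web.config will have many other settings around these entries.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- configSections has to be the first child of configuration -->
  <configSections>
    <section name="googlesitemaps"
             type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapSection, iFinity.GoogleSiteMapProvider" />
  </configSections>

  <system.web>
    <!-- Routes requests for GoogleSiteMapHandler.axd to the handler type -->
    <httpHandlers>
      <add verb="*" path="GoogleSiteMapHandler.axd"
           type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapHandler, iFinity.GoogleSiteMapProvider" />
    </httpHandlers>
  </system.web>

  <!-- The custom section declared in configSections above -->
  <googlesitemaps defaultProvider="BaseGoogleSiteMapProvider">
    <providers>
      <add name="BaseGoogleSiteMapProvider"
           type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapProvider"
           defaultPagePriority="0.5" defaultPageUpdateFrequency="daily"
           sitePageTypes="aspx,html,htm" />
    </providers>
  </googlesitemaps>
</configuration>
```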
Please note that the Handler doesn't need to be in the same assembly as the provider, and in a way, including the Handler type within the Provider model pollutes it slightly. By rights, the Handler should call the ASP.NET ProvidersHelper class to get back the correct Provider for that configuration. To be completely correct, the Handler type and the GoogleSiteMapService type should be in a separate namespace and assembly. But as I intend to create separate assemblies for providers down the track, I'm happy to live with my model. Others may claim it is incorrect, and they have a valid point.
Program flow
When an HTTP request is made for GoogleSiteMapHandler.axd (either by the Google crawler, or by typing 'yoursite.com/googleSiteMapHandler.axd' into a browser), ASP.NET loads the type and assembly named in the httpHandlers section of the web.config. In this instance it is the same DLL as the Provider, though it doesn't need to be, as discussed previously. ASP.NET calls ProcessRequest(HttpContext context), which any type implementing IHttpHandler must have. This then calls the GoogleSiteMapService.GetGoogleSiteMap() method, which asks ASP.NET for the default provider as named in the googlesitemaps configuration section.
ASP.NET reads in the providers, and instantiates an object of the type named as the default provider. This provider object is then asked for the XML that makes up the site map. As the assembly also includes a basic implementation of the default provider, it is this provider that is called. The base implementation in the demo project simply iterates the directories and reads in all of the files that match the extensions named in the sitePageTypes attribute. The resulting XML is then passed back up through the call stack and returned by the HTTP Handler, so it is output to either the browser or the Google crawler.
Expansion possibilities
As mentioned before, this project was made with the intent of developing a better understanding of the provider model, and providing a base implementation that can be expanded to better handle more complicated ASP.NET application models.
To expand this code, there are two possible directions. The first, and simplest, is to just modify the code in the GoogleSiteMapProvider IteratePages() procedure. This can be modified to better provide a site map for a particular site - the possibilities are quite open in this respect.
The second, which is conceptually better but slightly more complicated, is to reference the provided assembly and create your own provider by inheriting from the GoogleSiteMapProvider type. You will need to redefine IteratePages() in the derived class so that it indexes the pages in the site in a better way, but everything else can be left as is. The new provider would be compiled into a separate assembly and then named as the default provider in the googlesitemaps configuration section.
For instance, let's say you create a new provider class called 'MyNewGoogleSiteMapProviderType' and compile it into an assembly called 'MyNewGoogleSiteMapProviderAssembly.dll'. The config entry would be:
<add name="BaseGoogleSiteMapProvider"
     type="MyNewGoogleSiteMapProviderType,
           MyNewGoogleSiteMapProviderAssembly"
     defaultPagePriority="0.5" defaultPageUpdateFrequency="daily" />
This would mean that your new type would be called to provide the list of pages for the website. The Base provider would take care of formatting it into the Sitemap format and outputting the XML. You can leave all the other web.config entries as is - the built-in HTTP Handler would take care of calling your provider for the list of pages in your site. How you provide that list is up to you!
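An alternative layout, sketched here as one possibility, is to keep the base provider registered and add the derived provider under its own name, then point defaultProvider at it. The entry name 'MyNewGoogleSiteMapProvider' is invented for this example; the type and assembly names are the hypothetical ones used above.

```xml
<googlesitemaps defaultProvider="MyNewGoogleSiteMapProvider">
  <providers>
    <!-- The base provider stays registered, so switching back only means changing defaultProvider -->
    <add name="BaseGoogleSiteMapProvider"
         type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapProvider"
         defaultPagePriority="0.5" defaultPageUpdateFrequency="daily"
         sitePageTypes="aspx,html,htm" />
    <!-- Hypothetical entry for the derived provider in its own assembly -->
    <add name="MyNewGoogleSiteMapProvider"
         type="MyNewGoogleSiteMapProviderType, MyNewGoogleSiteMapProviderAssembly"
         defaultPagePriority="0.5" defaultPageUpdateFrequency="daily" />
  </providers>
</googlesitemaps>
```

With this arrangement, moving between the two implementations needs no recompilation, which is the main attraction of the Provider model described earlier.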
What's next
I will be developing a new implementation of the provider model to suit DotNetNuke, as this is the platform I do a lot of development in. DotNetNuke uses an HttpRedirection method to serve many URLs from a single default.aspx page, and as such its Sitemap can't be generated from physical files.
I will then create different providers for each of the separate specialised modules that I use in DotNetNuke websites. Some modules provide a wide range of different content for one URL, depending on database-driven content. With conventional Google indexing, much of the content may not be found and indexed correctly.
Please note that the XML examples on this page have had line breaks placed in them to get them to fit. There is no need to do this in your web.config file.
Copyright notice
You are free to use, modify, and extend the supplied code provided that you do not remove the copyright messages in the source, or attempt to pass either the code or this article off as your own. Obviously with free demo code, there's no warranty that it will actually work and there may be bugs in the provided download.
If you use this code and find it useful, I appreciate links back to my website, http://www.ifinity.com.au/.