An extensible DotNetNuke Google Sitemap Generator

30 Apr 2008
Building a DotNetNuke Sitemap Generator using the ASP.NET Provider model.

Introduction

In a previous article, I wrote about a Google Sitemap generator built on the ASP.NET Provider model. That project was a learning exercise: I wanted to sharpen my knowledge of the intricacies of the Provider model. I had only ever worked with existing code, and hadn't written my own providers from 'File->New Project', if you get my drift. This article takes that idea one step further and implements it for DotNetNuke-based websites.

It was all a lead-in to the 'real' project, which was developing a DotNetNuke-specific Google Sitemap generator using the Provider model. Such tools already exist: I have downloaded and used the DotNetNuke Google Site Map developed by bitethebullet.co.uk (the person behind it seems to want to remain anonymous, but good work there). However, I found it slightly limited, particularly once you start adding modules that multiply the number of URLs for a single page (tab). The standard DNN Blog module is one example: a single page can have many different URLs, one for each Blog and Blog Entry. Then, you've got the date-based Archive URLs. A more sophisticated, module-specific Google Sitemap tool was (is) needed.

My Requirements

I develop websites based on DotNetNuke, and most of them share a common core which is pretty much the same - a series of pages with HTML modules on them. Specific websites have specific functionality, such as e-commerce modules, enquiry modules, and of course, blog modules. So, to get a Google Sitemap generator that would cater for all the different (read: complicated) module types, I realised I needed a flexible model. Enter the Provider model.

The requirements were:

  1. Base SiteMap generator which would index 'normal' DotNetNuke pages, and obey rules to do with 'hidden' pages and pages available only to registered users.
  2. Extensible model so that run-time configuration using web.config could be achieved for more complex modules.
  3. Everything to be accomplished using assemblies with no .aspx, .ashx, or any other type of ASP.NET page. Everything to be dropped into the \bin directory for deployment.

My Design

Using my original Google Sitemap provider model as a starting point, I added in DNN-specific code. The original prototype for an ordinary ASP.NET site simply iterated all of the files on the web server and built up a sitemap based on physical files. As DotNetNuke uses a single page (default.aspx) and determines the content based on the request URL, the new site design relied on reading the DNN Tabs collection for a specific portal, and building up the Sitemap this way.

Each page has to be checked for the security level - pages that are not visible to the public are not put into the sitemap, and hidden pages are shown/hidden to the sitemap based on a switch set in the web.config file.

This works quite well and generates a successful sitemap for any 'standard' DNN site (version 4 upwards). This takes care of my requirement number 1.
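The filtering rules above can be sketched in a few lines. This is a Python illustration only, not the article's .NET code: the `Tab` class and its field names are hypothetical stand-ins for DNN's TabInfo, and `include_hidden_pages` mirrors the web.config switch.

```python
from dataclasses import dataclass


@dataclass
class Tab:
    """Hypothetical stand-in for a DNN TabInfo record."""
    tab_id: int
    name: str
    is_public: bool   # viewable by unauthenticated visitors
    is_hidden: bool   # flagged as hidden in the page settings


def sitemap_tabs(tabs, include_hidden_pages=False):
    """Return only the tabs that belong in the sitemap."""
    selected = []
    for tab in tabs:
        if not tab.is_public:
            continue  # never list pages the public cannot view
        if tab.is_hidden and not include_hidden_pages:
            continue  # hidden pages are governed by the config switch
        selected.append(tab)
    return selected


tabs = [
    Tab(1, "Home", is_public=True, is_hidden=False),
    Tab(2, "Members", is_public=False, is_hidden=False),
    Tab(3, "Thanks", is_public=True, is_hidden=True),
]
print([t.name for t in sitemap_tabs(tabs)])
print([t.name for t in sitemap_tabs(tabs, include_hidden_pages=True)])
```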

As in my earlier prototype design, the ASP.NET Handler was built into the provider DLL so that the handler and the provider are in the same assembly. There are reasons why this is not conceptually correct, but I chose binary simplicity over true separation of components. Because no .aspx or .ashx page is involved, this takes care of my requirement number 3.

Extending the Model

As in my requirement number 2, a flexible and extensible model is required to cater for more complex modules sitting on DotNetNuke pages. This is where the ASP.NET Provider model comes in. I ensured that the base Provider class could be inherited from by setting the class accessibility modifiers accordingly, and created a new assembly called BlogGoogleSiteMap. The main type in this assembly, BlogGoogleSiteMapProvider, inherits from the original GoogleSiteMapProvider type. This gives it the base functionality for generating a Sitemap, transforming the starting URL into a DotNetNuke PortalAlias instance, and other assorted functions. However, by overriding the SitePages(siteURL) method (which returns an object collection of the logical page URLs for the given actual URL), the new BlogGoogleSiteMap provider can handle the specific blog nuances.

Stitching it together is done in a procedure in the base DNN Google Site Map Provider type. This method reads each of the modules on a specific DNN page (actually, an instance of a TabInfo object). For each of these modules, the ModuleDef type instance is loaded and the FriendlyName of the module definition is read. This gives us a unique indicator of what modules are on the page - in effect, telling what the content of the page is. You could implement a switch or Case statement here, and call a specific piece of code based on the module definition. But, that would violate requirement number 2 - an extensible model. Each specific module definition means a new Module-specific assembly to reference, and the whole SiteMap Provider would lose portability. You'd have to upload every binary for every module you've ever programmed into the Google Sitemap provider.

Instead, the design uses a simple naming format of ModuleDef.FriendlyName + ".GoogleSiteMapProvider" to locate the correct provider for the specific module in the web.config.

For instance, when looking for a blog module in the collection of modules on a DNN page, you'll come across the ModuleDefinition friendly name of 'View_Blog'. To define a Google Sitemap Provider for this module definition, all that is required is an entry in the <googlesitemaps> section of the web.config:

<add name="View_Blog.GoogleSiteMapProvider" 
  type="iFinity.DNN.Modules.GoogleSiteMap.BlogGoogleSiteMapProvider, 
  iFinity.DNN.BlogGoogleSiteMapProvider" />

The base DNNGoogleSiteMap Provider has this lookup from FriendlyName to the matching Google Sitemap Provider built into it. Each module on each page is checked for an entry in the <googlesitemaps> section. When a matching entry is found, the base Provider loads the named Provider and calls it to get the list of URLs for the specific page/module combination. If there is no entry for the module in the config section, it just returns the 'normal' DNN page URL for that page without attempting to load any module-specific Provider.
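To make the lookup concrete, here is a minimal Python sketch of the naming convention and the fallback. It is not the article's .NET code: the dictionary stands in for the <googlesitemaps> providers collection, and `blog_page_urls`, `base_page_urls`, and `urls_for_page` are hypothetical names.

```python
PROVIDER_SUFFIX = ".GoogleSiteMapProvider"


def base_page_urls(page_url):
    """Fallback behaviour: the page contributes only its own URL."""
    return [page_url]


def blog_page_urls(page_url):
    """Hypothetical module-specific provider that returns extra URLs."""
    entry_url = page_url.replace("/default.aspx", "/entryid/2/default.aspx")
    return [page_url, entry_url]


# Stand-in for the configured providers, keyed by provider name.
configured_providers = {
    "View_Blog" + PROVIDER_SUFFIX: blog_page_urls,
}


def urls_for_page(page_url, module_friendly_names, providers):
    """Ask each module's provider (if configured) for the page's URLs."""
    urls = []
    for friendly_name in module_friendly_names:
        provider = providers.get(friendly_name + PROVIDER_SUFFIX)
        if provider is not None:
            urls.extend(provider(page_url))
    if not urls:
        # No module on the page has a provider: use the plain page URL.
        urls = base_page_urls(page_url)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(urls))
```

Because the lookup is purely by name, adding support for a new module in this sketch means adding one dictionary entry, which mirrors adding one line to web.config.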

Provider-per-module

With this framework in place, any specific module in DNN that uses more than just the standard page URL can have a specific GoogleSiteMap Provider developed for it. This Provider can then just be dropped into the specific website that uses that module. So, if you are creating a lot of different DNN sites, all using a different mixture of modules, you can quickly configure up the required Google SiteMap for the specific website, just by dropping in different assemblies into the \bin directory, and by modifying the web.config file. No recompiles or modifications to the DNN base are needed.

The DotNetNuke Blog Google Sitemap Provider

The Blog Google Sitemap Provider is shaped by how the Blog module organises its data. By reverse engineering the Blog code and testing with some examples, I figured out that there really is only one 'Blog Set' per portal. You can put blogs and blog entries across specific pages on the site, and you can associate specific blogs with specific pages, but the relationship is Portal->Blogs->Entries rather than Portal->Page->Blogs->Entries, as you might expect at first glance. There isn't really a way of tying a specific blog exclusively to a specific page, as all blogs can be viewed on all blog-specific pages through the link navigation that comes standard in the blog module. I can understand why the designers did it this way, as it gives complete flexibility for a visitor to browse the entire set of blogs/entries on a site without hunting around for them.

With this in mind, each page can therefore have a full set of Blog-related URLs associated with it. Depending on the number of blogs, this can quickly build to a high number. But, most Blog installations I have seen tend to stick the entire blog-set on a single page in the site and leave it at that.

The Blog Sitemap Provider iterates through each Blog and Blog entry, and puts in an entry for each specific URL in the Blog. The set of URLs for a specific page might be:

//the standard blog page
http://www.yoursite.com/blog/tabid/15/default.aspx

//the standard page for BlogID = 1
http://www.yoursite.com/blog/tabid/15/blogid/1/default.aspx

//the specific URL for EntryID = 2 (entries are unique across all blogs)
http://www.yoursite.com/blog/tabid/15/entryid/2/default.aspx

//the specific URL for BlogID = 1, EntryID = 2
http://www.yoursite.com/blog/tabid/15/blogid/1/entryid/2/default.aspx

The last URL (BlogID, EntryID) produces a page identical to the third in the list (EntryID only), because each EntryID is unique across the Portal regardless of which Blog it belongs to. Following Google's guidelines about identical content, the Blog Google Sitemap Provider therefore doesn't submit the last URL.
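The URL set described above can be sketched as follows. This is an illustrative Python version, not the Provider's actual code; `blog_urls` and its arguments are hypothetical, and the URL shapes are copied from the examples above.

```python
def blog_urls(base, tab_id, blogs):
    """Build the sitemap URLs for one blog page.

    `blogs` maps a blog id to the list of entry ids in that blog.
    The combined blogid + entryid form is deliberately never emitted,
    because it duplicates the entryid-only page.
    """
    page = f"{base}/tabid/{tab_id}"
    urls = [f"{page}/default.aspx"]          # the standard blog page
    seen_entries = set()
    for blog_id, entry_ids in blogs.items():
        urls.append(f"{page}/blogid/{blog_id}/default.aspx")
        for entry_id in entry_ids:
            if entry_id in seen_entries:
                continue                      # entry ids are portal-wide unique
            seen_entries.add(entry_id)
            urls.append(f"{page}/entryid/{entry_id}/default.aspx")
    return urls


for url in blog_urls("http://www.yoursite.com/blog", 15, {1: [2, 3]}):
    print(url)
```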

The Blog Google Sitemap Provider also has a configurable attribute on its Provider entry in the web.config which specifies whether or not to include the Blog archive. Setting this to true includes links to the blog archives. These archive pages may or may not be identical to the individual Entry pages, depending on whether the site's custom is to post at most one entry per day. It is a judgment call by the website owner whether including the archives in the Google Sitemap is necessary or not. Archives have a specific URL pattern - and for some reason, this always reverts to parameter-driven (non-friendly??) URLs, such as:

http://www.yoursite.com/default.aspx?tabid=15&BlogDate=2006-10-11

The way it actually works is that the Blog page shows all entries in the month up until the date specified. So, submitting a date of 11-Oct-2006 will return all blog entries from 1st October, 2006 to 11th October, 2006. Again, whether or not this produces a unique page depends on the number of entries. A blog with one entry a month will produce roughly the same page content, but a blog with one or two entries a week will provide enough distinct pages to make including the archives worthwhile. Remember that there will be a distinct URL for each and every blog entry. A single Sitemap file is limited to 50,000 URLs, but if you've written 50,000 blog entries, perhaps a career in writing awaits you instead of configuring Google Sitemaps.
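A small Python sketch of the archive behaviour described above, assuming the month-to-date window is inclusive at both ends. The helper names are hypothetical and this is not the Blog module's code; the URL shape is taken from the example above.

```python
from datetime import date


def archive_url(site, tab_id, day):
    """Archive URL in the parameter-driven form shown above."""
    return f"{site}/default.aspx?tabid={tab_id}&BlogDate={day.isoformat()}"


def entries_shown(entry_dates, blog_date):
    """Entries from the first of the month up to and including blog_date."""
    month_start = blog_date.replace(day=1)
    return [d for d in entry_dates if month_start <= d <= blog_date]


entry_dates = [
    date(2006, 9, 30),
    date(2006, 10, 1),
    date(2006, 10, 11),
    date(2006, 10, 12),
]
print(archive_url("http://www.yoursite.com", 15, date(2006, 10, 11)))
print(entries_shown(entry_dates, date(2006, 10, 11)))
```

Two archive dates in the same month with no entry between them select the same entries, which is the duplicate-content case to weigh when deciding whether to include archives.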

The page update frequency and page priority elements in the standard Google schema are optional, and there is a school of thought that says 'don't provide any information, it can only get you into trouble'. I don't agree with this, and obviously Google wants to know how often your pages get updated. Some might want to say 'every day' and 'priority = 1', thinking this will get them more frequent Googlebot visits and, somehow, higher page rankings. Google couldn't be clearer on this point: their Sitemap Help section states that these are hints only, and it is up to the Googlebot whether it follows the hints or not. I figure there is no point telling the Googlebot that a page is updated daily when in reality it never changes. It wouldn't take a particularly smart programmer to compare the previously cached copy of a page with the current version and determine that no content is different.

With this in mind, in the Blog Provider, I have developed a simple algorithm that compares the time between entries and supplies a rough estimate of the page update frequency. For the Blog pages, this depends on how often new entries are going in. For the Entry pages, it depends on how many new comments are being added to the blog entry. If nobody comments on the page (or you have comments turned off), then that entry, once posted, is probably never going to change. Accordingly, it will be shown as PageUpdateFrequency=Never in the associated Sitemap.
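The Provider's exact algorithm isn't reproduced in this article, so the following Python sketch is only one plausible way to do it: average the gaps between update timestamps and map the result to a Sitemap change frequency value. The thresholds and the function name are assumptions.

```python
from datetime import datetime, timedelta


def estimate_change_frequency(timestamps):
    """Map the average gap between updates to a change frequency value.

    For a blog page, pass the entry dates; for an entry page, pass the
    post date followed by its comment dates. A single timestamp means
    the page has not changed since it was posted.
    """
    if len(timestamps) < 2:
        return "never"
    ordered = sorted(timestamps)
    average_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    # Assumed thresholds; tune to taste.
    if average_gap <= timedelta(hours=1):
        return "hourly"
    if average_gap <= timedelta(days=1):
        return "daily"
    if average_gap <= timedelta(days=7):
        return "weekly"
    if average_gap <= timedelta(days=31):
        return "monthly"
    return "yearly"


weekly_posts = [datetime(2008, 1, 1), datetime(2008, 1, 8), datetime(2008, 1, 15)]
print(estimate_change_frequency(weekly_posts))
print(estimate_change_frequency([datetime(2008, 1, 1)]))
```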

The page priority is a relative term - and by filling this out, you are ranking pages within your own site on importance. Given this, the Blog Provider rates the newest blog pages as whatever is set as the defaultPagePriority in the web.config. However, it halves this value for the Blog Archives, as you would expect the archived pages to be less relevant than the newer postings. I'd like to do a long-term study on Sitemaps and web logs to see if changing the page update frequency actually changes the way that the Googlebot accesses the pages in a site, but that will have to go into the 'one day' pile of projects-to-do. Actually, given the database-centric DotNetNuke site logs, it's probably not that hard, and would yield interesting data when studied over a significant period. Back to the topic...
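The halving rule is simple enough to show directly. This is a Python sketch with a hypothetical function name; clamping to the 0.0 to 1.0 range allowed by the Sitemap schema is my own addition.

```python
def page_priority(default_priority, is_archive):
    """Return the priority string for a page; archives get half the default."""
    priority = default_priority / 2 if is_archive else default_priority
    priority = min(1.0, max(0.0, priority))  # schema allows 0.0 to 1.0
    return f"{priority:.2f}"


print(page_priority(0.5, is_archive=False))
print(page_priority(0.5, is_archive=True))
```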

Installing and Configuring the Example DNN and Blog Google Sitemap Providers

If you've downloaded the code and wish to install it on your site, first place all of the DLLs into your site's \bin directory. This includes the Utility DLLs and other associated items in the download. Then, open your web.config and make the following modifications (of course, I don't need to tell you to back up your web.config first, do I??).

In the <configSections>, inside the <sectionGroup name="dotnetnuke"></sectionGroup> element, place the following entry:

<section name="googlesitemaps"
    type="iFinity.DNN.Modules.GoogleSiteMap.GoogleSiteMapSection,
    iFinity.DNN.GoogleSiteMapProvider" />

This entry tells ASP.NET that there is a configuration section called 'googlesitemaps', which the built-in Google Sitemap HttpHandler reads when it is called. That brings us to the next required entry, the HttpHandler itself. In the <httpHandlers> section, add the following entry:

<add verb="*" path="GoogleSiteMap.axd"
   type="iFinity.DNN.Modules.GoogleSiteMap.GoogleSiteMapHandler, 
   iFinity.DNN.GoogleSiteMapProvider" />

This tells ASP.NET that any request coming for GoogleSiteMap.axd should load the GoogleSiteMapHandler, located in the iFinity.DNN.GoogleSiteMapProvider Assembly. Within this handler lies the code that then loads the actual Provider for the GoogleSiteMap.

The next entry in the web.config is the <googlesitemaps> section that ASP.NET was notified of in the first entry above. It contains the actual specification of the Providers used to deliver Google Sitemap services. Place it at the end of the web.config file, underneath the <dotnetnuke/> section and before the closing </configuration> tag.

  <googlesitemaps defaultProvider="BaseGoogleSiteMapProvider">
    <providers>
      <add name="BaseGoogleSiteMapProvider"
           type="iFinity.DNN.Modules.GoogleSiteMap.GoogleSiteMapProvider" 
           defaultPagePriority="0.5" defaultPageUpdateFrequency="daily"
           includeHiddenPages="false"/>
      <add name="View_Blog.GoogleSiteMapProvider"
           type="iFinity.DNN.Modules.GoogleSiteMap.BlogGoogleSiteMapProvider, 
              iFinity.DNN.BlogGoogleSiteMapProvider"
           defaultPagePriority="0.5" defaultPageUpdateFrequency="daily" 
           showArchives="true" includeHiddenPages="false"/>
    </providers>
  </googlesitemaps>

The <googlesitemaps> element is the place to register any custom providers for specific modules in the DNN framework. The first entry is the 'default' provider, the base DNN Google Sitemap Provider I have developed. It lives in the same assembly as the HTTP handler. The defaultPageUpdateFrequency and defaultPagePriority attributes tell the Provider what to output in the Sitemap XML. The includeHiddenPages attribute specifies whether hidden pages should be included in the Sitemap.
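To show where those two defaults end up, here is a Python sketch that writes a minimal Sitemap document. It assumes the sitemaps.org 0.9 namespace and is not the Provider's own output code; `build_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, change_frequency="daily", priority="0.5"):
    """Return Sitemap XML with the same defaults applied to every URL."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = change_frequency
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = priority
    # ElementTree escapes '&' in URLs as '&amp;', as XML requires.
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["http://www.yoursite.com/blog/tabid/15/default.aspx"]))
```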

The second entry is the Blog-module specific entry. The naming standard of this relates to the explanation earlier of how the default Provider discovers and loads Module-specific Providers. Because the name is 'View_Blog.GoogleSiteMapProvider' (an inbuilt naming standard), the default Provider knows that this particular Provider should be called for the Sitemap entries whenever a Module on a Page, using the ModuleDefinition FriendlyName of 'View_Blog' is found. Because the BlogGoogleSiteMapProvider uses the GoogleSiteMapProvider as a base class, it also has the 'defaultPagePriority', 'defaultPageUpdateFrequency', and 'includeHiddenPages' attributes. However, the Blog Provider also adds in a new attribute called 'showArchives' which was covered earlier. This list of attributes can be expanded indefinitely for individual Module-specific requirements.

Creating your own DNN Module-specific Google Sitemap Providers

If you've developed your own private assembly module for DotNetNuke and it uses more than the standard page URL to deliver content, the Provider model outlined here is a good way to deliver a Google Sitemap for it. All you need to do is create a new assembly, reference the iFinity.DNN.Modules.GoogleSiteMap Provider assembly, and derive your own Provider type from the base GoogleSiteMapProvider type. You can then override the SitePages(siteURL) method to index your page in whichever way is most appropriate. The list of SitePage objects returned from your custom provider is merged into the overall list of SitePage objects that the GoogleSiteMapProvider generates and then transforms into Sitemap-schema-compliant XML.
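The shape of such a derived provider, sketched in Python rather than .NET. The class and method names echo the article's types, but the signatures, the `CatalogSiteMapProvider` example, and the `productid` query parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SitePage:
    url: str
    change_frequency: str
    priority: float


class GoogleSiteMapProvider:
    """Simplified stand-in for the base provider."""

    def __init__(self, default_priority=0.5, default_frequency="daily"):
        self.default_priority = default_priority
        self.default_frequency = default_frequency

    def site_pages(self, site_url):
        """Base behaviour: one SitePage for the plain page URL."""
        return [SitePage(site_url, self.default_frequency, self.default_priority)]


class CatalogSiteMapProvider(GoogleSiteMapProvider):
    """Hypothetical provider for a module with one URL per product."""

    def __init__(self, product_ids, **settings):
        super().__init__(**settings)
        self.product_ids = product_ids

    def site_pages(self, site_url):
        pages = super().site_pages(site_url)  # keep the plain page URL
        for product_id in self.product_ids:
            pages.append(SitePage(f"{site_url}?productid={product_id}",
                                  self.default_frequency,
                                  self.default_priority))
        return pages


provider = CatalogSiteMapProvider([7, 9], default_priority=0.8)
for page in provider.site_pages("http://www.yoursite.com/shop/tabid/20/default.aspx"):
    print(page.url, page.change_frequency, page.priority)
```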

Shortcomings and Potential Expansions

The Provider based model for Google Sitemaps is quite simple - but then Google sitemaps are deceptively simple themselves. There are a few things that could be changed. I haven't used the code for long enough to determine any major shortcomings with the approach, except for perhaps performance. But, given the Googlebot tends to only read the Sitemap on perhaps a once-daily basis, I see it as a respectable tradeoff to get flexibility in the approach.

Expansion, apart from adding more module-specific providers, could include GZip compression of the Sitemap within the base code, as Google accepts GZip-compressed Sitemaps. The output could also be split into a set of Sitemap files, to get around the 50,000 URL limit per file on a large site (think of a classified listing site where each item for sale gets its own URL). Google allows the definition of a sitemap index file, which then refers to individual sitemaps. This could be done by the base provider generating the index file, and a series of individual providers returning their own sitemap files. I don't have the requirement for this at the moment, but it could be done easily enough by changing the base code provided here.
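A rough Python sketch of both ideas, assuming the sitemaps.org 0.9 namespace. None of this exists in the downloadable code; the function names and the `part` query parameter on the handler URL are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET

# Assumed namespace: the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_into_sitemaps(urls, max_urls_per_file=50_000):
    """Chunk a URL list so each Sitemap file stays under the per-file cap."""
    return [urls[i:i + max_urls_per_file]
            for i in range(0, len(urls), max_urls_per_file)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document pointing at each Sitemap file."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="unicode")


def gzip_sitemap(xml_text):
    """Compress Sitemap XML for serving as a .gz file."""
    return gzip.compress(xml_text.encode("utf-8"))


chunks = split_into_sitemaps([f"http://www.yoursite.com/item/{n}" for n in range(5)],
                             max_urls_per_file=2)
index_xml = build_sitemap_index(
    [f"http://www.yoursite.com/GoogleSiteMap.axd?part={n}"
     for n in range(1, len(chunks) + 1)])
print(len(chunks), len(gzip_sitemap(index_xml)))
```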

Summary

Hopefully, this code will be of use to someone else, as it is itself based on an Open Source project and the work of others. I'd like to know that people find it a useful way of incorporating Google Sitemaps for their custom DNN Modules without having to re-write the entire sitemap-generating code each time. Maybe it could even be included in the core of a future DNN release, with a standard built around the concept for custom module developers to adopt!

Update

I frequently update the code for the DotNetNuke Google Sitemap Provider, but don't always update the code linked to this article. For the very latest version, please go to the iFinity Google Sitemap Provider for DotNetNuke download page.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
