Introduction
This article explains a clean, reusable approach to generating sitemaps and sitemap indexes for SEO (Search Engine Optimization). The sitemap format is used by Google, Yahoo!, MSN and others.
Background
Search engines often ask for a sitemap or sitemap index so that they can find all the pages in your site that you want them to find. A sitemap has to be in a specific XML format, and certain rules need to be followed, such as a maximum of 50,000 URLs and 10MB per sitemap file (the size limit when this was written; the protocol has since raised it to 50MB uncompressed). See http://www.sitemaps.org/ for the full protocol.
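To make the limits concrete, here is a minimal sketch of splitting a list of URLs into sitemap-sized batches and checking the size of the generated XML. The SitemapLimits class and its members are hypothetical names used for illustration and are not part of the attached code. The constants are the figures quoted above, so adjust them if you target different limits.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the attached code.
public static class SitemapLimits
{
    // Limits as quoted in this article; treat them as configurable assumptions.
    public const int MaxUrlsPerSitemap = 50000;
    public const int MaxBytesPerSitemap = 10 * 1024 * 1024;

    // Splits a flat list of URLs into batches of at most maxPerBatch items.
    public static List<List<string>> Split(IReadOnlyList<string> urls, int maxPerBatch = MaxUrlsPerSitemap)
    {
        if (urls == null) throw new ArgumentNullException(nameof(urls));
        if (maxPerBatch <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerBatch));

        var batches = new List<List<string>>();
        for (int start = 0; start < urls.Count; start += maxPerBatch)
        {
            int count = Math.Min(maxPerBatch, urls.Count - start);
            var batch = new List<string>(count);
            for (int i = 0; i < count; i++)
            {
                batch.Add(urls[start + i]);
            }
            batches.Add(batch);
        }
        return batches;
    }

    // True when the serialized sitemap, measured in UTF-8 bytes, is within the size limit.
    public static bool FitsSizeLimit(string xml)
    {
        return Encoding.UTF8.GetByteCount(xml) <= MaxBytesPerSitemap;
    }
}
```

With these figures, a site with 120,001 URLs would end up with three batches: two full ones and one holding the remaining 20,001 URLs.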
Using the code
The attached code contains two classes of importance: BaseSitemapGenerator and BaseSitemapIndexGenerator. The former is used if you know that your sitemap is going to be small (i.e., fewer than 50,000 URLs and less than 10MB); the latter is for large sites.
BaseSitemapGenerator
To use BaseSitemapGenerator, we simply inherit a class from it and override GenerateUrlNodes(). In this method, you call WriteUrlLocation once for each page, passing the page path without the domain information.
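using System;

// Small-site generator: inherits from BaseSitemapGenerator as described above.
public class SitemapGenerator : BaseSitemapGenerator
{
    #region Protected Members

    // Writes one entry per page; paths are given without the domain.
    protected override void GenerateUrlNodes()
    {
        WriteUrlLocation("sitemap.aspx", UpdateFrequency.Weekly, DateTime.Now);
        WriteUrlLocation("blog.aspx", UpdateFrequency.Daily, DateTime.Now);
    }

    #endregion
}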
Then it is a matter of calling the appropriate Generate() method to get the sitemap XML back as a string.
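To give a feel for how a base class like this could work internally, here is a minimal, self-contained sketch. It is not the attached code: MiniSitemapGenerator, DemoSitemap, the Generate(baseUrl) signature and the UpdateFrequency values are my assumptions for illustration. Only the method names GenerateUrlNodes and WriteUrlLocation follow the article.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Xml;

// Assumed values, mirroring the changefreq options in the sitemap protocol.
public enum UpdateFrequency { Always, Hourly, Daily, Weekly, Monthly, Yearly, Never }

// Hypothetical stand-in for a small-site base class; the internals are assumed.
public abstract class MiniSitemapGenerator
{
    private const string Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
    private XmlWriter _writer;
    private string _baseUrl;

    // Derived classes call WriteUrlLocation once per page from here.
    protected abstract void GenerateUrlNodes();

    protected void WriteUrlLocation(string page, UpdateFrequency frequency, DateTime lastModified)
    {
        _writer.WriteStartElement("url", Ns);
        _writer.WriteElementString("loc", Ns, _baseUrl + page);
        _writer.WriteElementString("changefreq", Ns, frequency.ToString().ToLowerInvariant());
        _writer.WriteElementString("lastmod", Ns,
            lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        _writer.WriteEndElement();
    }

    // Returns the whole sitemap as a string; baseUrl supplies the domain part.
    public string Generate(string baseUrl)
    {
        _baseUrl = baseUrl.TrimEnd('/') + "/";
        var settings = new XmlWriterSettings { Indent = true };
        using (var text = new Utf8StringWriter())
        {
            using (var writer = XmlWriter.Create(text, settings))
            {
                _writer = writer;
                writer.WriteStartDocument();
                writer.WriteStartElement("urlset", Ns);
                GenerateUrlNodes();
                writer.WriteEndElement();
                writer.WriteEndDocument();
            }
            return text.ToString();
        }
    }

    // StringWriter reports UTF-16 by default; this makes the declaration say utf-8.
    private sealed class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding { get { return Encoding.UTF8; } }
    }
}

// Example subclass, in the same shape as the class shown above.
public class DemoSitemap : MiniSitemapGenerator
{
    protected override void GenerateUrlNodes()
    {
        WriteUrlLocation("sitemap.aspx", UpdateFrequency.Weekly, DateTime.Now);
        WriteUrlLocation("blog.aspx", UpdateFrequency.Daily, DateTime.Now);
    }
}
```

In this sketch, new DemoSitemap().Generate("http://www.domain.com") returns the XML as a string, with each page path prefixed by the domain.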
I normally map sitemap.xml to code that generates the sitemap on the fly (if generation is quick). See this link for more information on this.
BaseSitemapIndexGenerator
This is similar to the above; however, there are a few properties that you can set:
SitemapIndexFileName - this is the base index filename (will normally be sitemap.xml).
SitemapFileNameFormat - this is the format to use for each sitemap file generated within the index (default is "sitemap{0}.xml").
Normally, this will need to be run by a scheduler as it will take a long time to generate.
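As a rough illustration of how a format string such as "sitemap{0}.xml" can turn into the individual file names, here is a small sketch. SitemapFileNames is a hypothetical helper rather than part of the attached code, and numbering the files from 1 is an assumption on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the attached code.
public static class SitemapFileNames
{
    // Returns one file name per sitemap needed to hold totalUrls entries.
    // Files are numbered from 1 (an assumption for this sketch).
    public static List<string> Build(int totalUrls, int maxUrlsPerSitemap = 50000,
        string fileNameFormat = "sitemap{0}.xml")
    {
        if (totalUrls < 0) throw new ArgumentOutOfRangeException(nameof(totalUrls));
        if (maxUrlsPerSitemap <= 0) throw new ArgumentOutOfRangeException(nameof(maxUrlsPerSitemap));

        // Round up so that a partly filled last sitemap still gets a file.
        int fileCount = totalUrls / maxUrlsPerSitemap
            + (totalUrls % maxUrlsPerSitemap == 0 ? 0 : 1);

        var names = new List<string>(fileCount);
        for (int i = 1; i <= fileCount; i++)
        {
            names.Add(string.Format(CultureInfo.InvariantCulture, fileNameFormat, i));
        }
        return names;
    }
}
```

For example, SitemapFileNames.Build(120001) gives sitemap1.xml, sitemap2.xml and sitemap3.xml, which are the names the index file would then point to.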
What is the XML format?
For the sitemaps, the XML format is:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
        http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.domain.com/</loc>
    <changefreq>weekly</changefreq>
    <lastmod>2009-03-25</lastmod>
  </url>
</urlset>
For the sitemap indexes, the XML format is:
<?xml version="1.0" encoding="utf-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
              http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
  <sitemap>
    <loc>http://www.domain.com/sitemap1.xml.gz</loc>
    <lastmod>2009-03-24</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.domain.com/sitemap2.xml.gz</loc>
    <lastmod>2009-03-24</lastmod>
  </sitemap>
</sitemapindex>
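If you want to see how an index in this format could be produced in code, here is a minimal sketch using LINQ to XML. SitemapIndexXml and its Build method are hypothetical names for illustration, not the attached BaseSitemapIndexGenerator. The sketch leaves out the optional xsi:schemaLocation attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper, not part of the attached code.
public static class SitemapIndexXml
{
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a sitemap index document with one <sitemap> entry per file name.
    public static XDocument Build(string baseUrl, IEnumerable<string> fileNames, DateTime lastModified)
    {
        if (baseUrl == null) throw new ArgumentNullException(nameof(baseUrl));
        if (fileNames == null) throw new ArgumentNullException(nameof(fileNames));

        string root = baseUrl.TrimEnd('/') + "/";
        string date = lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        var index = new XElement(Ns + "sitemapindex");
        foreach (string fileName in fileNames)
        {
            index.Add(new XElement(Ns + "sitemap",
                new XElement(Ns + "loc", root + fileName),
                new XElement(Ns + "lastmod", date)));
        }

        return new XDocument(new XDeclaration("1.0", "utf-8", null), index);
    }
}
```

Calling Save("sitemap.xml") on the returned document writes the XML declaration as well, whereas ToString() leaves it out.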
History
About the Author
Originally from New Zealand, I currently work as Development Director at a software company in the UK specialising in online marketing and advertising.
I have a blog at http://andrew.thomas.net.nz, which is all about development in Microsoft .NET, focused on C#, ASP.NET, SQL Server and SEO. Check it out...