Introduction
Search engine crawlers generally do not follow JavaScript links. If you use AJAX to load text dynamically, that text will not be indexed by search engines. URL rewriting is a method that can be used to make crawlers index all the links on the site.
Objective
- If a normal user hits the site, he should see the AJAX-enabled site with JavaScript links.
- If a crawler hits the site, the links should be anchor links which can be indexed.
- If an anchor link indexed by the crawler is browsed, then the dynamic content should be visible while keeping the URL the same.
Example: If the user browses the site http://example.com and clicks on the link 'First Article', then the content of the first article is loaded dynamically. But if a search crawler browses the site, then the link will be http://example.com/Articles/FirstArticle, which is crawlable. Now if a normal user requests this URL, which has been indexed by the search engine, then First Article should be displayed while keeping the URL as http://example.com.
Using the Code
URL Rewriting
- Write the anchor tag in HTML as <a href="http://www.example.com/Aditya/Bhave/">Aditya Bhave</a>
- When the user clicks this link, the browser sends a request for the above URL.
- The custom HttpModule processes this URL and converts it into the URL required by the application, i.e. http://www.example.com/alllinks.aspx?firstname=Aditya&lastname=Bhave
- The request is then processed as if it had been made to http://www.example.com/alllinks.aspx?firstname=Aditya&lastname=Bhave
- The response is sent to the browser while the URL in the address bar still remains http://www.example.com/Aditya/Bhave/
You can compare this with Server.Transfer("/…./…") in ASP.NET.
Simple Example of HttpModule which rewrites URL
using System;
using System.Web;

public class SimpleRewriter : IHttpModule
{
    HttpApplication _application = null;

    #region IHttpModule Members

    public void Dispose()
    {
    }

    public void Init(HttpApplication context)
    {
        context.BeginRequest += new System.EventHandler(context_BeginRequest);
        _application = context;
    }

    void context_BeginRequest(object sender, EventArgs e)
    {
        try
        {
            // Drop everything after the last slash: "/Aditya/Bhave/" becomes "/Aditya/Bhave"
            string path = _application.Context.Request.Path;
            string requestUrl = path.Substring(0, path.LastIndexOf("/"));
            string[] parameters = requestUrl.Split(new char[] { '/' });
            // The leading slash yields an empty first element, so two segments give a length of 3
            if (parameters.Length > 2)
            {
                int paramLength = parameters.Length;
                string firstname = parameters[paramLength - 2];
                string lastname = parameters[paramLength - 1];
                // Rewrite internally; the URL shown in the browser does not change
                _application.Context.RewritePath(string.Format
                    ("~/alllinks.aspx?firstname={0}&lastname={1}", firstname, lastname));
            }
        }
        catch (Exception ee)
        {
            //Redirect to error page
            //Or throw custom exception
        }
    }

    #endregion
}
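The simple module above rewrites any path with two or more segments, so a request such as /images/site/logo.png would also be sent to alllinks.aspx. As a minimal sketch of one way to narrow this, the path-to-query mapping can be pulled out into a pure function that can be checked on its own. The class name FriendlyUrlMapper, the method TryMap, and the rule "a dot in the last segment means a file" are assumptions for illustration and are not part of the original solution.

```csharp
using System;

// Hypothetical helper (not in the original article): maps a friendly path
// to the query-string URL that alllinks.aspx expects.
public static class FriendlyUrlMapper
{
    // Returns the rewritten URL, or null when the path should be left alone.
    public static string TryMap(string path)
    {
        if (string.IsNullOrEmpty(path)) return null;

        // Drop empty segments so leading and trailing slashes do not matter.
        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (segments.Length < 2) return null;

        string first = segments[segments.Length - 2];
        string last = segments[segments.Length - 1];

        // Assumption: a dot in the final segment means a file such as
        // /images/site/logo.png, which should not be rewritten.
        if (last.Contains(".")) return null;

        // Encode the values so spaces or '&' cannot break the query string.
        return string.Format(
            "~/alllinks.aspx?firstname={0}&lastname={1}",
            Uri.EscapeDataString(first),
            Uri.EscapeDataString(last));
    }
}
```

Inside context_BeginRequest, the result could be passed to RewritePath only when it is not null. Unlike the module above, this sketch also accepts a path without a trailing slash, and it would still rewrite a folder-only path such as /images/icons/, so a real site would probably need a stricter rule.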
URL Rewriting as a Search Engine Optimization Technique in AJAX Site
- AJAX sites have all JavaScript links.
- If we use LinkButton, then the link is rendered in the browser as:
<a href="javascript:__doPostBack('….','…..');">Aditya Bhave</a>
- If a crawler comes across a JavaScript link, it is skipped (not indexed).
- If we want our link, Aditya Bhave, to be indexed by the crawler and searchable in a search engine, then the link should be a plain anchor tag, e.g.
<a href="http://www.example.com/Aditya/Bhave/">Aditya Bhave</a>
- Most crawlers do not support JavaScript. When a crawler hits the site, the user agent sent with the request tells us that this is a crawler and not a normal user. We can check whether the browser supports JavaScript. If it does not, then we should hide the LinkButtons and add anchor tags at runtime. So there should be two links:
<asp:LinkButton ID="lnkLink" runat="server" Text='Some Text'
OnClick="lnkLink_Click"></asp:LinkButton>
and
<a href="" runat="server" id="htmlLnkLink"></a>
- If the browser does not support JavaScript, then hide the LinkButton, set the href attribute of the anchor link, and make it visible.
Example:
if (Request.Browser.EcmaScriptVersion.Major <= 0)
{
    // No JavaScript support: hide the LinkButton and expose a plain anchor
    lnkArticle.Visible = false;
    aLnkArticle.Attributes["href"] =
        "http://www.example.com/alllinks.aspx/Firstname/Lastname";
    aLnkArticle.InnerText = "FirstName LastName"; //To show the link text
    aLnkArticle.Visible = true;
}
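The check above relies on the JavaScript version that ASP.NET infers from the user agent. Since the text also mentions recognising a crawler by its user agent, here is a rough sketch of a direct user-agent test. The class CrawlerDetector and the token list are hypothetical and only illustrative; real crawlers send many more user agents than the few listed here. ASP.NET also exposes a built-in Request.Browser.Crawler flag that may be worth checking first.

```csharp
using System;

// Hypothetical helper (not in the original article): guesses from the
// user-agent string whether plain anchor links should be served.
public static class CrawlerDetector
{
    // Illustrative tokens only, not a complete list.
    private static readonly string[] KnownTokens =
        { "googlebot", "bingbot", "slurp", "lynx" };

    public static bool IsLikelyCrawler(string userAgent)
    {
        // Assumption: a request without a user agent gets the plain links.
        if (string.IsNullOrEmpty(userAgent)) return true;

        foreach (string token in KnownTokens)
        {
            // Case-insensitive substring match against each known token.
            if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }
}
```

In the page, the condition could then become Request.Browser.EcmaScriptVersion.Major <= 0 || CrawlerDetector.IsLikelyCrawler(Request.UserAgent), so that either signal switches the page to anchor links.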
When the crawler requests the anchor link, i.e. http://www.example.com/alllinks.aspx/Aditya/Bhave/, the URL is changed to http://www.example.com/alllinks.aspx?firstname=Aditya&lastname=Bhave using URL rewriting.
In the Page_Load event of the alllinks.aspx page, check whether the QueryString variable named firstname or lastname exists. If it exists, then execute the code from the LinkButton_Click event to show the content to the crawler.
string firstname = Request.QueryString["firstname"];
string lastname = Request.QueryString["lastname"];
SetName(firstname, lastname);
Now the crawler will see the content and index your URL, i.e., http://www.example.com/Aditya/Bhave/ and it will appear in search results.
Now, if a normal user clicks the above link in the search results, he must see the dynamic content while the URL stays as http://www.example.com/alllinks.aspx.
We can implement the following approach to achieve this:
- The URL http://www.example.com/Aditya/Bhave/ will be rewritten by the HttpModule as http://www.example.com/alllinks.aspx?firstname=Aditya&lastname=Bhave
- Check if the querystring exists and if the browser supports JavaScript. If both are true, then put firstname and lastname in Session variables and redirect to the same page.
- Now the querystring is gone but the Session variables exist. If the Session variables exist, then execute the same code as in LinkButton_Click to show the dynamic content, and remove the Session variables.
#region Process RequestParameters
if (Request.QueryString["firstname"] != null)
{
    if (Request.Browser.EcmaScriptVersion.Major > 0)
    {
        // JavaScript-capable browser: stash the values and redirect
        // so that the querystring disappears from the URL
        Session["firstname"] = Request.QueryString["firstname"];
        Session["lastname"] = Request.QueryString["lastname"];
        Response.Redirect("alllinks.aspx", true);
    }
    else
    {
        // Crawler or non-JavaScript browser: render the content directly
        string firstname = Request.QueryString["firstname"];
        string lastname = Request.QueryString["lastname"];
        SetName(firstname, lastname);
    }
}
#endregion
#region Process Session Variables
if (Session["firstname"] != null)
{
    // Second request after the redirect: show the content, then clear the Session
    string firstname = Convert.ToString(Session["firstname"]);
    string lastname = Convert.ToString(Session["lastname"]);
    SetName(firstname, lastname);
    Session.Remove("firstname");
    Session.Remove("lastname");
}
#endregion
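The branching in the two regions above can be summarised as a small decision function, which makes the three cases easier to check in isolation. This is only a sketch: the enum RequestAction and the class RequestFlow are hypothetical names, and the function simplifies the page code by ignoring the case where both a querystring and Session values are present in the same request.

```csharp
// Hypothetical summary of the Page_Load branching (not in the original article).
public enum RequestAction
{
    ShowFromQuery,       // crawler or non-JavaScript browser: render directly
    RedirectViaSession,  // JavaScript browser: store values, redirect to clean URL
    ShowFromSession,     // request after the redirect: render, then clear Session
    ShowDefault          // nothing to show beyond the normal page
}

public static class RequestFlow
{
    public static RequestAction Decide(
        bool hasQuery, bool supportsJavaScript, bool hasSessionValues)
    {
        if (hasQuery)
        {
            // The querystring is only present right after the URL rewrite.
            return supportsJavaScript
                ? RequestAction.RedirectViaSession
                : RequestAction.ShowFromQuery;
        }

        // No querystring: either the redirected request or a normal visit.
        return hasSessionValues
            ? RequestAction.ShowFromSession
            : RequestAction.ShowDefault;
    }
}
```

A normal user who follows an indexed link therefore passes through RedirectViaSession on the first request and ShowFromSession on the second, while a crawler only ever reaches ShowFromQuery.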
Please see the attached solution.
How to Check If It Is Working Fine
Use the Mozilla Firefox add-on 'PrefBar'. Select the user agent as Lynx. (This will browse the page as if the browser is Lynx. Lynx is a text browser which does not support JavaScript.)
Disable colours, images, JavaScript and Flash, and browse the site. Browse the same site in Internet Explorer and compare the two. In Firefox (as Lynx), the links are normal anchor links, and clicking one performs a full page load. In Internet Explorer, clicking the link uses AJAX and the content is loaded dynamically. Now copy the link from Firefox, e.g., http://localhost:1234/alllinks.aspx/Aditya/Bhave/, and browse it in Internet Explorer. Internet Explorer should show the URL as http://localhost:1234/alllinks.aspx while the dynamic content is loaded.
Your suggestions are most welcome. Thanks for reading.
History
- 11th August, 2010: Initial post