What is it about
In recent years, RSS has proved to be an extremely useful data-distribution technology. This article addresses the problem of handling the different RSS feed standards in a single application. It should be useful to anyone building a desktop aggregator or a corporate intranet environment. The article is accompanied by a skeleton newsreader application.
This article assumes you're using MSXML 3.0+ as the XML/XSLT processor.
Standardization
The only sad thing about RSS is the number of standards in use. You cannot be sure what you'll get while surfing the net, so you must be ready for anything. "Anything" is:
- RSS 0.90 - the initial release of RSS, created by Netscape; almost extinct now. Specs are still available at the PurplePages archive.
- RSS 2.0/0.91-0.94 - the most popular branch of RSS: a revised and simplified version of the original (by UserLand's Dave Winer). For this format, RSS stands for Really Simple Syndication. It became even more popular with the introduction of podcasting. By the way, don't be fooled by the version numbers: the version prior to 2.0 was 0.94, not 1.0 (which is entirely different from 0.9x)! Specs are available on the UserLand site.
- RSS 1.0 - not really a standard, but a derivative of RDF (Resource Description Framework), the Web standard for metadata developed by the W3C. Verbose and extensible (through modules), it is much more flexible than v2.0. For the 0.90 and 1.0 versions, RSS stands for RDF Site Summary. Specs are available on the RSS-DEV Working Group site.
- Atom - the most recent, and thus the rarest, syndication format. Atom is the first attempt (undertaken by an Internet Engineering Task Force working group) to develop a standardized, enterprise-wide syndication format. The complete specification can be found on the IETF site.
Transform...
Let's begin with the stylesheets. Three things are worth noting:
- local-name() XSLT function: very useful when you need to strip out all the namespace stuff and painlessly obtain the name of a node.
- disable-output-escaping option of the xsl:value-of instruction: a must-know XSLT feature. The reason: webmasters tend to embed funky HTML markup into the "description" and "summary" fields. By setting this option to "yes" we preserve the markup and get a nice-looking page rather than a mess of tags. On the downside, there is a security problem: disable-output-escaping can expose your local computer to malicious scripts embedded in the feed. Normally you should have some kind of stripper for <SCRIPT> and <OBJECT> tags; unless you have one, you are advised to read RSS feeds only from trusted sites.
- <xsl:text/> instruction: use it when you want to strip unnecessary whitespace "by hand". This is very useful for keeping the indentation of the output HTML under control.
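The "stripper" for <SCRIPT> and <OBJECT> tags mentioned above could look roughly like the sketch below. The function name stripActiveContent is hypothetical and is not part of the accompanying newsreader. It is a regular-expression filter, so it only illustrates the idea: it does not catch inline event handlers (such as onclick attributes) or javascript: URLs, and it is no substitute for a real HTML sanitizer.

```javascript
// Hypothetical helper: removes <script> and <object> elements from feed markup
// before that markup is written into the page.
function stripActiveContent(html) {
  var blocked = ["script", "object"];
  var result = String(html);
  for (var i = 0; i < blocked.length; i++) {
    var tag = blocked[i];
    // Complete elements, including everything between the tags.
    var paired = new RegExp(
      "<" + tag + "\\b[^>]*>[\\s\\S]*?<\\/" + tag + "\\s*>", "gi");
    // Stray opening or closing tags that were left unbalanced.
    var stray = new RegExp("<\\/?" + tag + "\\b[^>]*>", "gi");
    result = result.replace(paired, "").replace(stray, "");
  }
  return result;
}
```

One way to use it would be to run the transformation result through this function just before assigning it to innerHTML.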
The first three stylesheets can be used to build a newspaper-style news feed:
Listing 1.1: XSLT stylesheet for building newspaper-style HTML page from an RSS 2.0/0.91 feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" version="1.0"
indent="yes" encoding="iso-8859-1"/>
<xsl:template match="/">
<html><body>
<div style=
"padding: 1em;background-color: #fafafa; border: 1px solid #cfcfcf;">
<xsl:for-each select="rss/channel/item">
<xsl:variable name="stl">
<xsl:text/>background-color: #efeff5;
border: 1px solid #cfcfcf;padding: 0em 1em 0em; margin:
<xsl:text/>
<xsl:choose>
<xsl:when test="position()=last()"> 0em</xsl:when>
<xsl:otherwise> 0em 0em 1em 0em</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<div>
<xsl:attribute name="style"><xsl:value-of select="$stl"/>
</xsl:attribute>
<p><h3 style="color:#800000"><xsl:value-of select="title"/></h3>
</p>
<p><xsl:value-of disable-output-escaping="yes"
select="description"/>
</p>
<xsl:variable name="pub" select="pubDate"/>
<xsl:if test="count($pub) > 0">
<p align="right"
style="margin:0; padding:0"><xsl:value-of select="pubDate"/>
</p>
</xsl:if>
<p style="margin:0; padding:0em 0em 1em 0em"><a target="_blank">
<xsl:attribute name="href">
<xsl:value-of select="link"/>
</xsl:attribute>
<xsl:value-of select="link"/>
</a></p>
</div>
</xsl:for-each>
</div>
</body></html>
</xsl:template>
</xsl:stylesheet>
Listing 1.2: XSLT stylesheet for building newspaper-style HTML page from an RSS 1.0 feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"
version="1.0" indent="yes" encoding="iso-8859-1"/>
<xsl:template match="/">
<html><body>
<div style=
"padding: 1em;background-color: #fafafa; border: 1px solid #cfcfcf;">
<xsl:for-each select="*/*[local-name()='item']">
<xsl:variable name="stl">
<xsl:text/>background-color: #efeff5;
border: 1px solid #cfcfcf;padding: 0em 1em 0em; margin:
<xsl:text/>
<xsl:choose>
<xsl:when test="position()=last()"> 0em</xsl:when>
<xsl:otherwise> 0em 0em 1em 0em</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<div>
<xsl:attribute name="style"><xsl:value-of select="$stl"/>
</xsl:attribute>
<p><h3 style="color:#800000">
<xsl:value-of select="./*[local-name()='title']"/>
</h3></p>
<p><xsl:value-of disable-output-escaping="yes"
select="./*[local-name()='description']"/>
<br/><br/>
<xsl:variable name="pub" select="*[local-name()='date']"/>
<xsl:variable name="pub_date"
select="concat(substring($pub, 1, 10), ', ',
substring($pub, 12, 8), ' (GMT+',
substring($pub, 21, 5), ')')"/>
<xsl:if test="count($pub) > 0">
<div align="right" style="margin:0em; padding:0em 0em 0em 0em;">
<xsl:value-of select="$pub_date"/></div>
</xsl:if>
<a target="_blank">
<xsl:attribute name="href">
<xsl:value-of select="./*[local-name()='link']"/>
</xsl:attribute><xsl:value-of select="./*[local-name()='link']"/>
</a></p>
</div>
</xsl:for-each>
</div>
</body></html>
</xsl:template>
</xsl:stylesheet>
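The pub_date expression in Listing 1.2 cuts the date, the time and the time zone out of the feed's date string by character position. The same slicing is sketched in JavaScript below. The function name formatDcDate is hypothetical, and the sketch assumes a date of the form YYYY-MM-DDThh:mm:ss+hh:mm. Keep in mind that XPath counts characters from 1 while JavaScript counts from 0, and that a date ending in "Z" or carrying a negative offset would not come out correctly with this approach.

```javascript
// Hypothetical helper mirroring the pub_date expression of Listing 1.2.
// Assumes a date string of the form YYYY-MM-DDThh:mm:ss+hh:mm.
function formatDcDate(pub) {
  var date = pub.slice(0, 10);  // XPath characters 1 to 10: the date
  var time = pub.slice(11, 19); // XPath characters 12 to 19: the time
  var zone = pub.slice(20, 25); // XPath characters 21 to 25: offset after "+"
  return date + ", " + time + " (GMT+" + zone + ")";
}
```

For example, formatDcDate("2005-11-23T14:30:00+02:00") yields "2005-11-23, 14:30:00 (GMT+02:00)".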
Listing 1.3: XSLT stylesheet for building newspaper-style HTML page from an Atom feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"
version="1.0" indent="yes" encoding="iso-8859-1"/>
<xsl:template match="/">
<html><body>
<div style=
"padding: 1em;background-color: #fafafa; border: 1px solid #cfcfcf;">
<xsl:for-each select="*/*[local-name()='entry']">
<xsl:variable name="stl">
<xsl:text/>
background-color: #efeff5;
border: 1px solid #cfcfcf;padding: 0em 1em 0em; margin:
<xsl:text/>
<xsl:choose>
<xsl:when test="position()=last()"> 0em</xsl:when>
<xsl:otherwise> 0em 0em 1em 0em</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<div>
<xsl:attribute name="style"><xsl:value-of select="$stl"/>
</xsl:attribute>
<p><h3 style="color:maroon">
<xsl:value-of select="*[local-name()='title']"/></h3></p>
<p><xsl:value-of disable-output-escaping="yes"
select="*[local-name()='summary']"/></p>
<xsl:variable name="pub" select="*[local-name()='updated']"/>
<xsl:variable name="pub_date" select=
"concat(substring($pub, 1, 10), ', ', substring($pub, 12, 8))"/>
<xsl:if test="count($pub)>0">
<p align="right" style="margin:0; padding:0;">
<xsl:value-of select="$pub_date"/>
</p>
</xsl:if>
<p style="margin:0; padding:0em 0em 1em 0em;"><a target="_blank">
<xsl:attribute name="href">
<xsl:value-of
select="*[local-name()='link']/@*[local-name()='href']"/>
</xsl:attribute>
<xsl:value-of
select="*[local-name()='link']/@*[local-name()='href']"/>
</a></p>
</div>
</xsl:for-each>
</div>
</body></html>
</xsl:template>
</xsl:stylesheet>
The next point of interest is the list of all the titles found in the feed - the outline. Each item in this list will be a link to a JavaScript "navTo" function, with a numeric argument equal to the item's position in the list.
Listing 2.1: XSLT stylesheet for retrieving a list of items from an RSS 2.0/0.91 feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"
version="1.0" indent="yes" encoding="iso-8859-1"/>
<xsl:template match="/">
<html><body><ul style="margin-left:25">
<xsl:for-each select="rss/channel/item">
<li><a href="javascript:navTo('{position()}')">
<font style="size:-1;color:#800000">
<xsl:value-of select="title"/>
</font>
</a><br/></li>
</xsl:for-each>
</ul></body></html>
</xsl:template>
</xsl:stylesheet>
Listing 2.2: XSLT stylesheet for retrieving a list of items from an RSS 1.0 feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" version="1.0"
indent="yes" encoding="iso-8859-1"/>
<xsl:template match="/">
<html><body><ul style="margin-left:25">
<xsl:for-each select="*/*[local-name()='item']">
<li><a href="javascript:navTo('{position()}')">
<font style="size:-1;color:#800000">
<xsl:value-of select="./*[local-name()='title']/text()"/>
</font>
</a><br/></li>
</xsl:for-each>
</ul></body></html>
</xsl:template>
</xsl:stylesheet>
Listing 2.3: XSLT stylesheet for retrieving a list of items from an Atom feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"
version="1.0" indent="yes" encoding="iso-8859-1"/>
<xsl:template match="/">
<html><body><ul style="margin-left:25">
<xsl:for-each select="*/*[local-name()='entry']">
<li><a href="javascript:navTo('{position()}')">
<font style="size:-1;color:#800000">
<xsl:value-of select="./*[local-name()='title']/text()"/>
</font>
</a><br/></li>
</xsl:for-each>
</ul></body></html>
</xsl:template>
</xsl:stylesheet>
The last set of stylesheets transforms a single news item. Note that these transformations cannot be applied to the original RSS file: before using them, you must programmatically extract the required item and apply one of the stylesheets to it.
Listing 3.1: XSLT stylesheet for representing a distinct news item from an RSS 2.0/0.91 feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"
version="1.0" indent="yes" encoding="iso-8859-1"/>
<xsl:template match="*">
<div style="padding: 0em 1em 0em;
background-color: #fafafa; border: 1px solid #cfcfcf;">
<p><h3 style="color:#800000"><xsl:value-of select="title"/></h3></p>
<div style="padding: 0em 1em 0em; margin: 0em;
background-color: #efeff5; border: 1px solid #cfcfcf;">
<p><xsl:value-of disable-output-escaping="yes" select="description"/>
</p>
<xsl:variable name="pub" select="pubDate"/>
<xsl:if test="count($pub)>0">
<p align="right" style="margin:0em; padding:0em 0em 1em 0em;">
<xsl:value-of select="pubDate"/></p>
</xsl:if>
</div>
<p><a target="_blank">
<xsl:attribute name="href">
<xsl:value-of select="link"/>
</xsl:attribute>
<xsl:value-of select="link"/>
</a></p>
</div>
</xsl:template>
</xsl:stylesheet>
Listing 3.2: XSLT stylesheet for representing a distinct news item from an RSS 1.0 feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"
version="1.0" indent="yes" encoding="iso-8859-1"/>
<xsl:template match="*">
<div style="padding: 0em 1em 0em;
background-color: #fafafa; border: 1px solid #cfcfcf;">
<p><h3 style="color:#800000">
<xsl:value-of select="./*[local-name()='title']"/>
</h3></p>
<div style="padding: 0em 1em 0em; margin: 0em;
background-color: #efeff5; border: 1px solid #cfcfcf;">
<p><xsl:value-of disable-output-escaping="yes"
select="./*[local-name()='description']"/></p>
<xsl:variable name="pub" select="*[local-name()='date']"/>
<xsl:variable name="pub_date"
select="concat(substring($pub, 1, 10), ', ',
substring($pub, 12, 8),
' (GMT+', substring($pub, 21, 5), ')')"/>
<xsl:if test="count($pub) > 0">
<p align="right" style="margin:0em; padding:0em 0em 1em 0em;">
<xsl:value-of select="$pub_date"/>
</p>
</xsl:if>
</div>
<p><a target="_blank">
<xsl:attribute name="href">
<xsl:value-of select="./*[local-name()='link']"/>
</xsl:attribute><xsl:value-of select="./*[local-name()='link']"/>
</a></p>
</div>
</xsl:template>
</xsl:stylesheet>
Listing 3.3: XSLT stylesheet for representing a distinct news item from an Atom feed
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"
version="1.0" indent="yes" encoding="iso-8859-1"/>
<xsl:template match="*">
<div style="padding: 0em 1em 0em;
background-color: #fafafa; border: 1px solid #cfcfcf;">
<p><h3 style="color:#800000">
<xsl:value-of select="*[local-name()='title']"/>
</h3></p>
<xsl:variable name="cnt" select="*[local-name()='content']">
</xsl:variable>
<xsl:if test="count($cnt)>0">
<div style="padding: 0em 1em 0em 1em; margin: 0em;
background-color: #efeff5; border: 1px solid #cfcfcf;">
<p><xsl:value-of disable-output-escaping="yes"
select="*[local-name()='content']"/></p>
<xsl:variable name="pub" select="*[local-name()='updated']"/>
<xsl:variable name="pub_date"
select="concat(substring($pub, 1, 10), ', ',
substring($pub, 12, 8))"/>
<xsl:if test="count($pub)>0">
<p align="right" style="margin:0em; padding:0em 0em 1em 0em;">
<xsl:value-of select="$pub_date"/>
</p>
</xsl:if>
</div>
</xsl:if>
<xsl:if test="count($cnt)=0">
<p style="padding: 1em; margin: 0em;
background-color: #efeff5; border: 1px solid #cfcfcf;">
<xsl:value-of disable-output-escaping="yes"
select="*[local-name()='summary']"/>
</p>
</xsl:if>
<p><a target="_blank">
<xsl:attribute name="href">
<xsl:value-of
select="*[local-name()='link']/@*[local-name()='href']"/>
</xsl:attribute>
<xsl:value-of
select="*[local-name()='link']/@*[local-name()='href']"/>
</a></p>
</div>
</xsl:template>
</xsl:stylesheet>
...and read
Before doing anything with an RSS file, you need to know which standard it belongs to, right? We find out by examining the name of the document's root element:
Listing 4.1: Extracting the RSS standard
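function whatStd(rssdocument)
{
// "/*" selects the root element, whatever its name or namespace.
var rssroot =
rssdocument.documentElement.selectSingleNode("/*");
// baseName (an MSXML extension) is the element name without its prefix.
var rssstd = rssroot.baseName;
switch(rssstd)
{
case "rss": // RSS 0.91-0.94 and 2.0
return "rss2";
case "RDF": // RSS 1.0 (rdf:RDF root)
return "rss1";
case "feed": // Atom
return "atom";
default: // unknown format
return "";
}
}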
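The bad thing about this (and all the following) code is that it relies heavily on Microsoft extensions to the W3C XML API (selectSingleNode, baseName, transformNode, text). As a workaround for this particular function, you can simply take the document's documentElement, which already is the root element.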
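As a rough illustration of that workaround, the sketch below detects the standard using only the standard DOM property localName, which plays the role of MSXML's baseName. The function name detectStandard is hypothetical. It takes the root element as its argument, so it works with any DOM implementation that exposes localName; whether a given MSXML version does is something you would have to check.

```javascript
// Hypothetical helper: maps a root element to the names returned by whatStd.
// Reads only the standard DOM property localName; no MSXML extensions.
function detectStandard(rootElement) {
  if (!rootElement || typeof rootElement.localName !== "string") {
    return "";
  }
  switch (rootElement.localName) {
    case "rss":  // RSS 0.91-0.94 and 2.0
      return "rss2";
    case "RDF":  // RSS 1.0 (rdf:RDF root)
      return "rss1";
    case "feed": // Atom
      return "atom";
    default:     // unknown format
      return "";
  }
}
```

In a browser with DOMParser you would pass it the documentElement of the parsed feed, for instance the result of new DOMParser().parseFromString(feedText, "application/xml"), where feedText is an assumed variable holding the feed source.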
Listing 4.2: Extracting RSS channel info
var rss_title;
switch(standard)
{
case "atom":
rss_title = xml.documentElement.selectSingleNode(
"/*/*[local-name()='title']");
break;
case "rss1":
rss_title = xml.documentElement.selectSingleNode(
"/*/*[local-name()='channel']/*[local-name()='title']");
break;
case "rss2":
rss_title = xml.documentElement.selectSingleNode(
"/*/channel/title");
break;
}
var rss_link;
switch(standard)
{
case "atom":
rss_link = xml.documentElement.selectSingleNode(
"/*/*[local-name()='link']/@*[local-name()='href']");
break;
case "rss1":
rss_link = xml.documentElement.selectSingleNode(
"/*/*[local-name()='channel']/*[local-name()='link']");
break;
case "rss2":
rss_link = xml.documentElement.selectSingleNode(
"/*/channel/link");
break;
}
rsstitle.innerHTML =
"<a target=\"_blank\" title=\"Opens in new window\" href=\"" +
rss_link.text +
"\"><font color=\"maroon\" size=\"4\"><b>" +
rss_title.text + "</b></font></a>";
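Listing 4.2 writes the channel title and link straight into innerHTML. A feed whose title contains markup characters, or whose link contains a quote, could break the generated HTML or inject markup of its own. The sketch below shows one way to guard against that by escaping both values first. The names escapeHtml and buildChannelLink are hypothetical and do not come from the accompanying newsreader, and the sketch does not check the URL scheme of the link.

```javascript
// Hypothetical helper: escapes characters that are special in HTML text
// and in quoted attribute values. "&" is replaced first so that the
// entities produced by the later steps are not escaped twice.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: builds the same channel link as Listing 4.2,
// but from plain strings and with both values escaped.
function buildChannelLink(title, link) {
  return "<a target=\"_blank\" title=\"Opens in new window\" href=\"" +
    escapeHtml(link) +
    "\"><font color=\"maroon\" size=\"4\"><b>" +
    escapeHtml(title) + "</b></font></a>";
}
```

With these helpers, the last statement of the listing could become rsstitle.innerHTML = buildChannelLink(rss_title.text, rss_link.text).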
Having extracted the channel info, it is just as easy to extract a single item from the feed. Here we go.
function navTo(where)
{
if(rssFile != "")
{
var rss_item;
switch(standard)
{
case "atom":
rss_item = xml.documentElement.selectSingleNode(
"/*/*[local-name()='entry'][" + where + "]");
break;
case "rss1":
rss_item = xml.documentElement.selectSingleNode(
"/*/*[local-name()='item'][" + where + "]");
break;
case "rss2":
rss_item = xml.documentElement.selectSingleNode(
"/*/channel/item[" + where + "]");
break;
default:
rss_item = null;
}
if(rss_item)
{
var item_i =
rss_item.transformNode(xsl_i.documentElement);
contentcell.vAlign = "Top";
content.innerHTML = item_i;
...
}
}
}
Take note of xsl_i (used in the transformation), which is the single-item stylesheet described earlier, and of where, a string representation of a number: the position of the item inside the feed.
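Because where is concatenated directly into the XPath query, it may be worth checking that it really is a positive integer before using it. The sketch below is one way to do that, and it also replaces the switch with a lookup table. The names ITEM_XPATHS and itemXPath are hypothetical; the XPath prefixes are the ones used in navTo.

```javascript
// XPath prefixes per standard, copied from the navTo function.
var ITEM_XPATHS = {
  atom: "/*/*[local-name()='entry']",
  rss1: "/*/*[local-name()='item']",
  rss2: "/*/channel/item"
};

// Hypothetical helper: returns the XPath of the item at a 1-based position,
// or null when the standard is unknown or the position is not a
// positive integer written in plain digits.
function itemXPath(standard, where) {
  if (!Object.prototype.hasOwnProperty.call(ITEM_XPATHS, standard)) {
    return null;
  }
  if (!/^[1-9][0-9]*$/.test(String(where))) {
    return null;
  }
  return ITEM_XPATHS[standard] + "[" + where + "]";
}
```

Inside navTo, the result could be passed to selectSingleNode whenever it is not null, which would take the place of the three case branches.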
That's all. Feel free to e-mail me all your suggestions/opinions/bug reports.
History
- 23rd November, 2005
- Article posted, first version of stylesheets and newsreader.
- 13th February, 2006
- Code cleanup, some functions completely rewritten;
- XPath queries cleaned up;
- 'save HTML' capability added;
- XSLT stylesheets optimized/cleaned up.
- 25th April, 2006
- Automated/manual feed update capability added;
- minor improvements and bugfixes.