Summary
This article will help people familiar with HTML start producing XHTML 1.0 compliant documents. The article's approach is very simple, as is XHTML. However, it is this very simplicity that baffles some web developers and designers.
This article is not an in-depth look into XML or why XHTML has replaced HTML.
Requirements
A basic understanding of HTML and CSS is recommended for this article. No JavaScript, ASP or XML knowledge is required, though XML knowledge will help in understanding the XHTML approach and the reasons for it.
Why change to XHTML?
In this article I won't be going into the whys of using XHTML or the benefits involved. That will be a topic for a later article. However if you want some good reasons to use XHTML then check these links out:
Won't XHTML break my sites in visitors' browsers?
No, put simply. XHTML is very backwards compatible and a page coded using XHTML 1.0 Transitional will work in all browsers that support HTML 4.01. The W3C have done a very good job of moving web documents closer to XML but without breaking compatibility or sending more web developers over the proverbial cliff.
HTML vs. XHTML Examples
So you want to get started, either creating new XHTML compliant documents or converting your current HTML 4.01 documents into XHTML 1.0 documents.
Let's start with some actual HTML vs. XHTML, and then move on to the differences in point form.
An HTML document
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>HTML to XHTML Example: HTML page</title>
<link rel="Stylesheet" href="htmltohxhtml.css" type="text/css" media="screen">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
</head>
<body>
<p>This is the HTML page. It works and is encoded just like any HTML page you
have previously done. View <a href="htmltoxhtml2.htm">the XHTML version</a> of
this page to view the difference between HTML and XHTML.</p>
<p>You will be glad to know that no changes need to be made to any of your CSS files.</p>
<hr>
<h1>Standards</h1>
<p>Standards are important for, and this is only one reason, the simple fact that with a
standardised web you will only have to code your site once and it will work on all
browsers, on all platforms and on all devices.</p>
<h2>Useful Links</h2>
<p>Following are some useful web standards links.</p>
<table cellpadding="0" cellspacing="0">
<tr class="tblheader">
<td>Name</td>
<td>Link</td>
</tr>
<tr>
<td class="tbldata">Web Standards Project, WASP</td>
<td class="tbldata"><a href="http://www.webstandards.org">webstandards.org</a></td>
</tr>
<tr>
<td class="tbldata">The W3C</td>
<td class="tbldata"><a href="http://www.w3c.org">w3c.org</a></td>
</tr>
<tr>
<td class="tbldata">XHTML, HTML Validator</td>
<td class="tbldata"><a
href="http://validator.w3.org/">validator.w3.org/</a></td>
</tr>
<tr>
<td class="tbldata">New York Public Library Style Guide</td>
<td class="tbldata"><a
href="http://www.nypl.org/styleguide/">nypl.org/styleguide/</a></td>
</tr>
<tr>
<td class="tbldata">Standards Evangelist, Paul Watson</td>
<td class="tbldata"><a
href="mailto:paulmwatson@email.com">paulmwatson@email.com</a></td>
</tr>
</table>
<hr>
<p>
<a href="http://validator.w3.org/check/referer"><img border="0"
src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"
height="31" width="88"></a>
</p>
</body>
</html>
This is a well-formed and valid HTML 4.01 Transitional document. You can validate it against the W3C HTML Validator Service.
An XHTML document
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>HTML to XHTML Example: XHTML page</title>
<link rel="Stylesheet" href="htmltohxhtml.css" type="text/css" media="screen" />
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
<p>This is the XHTML page. As you can see the result between the two pages
is identical, even though one is in HTML 4.01 and the other is in XHTML 1.0. View
<a href="htmltoxhtml.htm">the HTML version</a> of this page to view the difference
between HTML and XHTML.</p>
<hr />
<h1>Standards</h1>
<p>Standards are important for, and this is only one reason, the simple fact that
with a standardised web you will only have to code your site once and it will work
on all browsers, on all platforms and on all devices.</p>
<h2>Useful Links</h2>
<p>Following are some useful web standards links.</p>
<table cellpadding="0" cellspacing="0">
<tr class="tblheader">
<td>Name</td>
<td>Link</td>
</tr>
<tr>
<td class="tbldata">Web Standards Project, WASP</td>
<td class="tbldata"><a
href="http://www.webstandards.org">webstandards.org</a></td>
</tr>
<tr>
<td class="tbldata">The W3C</td>
<td class="tbldata"><a href="http://www.w3c.org">w3c.org</a></td>
</tr>
<tr>
<td class="tbldata">XHTML, HTML Validator</td>
<td class="tbldata"><a
href="http://validator.w3.org/">validator.w3.org/</a></td>
</tr>
<tr>
<td class="tbldata">New York Public Library Style Guide</td>
<td class="tbldata"><a href="http://www.nypl.org/styleguide/">nypl.org/styleguide/</a></td>
</tr>
<tr>
<td class="tbldata">Standards Evangelist, Paul Watson</td>
<td class="tbldata"><a
href="mailto:paulmwatson@email.com">paulmwatson@email.com</a></td>
</tr>
</table>
<hr />
<p>
<a href="http://validator.w3.org/check/referer"><img border="0"
src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!"
height="31" width="88" /></a>
</p>
</body>
</html>
This is a well-formed and valid XHTML 1.0 Transitional document. You can validate it against the W3C HTML Validator Service.
The Differences
Frankly, the difference between HTML 4.01 and XHTML 1.0 is almost laughable. Don't think you are missing something important just because it is so easy. You aren't; it really is that easy.
I will list the differences and then explain each one in detail:
- DOCTYPE reference has changed
- xmlns reference in the HTML tag
- All tags in lowercase
- Valid structure
- Attribute quotes are mandatory
- "Empty" tags must be closed now
That is it, nothing very earth-shattering at all. Let's get into the details.
DOCTYPE
Naturally from HTML 3 to HTML 4.01 your DOCTYPE changed. Similarly from HTML 4.01 to XHTML 1.0 your DOCTYPE must change.
What is a DOCTYPE? Simply put, it is a declaration at the top of your document stating which standard or specification the document conforms to, e.g. XHTML 1.0 or HTML 4.01. Web browsers and validators can then take advantage of this knowledge. It is becoming very important to use a DOCTYPE declaration, and in fact it is mandatory for XHTML 1.0. If you leave it out, your document is not valid XHTML 1.0 and will fail validation; browsers will generally still display the page, but typically fall back to their older, less standards-compliant rendering behaviour.
If you are writing ASP pages then put the DOCTYPE just under the <%@ Language="VBScript" %> declaration. Essentially the client's web browser must see the DOCTYPE as the first markup in the web document.
An HTML 4.01 DOCTYPE looks like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
So for your XHTML documents simply put <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> at the top of your page.
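As a rough sketch of the placement described above, the top of an ASP page might look something like the following. The title and paragraph text are placeholders of my own, and the DTD is referenced by its full W3C address.

```html
<%@ Language="VBScript" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- Placeholder title, for illustration only -->
<title>Placeholder title</title>
</head>
<body>
<p>Page content goes here.</p>
</body>
</html>
```

The server processes the ASP directive and does not send it to the client, so the DOCTYPE is the first markup the browser receives.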
xmlns
The xmlns, or XML namespace, declaration tells the browser that the elements in the document belong to XHTML. The namespace is identified by a W3C address, http://www.w3.org/1999/xhtml. This declaration is carried over from the XML specification and has no equivalent in HTML 4.01. People familiar with VML will recognise this usage.
You should place xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" in the html tag (the last two attributes declare the document's language), like so:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
All tags in lowercase
Since XHTML is an application of XML it is case sensitive. This means that <STRONG> is not the same thing as <strong>.
What this means to you is that from now on you should put all tags and attributes in lowercase, not a mix or just uppercase.
*On this topic: As with the English language there are exceptions to every rule. In this case ensure that your DOCTYPE declaration has DOCTYPE in uppercase. If you don't, then it is not valid and the browser or validator won't pick the declaration up. I found this out the hard way :)
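To illustrate the lowercase rule, here is a small before-and-after sketch. The class name and file name are made up for the example.

```html
<!-- HTML 4.01 habit: uppercase or mixed-case tags and attributes -->
<P CLASS="intro">Some <STRONG>important</STRONG> text.</P>
<A HREF="index.htm" Target="_blank">Home</A>

<!-- XHTML 1.0: element and attribute names in lowercase -->
<p class="intro">Some <strong>important</strong> text.</p>
<a href="index.htm" target="_blank">Home</a>
```

Only the tag and attribute names change here. Free-text values such as file names and class names keep whatever case they already have.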
Valid structure
A lot of web developers create invalidly structured HTML; I know I used to. For instance this snippet:
<p><b>This is invalid</p></b>
is not valid because the paragraph tag is closed inside the bold tag, while the bold tag is opened inside the paragraph tag. However, browsers rendering HTML 4.01 let you off without even a warning.
XHTML 1.0, however, will crack down on this and your web document will not be valid. To be valid you should nest your tags correctly, like so:
<p><b>This is valid</b></p>
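The same rule applies however deeply tags are nested, and to tags that HTML lets you leave unclosed. The following is only a sketch, with a made-up file name and list items:

```html
<!-- Not well-formed: <a> is opened inside <em> but closed outside it -->
<p>Read the <em><a href="guide.htm">style guide</em></a> first.</p>

<!-- Well-formed: the last tag opened is the first tag closed -->
<p>Read the <em><a href="guide.htm">style guide</a></em> first.</p>

<!-- HTML 4.01 allows the closing </li> to be left out -->
<ul>
<li>First item
<li>Second item
</ul>

<!-- XHTML 1.0: every element is explicitly closed -->
<ul>
<li>First item</li>
<li>Second item</li>
</ul>
```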
Mandatory attribute quotes
Attribute quotes are the quotes around the value of an attribute. For instance the src attribute of an image must have its value surrounded by quotes, like so: src="images/bob.gif"
Culprits like Microsoft Visual InterDev do not put quotes around attribute values, and web browsers allow this (though Netscape can sometimes get confused, as it is wont to do). An XHTML document with unquoted attribute values is not well-formed, so it will fail validation and an XML parser will reject it. Single quotes, by the way, are allowed as well, though double quotes are the usual convention.
So for XHTML 1.0 never do <p style=font-weight: bold>Where are your quotes?</p> but rather do <p style="font-weight: bold">Ahhh, there they are!</p>
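Numbers and percentages are the values most often left unquoted, so here is a short sketch of the fix. The table content is invented for the example.

```html
<!-- Unquoted values: tolerated by browsers, not allowed in XHTML -->
<table width=100% cellpadding=0>
<tr><td class=tbldata>Bob</td></tr>
</table>

<!-- Every value quoted, including numbers and percentages -->
<table width="100%" cellpadding="0">
<tr><td class="tbldata">Bob</td></tr>
</table>
```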
Close "empty" tags
An empty tag is a tag such as <img> or <br>. Essentially it is a tag without a closing tag.
Because XHTML is an application of XML all tags must be closed, either by <p>closed</p> or by <p />.
So for XHTML all you need to do is make sure you put a / before the closing bracket of any empty tags.
It must be noted that you should also put a space between the / and the rest of the tag's attributes, like so <img src="images/bob.gif" width="50" height="50" alt="Bob, cavorting" />. The reason for this is that Netscape will definitely fall over if you put the / in without a space.
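For reference, here are some of the empty tags you are most likely to run into, each closed with a space and a slash. The input tag and its attribute values are my own illustration; the others are taken from the examples in this article.

```html
<br />
<hr />
<img src="images/bob.gif" width="50" height="50" alt="Bob, cavorting" />
<input type="text" name="email" />
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
<link rel="Stylesheet" href="htmltohxhtml.css" type="text/css" media="screen" />
```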
Wrapping Up
Yes, I am dead serious. That is all there is to it.
Remember to use a DOCTYPE, put in your xmlns, use lowercase for attributes and tags, always use valid structure, put attribute values in quotes and always close empty tags. Once you do that, you are well ahead of the curve and preparing your web documents for the promises of XML.
Please note that this article is based on the Transitional XHTML spec and not the Strict spec. The reason for this choice is that the Strict spec is nowhere near as backwards compatible as the Transitional spec.
So XHTML is really simple and really only involves a bit more dedication and concentration from web developers. If you want another article on the why of XHTML please write to me and I will do it.
I learnt XHTML through zeldman.com and the incredibly to-the-point New York Public Library Style Guide.