
DOCTYPE (Document Type) Explored

27 Sep 2014 | CPOL | 9 min read
In this blog, we will explore how a browser comes to know which version of HTML has been used in a web page, and how Quirks mode and Standards mode affect page rendering.

Recently, when I started studying HTML5, the first question that came to my mind was how a browser comes to know whether the HTML in a page is written for HTML 4.01 or HTML5.

To answer that question, I started exploring, and here I would like to share what I learned. It turns out that all of this is controlled by a tag called <!DOCTYPE>, which is the very first line of most web pages. That surprised me: an IDE adds this tag automatically whenever a page is created, yet I had never paid attention to it. This time, out of curiosity, I dug a little deeper.

In this article, I will explain the <!DOCTYPE> tag and answer the following questions.

  • How does a browser come to know if a page is written in HTML 4.01 or HTML5?
  • What is a <!DOCTYPE> tag and what does this tag do?
  • How many types of DOCTYPE do HTML 4.01, XHTML & HTML5 have?
  • How does <!DOCTYPE> affect the rendering of HTML elements in different browsers?
  • How does a wrong <!DOCTYPE> make an HTML page invalid?
  • How do we verify whether a page is valid or not?
  • How should we decide which type of <!DOCTYPE> to define?
  • How is <!DOCTYPE> related to the document mode (Standards, Quirks and Almost-standards mode), and how does a browser decide whether to render a web page in Standards mode or Quirks mode?

Let's Start Exploring

So let’s start answering them one by one.

How does a browser come to know if a page is written in HTML 4.01 or HTML5?

As I mentioned earlier, whenever a web page is added using an IDE, a <!DOCTYPE> with a few parameters is automatically added at the top of the page. This <!DOCTYPE> signals to the browser which HTML version the page uses. Whenever a browser encounters a web page that contains a <!DOCTYPE>, it uses the value of the document type to determine the document mode for the page. HTML5 has just one <!DOCTYPE>, which we will discuss in a little while, and it is written as <!DOCTYPE html>. So whenever the <!DOCTYPE> is defined as <!DOCTYPE html>, it means the page is written for HTML5.
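As a minimal sketch of what that looks like in practice, an HTML5 page could begin as follows. The title and paragraph text are placeholders chosen for illustration.

```html
<!DOCTYPE html>
<!-- The short declaration above is the only DOCTYPE HTML5 defines. -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Minimal HTML5 page</title>
</head>
<body>
  <p>Hello, HTML5.</p>
</body>
</html>
```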

What is a <!DOCTYPE> tag and what does this tag do?

A "Document Type Declaration", or <!DOCTYPE>, tells the web browser which version of HTML the page is written in and, as a result, how the rest of the markup should be rendered. Strictly speaking, it is a declaration rather than an HTML tag, although it is commonly called one.

The <!DOCTYPE> tells a browser, "I’m using HTML 4.01." When the browser sees that, it assumes you know what you’re talking about and that you really are writing HTML 4.01. That’s good because the browser will use the layout and display rules for HTML 4.01. The declaration tells the browser that the page follows a published standard that browsers agree on. For HTML 4.01, that standard comes in three flavours: Strict, Transitional and Frameset, which we will discuss further down.

When a DOCTYPE is declared in a page, the browser knows exactly how to handle the page and (at least on any browser you’d care about) the page is going to display as you’d expect. It tells the browser the type of the document.

A DOCTYPE declaration states that the page is written in standard HTML and is meant to comply with the standards defined by the W3C (World Wide Web Consortium).

In HTML 4.01, the <!DOCTYPE> declaration refers to a DTD (Document Type Definition). The DTD specifies the rules for the markup language, so that the browsers render the content correctly.

The purpose of a DTD is to define the legal building blocks of an SGML or XML document (HTML 4.01 is defined in SGML, XHTML in XML). A DTD defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.

A <!DOCTYPE> must be the first thing in an HTML document, before the <html> tag, and it looks like:

HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

The following pictures dissect each part of the DOCTYPE tag. Read carefully to get a fair understanding of it.

[Figure: Document Type Declaration]

[Figure: Document Type Declaration, Transitional]
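In text form, the parts of the HTML 4.01 Transitional declaration can be annotated roughly as follows. The comment is only an explanatory sketch; it is placed after the declaration because the DOCTYPE should come first in the file.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<!--
  <!DOCTYPE      starts the document type declaration
  HTML           the root element of the document
  PUBLIC         the DTD is a publicly available standard
  "-//W3C//DTD HTML 4.01 Transitional//EN"
                 the public identifier: the owner (W3C), the name of
                 the DTD (HTML 4.01 Transitional) and its language (EN)
  "http://www.w3.org/TR/html4/loose.dtd"
                 the system identifier: a URL where the DTD can be found
-->
```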

Now you might be thinking that we have not talked about the word "Transitional". What’s with this transitional? If we’re writing "standard" HTML 4.01, why is it transitional? Let’s understand the meaning of the same.

There are actually two main DOCTYPEs, one for those transitioning to HTML 4.01, and a stricter DOCTYPE for those who are already there. (A third one, Frameset, is covered in the next section.)

Imagine you’ve got a Web site with hundreds of Web pages, all written in nonstandard HTML. You’d like to improve the site and get all that HTML up to the 4.01 standard, but you’re using lots of old legacy stuff from back in the 2.0 and 3.2 days of HTML. What do you do? Use the HTML 4.01 Transitional DOCTYPE, which allows you to validate your pages but still permits some of the legacy HTML. That way, you can be sure you don’t have any outright mistakes in your markup (like typos, mismatched tags, and so on) but you won’t have to rework all your HTML to get it to validate. Then, after you’ve removed all the legacy HTML, you’re ready for the Strict document type, which ensures you have a fully compliant, standardized Web site.
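As an illustrative sketch of such a legacy page, the markup below uses presentational elements and attributes that the Transitional DTD still permits. The file name logo.png and the text are made up for the example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Legacy page under Transitional</title>
</head>
<!-- bgcolor, <center>, <font> and border are deprecated, but the
     Transitional DTD still defines them, so this page can validate. -->
<body bgcolor="#ffffff">
  <center><font color="red" size="4">Site maintenance tonight</font></center>
  <p><img src="logo.png" alt="Company logo" border="0"></p>
</body>
</html>
```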

How many types of DOCTYPE do HTML 4.01, XHTML & HTML5 have?

HTML 4.01 and XHTML 1.0 each have three different <!DOCTYPE> declarations, XHTML 1.1 has one, and HTML5 has only one <!DOCTYPE> declaration, which does not reference a DTD.

HTML 4.01 Strict

In this DTD, all HTML elements and attributes are allowed except presentational or deprecated elements (like font). Framesets are also not allowed.

HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

HTML 4.01 Transitional

In this DTD, all HTML elements and attributes are allowed, including presentational or deprecated elements (like font). Framesets are still not allowed.

HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

HTML 4.01 Frameset

In this DTD, all HTML elements and attributes are allowed including presentational or deprecated elements (like font) along with frameset content.

HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">

XHTML 1.0 Strict

This DTD is equivalent to the HTML 4.01 Strict DTD, but the markup must also be written as well-formed XML. All HTML elements and attributes are allowed except presentational or deprecated elements (like font). Framesets are also not allowed.
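As a rough illustration of what "well-formed XML" means in practice, a minimal XHTML 1.0 Strict page could look like the sketch below: element and attribute names are lowercase, every attribute value is quoted, and every element is closed, including empty ones such as br and img. The file name logo.png is a placeholder.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>XHTML 1.0 Strict example</title>
</head>
<body>
  <!-- Empty elements are closed with " />" -->
  <p>First line<br />second line</p>
  <p><img src="logo.png" alt="Company logo" /></p>
</body>
</html>
```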

HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

XHTML 1.0 Transitional

This DTD is equivalent to the HTML 4.01 Transitional DTD, but the markup must also be written as well-formed XML. All HTML elements and attributes are allowed, including presentational or deprecated elements (like font). Framesets are not allowed.

HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

XHTML 1.0 Frameset

This DTD is equivalent to XHTML 1.0 Transitional, but also allows the use of frameset content.

HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

XHTML 1.1

This DTD is equivalent to XHTML 1.0 Strict, but allows you to add modules (for example, to provide ruby support for East Asian languages).

HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

HTML 5
HTML
<!DOCTYPE html>

How does <!DOCTYPE> affect the rendering of HTML elements in different browsers?

Different browsers can render the same tags differently. Whenever we define a DOCTYPE, we are telling the browser that the page follows the HTML standard named in that DOCTYPE.

Thinking back, it makes me smile how frustrated I sometimes was that a few web pages of an application opened in "Quirks" browser mode while others opened in "Standards" browser mode. While exploring the DOCTYPE, I came to know that the DOCTYPE is what decides which browser mode a web page opens in.

How does a wrong <!DOCTYPE> make an HTML page invalid?

Defining a wrong DOCTYPE makes a web page invalid. For example, if a developer declares the DOCTYPE as Strict but still uses a deprecated element like font, that element makes the page invalid. Likewise, if we use an <img> tag without defining its alt attribute, the page is also invalid, because the alt attribute is mandatory on <img> under the Strict DTD.
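A small sketch of both mistakes is shown below. The page declares HTML 4.01 Strict, so a validator would be expected to report the two marked lines. The file name logo.png is a placeholder.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Invalid under Strict</title>
</head>
<body>
  <!-- Error 1: <font> is deprecated and is not defined in the Strict DTD -->
  <p><font color="red">Site maintenance tonight</font></p>
  <!-- Error 2: the required alt attribute is missing on <img> -->
  <p><img src="logo.png"></p>
</body>
</html>
```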

How to verify if a page written is valid or not?

The W3C has a website, the W3C Markup Validation Service (validator.w3.org), which allows you to validate your web pages against the defined <!DOCTYPE>.

This website offers 3 ways to validate a web page:

  1. Validate by URI, which validates a web page online. If your website is publicly available, its pages can be validated by entering the URL directly on the website.
  2. Validate by File Upload, which validates a web page by uploading the file to the website.
  3. Validate by Direct Input, which validates a web page by pasting the markup into the text area provided on the website.

[Figure: W3C Page Validator]

How should we decide which type of <!DOCTYPE> to define?

While defining the <!DOCTYPE>, the big question is which type of DOCTYPE (DTD) we should choose. Well, it’s quite simple.

The Transitional DTD gives us a transition point between old-style HTML and standard HTML 4.01. So whenever we have older web pages and want to make them compatible with the latest browsers without much hassle, we can go for Transitional. If we are developing a new web page, Strict should be used.
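For a new page, a minimal sketch under the Strict DTD might look like this: presentation moves into CSS instead of deprecated elements, and required attributes such as alt are supplied. The class name notice and the file name logo.png are made up for the example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>New page under Strict</title>
  <style type="text/css">
    /* Replaces presentational markup such as <font> and <center> */
    .notice { color: red; text-align: center; }
  </style>
</head>
<body>
  <p class="notice">Site maintenance tonight</p>
  <p><img src="logo.png" alt="Company logo"></p>
</body>
</html>
```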

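How is <!DOCTYPE> related to the document’s mode (Standards, Quirks and Almost-standards), and how does a browser decide whether to render a web page in Standards mode or Quirks mode?

When a page has no DOCTYPE, or has an old or unrecognized one, the browser falls back to quirks mode, in which it imitates the behaviour of older browsers. And then, you’re back to the problem of having the various browsers handle your page in different ways. A complete Strict DOCTYPE, or the HTML5 <!DOCTYPE html>, selects standards mode, while a Transitional or Frameset DOCTYPE that includes its DTD URL typically selects almost-standards mode. The browser makes this choice from the DOCTYPE alone and does not check whether the markup actually validates, so the only way you can get predictable results is to tell the browser you’re using "HTML 4.01" and to actually do so. As browsers converge more and more on the standards, we should use the Standards document mode, which can be forced by using <!DOCTYPE html>.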
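One way to check which mode a browser has picked is the standard document.compatMode property, which reports "CSS1Compat" in standards and almost-standards mode and "BackCompat" in quirks mode. The page below is a minimal sketch; the element id mode is arbitrary.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Document mode check</title>
</head>
<body>
  <p id="mode"></p>
  <script>
    // "CSS1Compat" means standards (or almost-standards) mode;
    // "BackCompat" means quirks mode.
    var label = document.compatMode === "CSS1Compat"
      ? "standards (or almost-standards) mode"
      : "quirks mode";
    document.getElementById("mode").textContent =
      "This page is rendered in " + label + ".";
  </script>
</body>
</html>
```

Deleting the first line and reloading the page should make it report quirks mode instead.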

While exploring the DOCTYPE, I came across one of O’Reilly’s Head First books, which has an interesting interview between Head First and the browser. I would like to share that interview here, as it helped me clarify the topic.

[Figure: Head First Interview with Browser]

In this article, we have learnt about the <!DOCTYPE> and the document mode, both of which are important for rendering a page consistently in all browsers. It is always recommended that we make the web page use the Standards document mode and always define a <!DOCTYPE> in the page to avoid compatibility issues.

I would appreciate it if you could share your feedback about the article’s content, presentation and coverage, which will help me improve my future articles.

Happy learning…


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)