1. Introduction
This article is a revision that:
- Removes test cases from this article, tool, and the downloads
- Adds a feature that recalls the last directory visited
- Removes the generated "name" attribute
- Removes the generated "px" units on <img> width and height
attributes
The last two revisions were made to adhere to the HTML5 standard.
This article revises the HTML authoring tool, HTML TOC Generator, that
generates a Table of Contents for an HTML document. Optionally, the
tool will number the HTML headers.
2. Purpose
The HTML TOC Generator tool performs modifications to a source HTML
document as directed by the contents of a <div> element with the
class "toc". There are three distinct modes of operation:
generation, removal, and numbering.
In generation mode, the tool
- Generates a Table of Contents (TOC) for the HTML <h2> to <h6> tags
appearing in an HTML document.
- Allows specifying which tags are to be included in the TOC.
- Allows the specified tags to be non-contiguous (e.g., "h2,h3,h5"
would generate a table of contents for HTML headings at levels 2, 3,
and 5, skipping h4).
- Allows specifying a heading for the TOC and the HTML heading level to
be used for the TOC heading.
- Allows specifying whether or not a link back to the TOC is desired
(such a link allows a reader to return to the TOC from each HTML
heading included in the TOC). If a return link is desired, allows
specifying an image to be placed in the link.
- Places the TOC within the HTML document where the TOC
<div> element is located.
In removal mode, the tool
- Removes all HTML previously generated by the HTML TOC Generator from
the HTML document.
- Removes the TOC <div> element from the HTML document.
In numbering mode, the tool
performs the same actions as in generation
mode but, in addition, generates heading numbering.
3. TOC-div Element
The TOC-div element specifies the desired contents of the TOC as well
as the placement of the TOC within the HTML document. The TOC-div
element affects the HTML output in both the generation and
numbering modes. It is ignored in the removal mode.
In its simplest form, the TOC-div element takes the form:
<div class="toc"></div>
The generated TOC will be placed within the HTML document where the
TOC-div element is placed. The generated TOC replaces the TOC-div
element.
The choice of the "toc" class was intentional. With this class, the
style associated with the TOC can be specified in CSS. In the event
that the CSS does not define such a class, the following can be placed
in the <head> of the HTML document.
<style type="text/css">
.toc
{
}
.toc-generated
{
}
</style>
The "toc-generated" class is discussed below. Note that neither class
needs to be defined for the HTML TOC Generator to execute.
The format of the TOC-div element, in a modified BNF, is:
TOC-div           ::= <div class="toc"
                          [style="[toc-headers[:<heading-tags-list>];]
                                  [toc-return[:(true|false)];]
                                  [toc-title[:<title-of-toc>];]
                                  [toc-image[:<path-to-image>];]
                                  [toc-image-width[:<width-in-pixels>];]
                                  [toc-image-height[:<height-in-pixels>];]
                                  [toc-header-level[:<heading-tag>];]
                                  [toc-numbering[:<level-list>];]"]>
                      </div> .

heading-tags-list ::= heading-tag
                  ::= heading-tag, heading-tags-list .

heading-tag       ::= "h2"
                  ::= "h3"
                  ::= "h4"
                  ::= "h5"
                  ::= "h6" .

level-list        ::= level-value
                  ::= level-value, level-list .

level-value       ::= [heading-tag] digit .

digit             ::= "0"
                  ::= "1"
                  ::= "2"
                  ::= "3"
                  ::= "4"
                  ::= "5"
                  ::= "6"
                  ::= "7"
                  ::= "8"
                  ::= "9" .
3.1. TOC-div Attributes
The two attributes of the TOC-div element are "class" and "style".
3.1.1. class
The class attribute is required and must have the attribute value of
"toc". Although the attribute value is case-insensitive, the
value should be lowercase (as recommended by W3C).
3.1.2. style
The style attribute is optional and contains, as its properties, the
desired contents of the TOC. If the attribute is omitted, the
following default TOC-div element will be used:
<div class="toc"
style="toc-headers:h2,h3,h4,h5,h6;
toc-return:true;
toc-title:Table of Contents;
toc-image:/app_themes/codeproject/img/gototop16.png;
toc-image-width:16;
toc-image-height:16;
toc-header-level:h2;">
</div>
Note that property names are separated from their property values by a
colon (":") and that a semicolon (";") separates
properties from one another. When multiple property values are
supplied (as in toc-headers, above), they are separated from
one another by a comma (",").
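The property syntax just described can be sketched in a few lines of Python (used here only for brevity; the tool itself is written in C#). The function name parse_toc_style is hypothetical, and the sketch assumes that no property value contains a semicolon; the tool's own parser may behave differently.

```python
def parse_toc_style(style):
    """Split a TOC-div style value into a {name: value} dictionary.

    A property without a value (for example "toc-numbering;") maps to None.
    Assumption: property values never contain ";".
    """
    properties = {}
    for chunk in style.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after a trailing semicolon
        # The first colon separates the property name from its value.
        name, colon, value = chunk.partition(":")
        value = value.strip()
        properties[name.strip().lower()] = value if colon and value else None
    return properties
```

Calling parse_toc_style("toc-headers:h2,h3;toc-numbering;") yields a dictionary in which "toc-headers" maps to "h2,h3" and "toc-numbering" maps to None, so a caller can tell "present without a value" apart from "absent".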
3.2. TOC-div style Properties
The TOC-div element style properties control what is generated by the
HTML TOC Generator. As shown above, the
style attribute and its properties can be omitted. However, by using
the style properties, significant control can be exerted over the
contents of the generated TOC.
3.2.1. toc-headers
toc-headers specifies which HTML heading tags will
generate an entry in the TOC. Note that HTML <h1> tags are never
processed by the HTML TOC Generator.
If the toc-headers property is omitted, or if it is present but the
heading-tags-list is omitted, entries for all HTML heading tags
(<h2> through <h6>) appearing in the HTML document will be placed in
the TOC.
The heading-tags-list is composed of one or more of "h2",
"h3", "h4", "h5", or "h6", in
any order, in any case, separated by commas. White-space within the
heading-tags-list is ignored. An empty heading-tags-list is treated
as if the heading-tags-list were omitted: entries for all HTML
heading tags appearing in the HTML document will be placed in the
TOC. Unrecognized or duplicate values within the heading-tags-list
are ignored.
An example of a heading-tags-list is "h3,H5,h 2,foo,h4bar,h8,h2".
For this example, TOC entries will be generated for the HTML headings
tags <h2>, <h3>, and <h5>. "foo",
"h4bar", and "h8" will be ignored. The entry
"h 2" will be recognized as "h2" and the
duplicate "h2" will be ignored. The entry "H5"
will be modified to "h5". Even though the entries in the
heading-tags-list are unordered, the TOC entries will be ordered.
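The rules above can be illustrated with a short Python sketch (the tool itself is C#; the name parse_heading_tags_list is hypothetical). Treating a list that contains only unrecognized values the same as an empty list is an assumption of this sketch, not documented behavior.

```python
ALL_HEADING_TAGS = ["h2", "h3", "h4", "h5", "h6"]

def parse_heading_tags_list(value):
    """Return the heading tags selected by a heading-tags-list, in order."""
    wanted = set()
    for entry in (value or "").split(","):
        # Remove all white-space inside the entry and fold it to lowercase.
        tag = "".join(entry.split()).lower()
        if tag in ALL_HEADING_TAGS:
            wanted.add(tag)  # the set silently drops duplicates
    if not wanted:
        # Omitted or empty list: every heading level is included.
        # (Assumption: a list of only unrecognized values is handled the same way.)
        return list(ALL_HEADING_TAGS)
    return [tag for tag in ALL_HEADING_TAGS if tag in wanted]
```

For the example list "h3,H5,h 2,foo,h4bar,h8,h2", the sketch returns the ordered list h2, h3, h5.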
During the processing of HTML headings, HTML text formatting tags are
retained. These include:
Tag | Description |
<b> | Bold text |
<del> | Deleted text |
<em> | Emphasized text |
<i> | Italicized text |
<ins> | Inserted text |
<mark> | Marked/highlighted text |
<small> | Smaller text |
<strong> | Important text |
<sub> | Subscripted text |
<sup> | Superscripted text |
To more fully understand the HTML TOC Generation processing, some
basic terms used in this article need to be defined.
Each HTML element is considered to start at the opening "<" of its
opening tag and to end at the closing ">" of its closing tag.
The content of an HTML element is considered to start immediately
following the closing ">" of its opening tag and to
end immediately before the opening "<" of its closing tag.
During its initial processing of <h?> and <div> elements,
HTML TOC Generator removes any previously generated elements with the
class "toc-generated". For example, if the previous example header had
been processed, it might have the following form.
The "toc_bookmark_1" bookmark is the target of the entry in the TOC
for this heading. The "toc-generated" class is the signal to the
HTML TOC Generator that this element is to be removed during any
removal process.
The href in the second <a> element points to the TOC. The
<img> element is present because the user did not specify a
toc-image and so the default was used. Again, the
"toc-generated" class is the signal to remove this element.
When all initial processing is completed, the heading element will
appear as the original heading, above.
3.2.2. toc-return
toc-return specifies whether or not a return link to the
TOC will be placed in the content of the HTML heading tag. Such a
return link allows a reader to return to the TOC from locations
within the document. The recognized property values are
"true" and "false".
If the toc-return property is omitted, is present without a property
value, or is present with an unrecognized property value, return links
to the TOC will be placed in the content of the HTML heading tags
and a bookmark named "toc_return_to_toc" will be placed in
the TOC.
3.2.3. toc-title
toc-title specifies the title for the TOC. If toc-title
is missing, the title "Table of Contents" will prefix the
TOC.
The value of toc-title may contain any alphanumeric
character plus any of the following characters:
Tilde (~)
Exclamation mark (!)
Number sign (#)
Dollar sign ($)
Percent sign (%)
Circumflex accent (^)
Ampersand (&)
Asterisk (*)
Left parenthesis (()
Right parenthesis ())
Underscore (_)
Plus sign (+)
Grave accent (`)
Hyphen (-)
Equals sign (=)
Left bracket ([)
Right bracket (])
Vertical line (|)
Semicolon (;)
Colon (:)
Greater-than symbol (>)
Question mark (?)
Comma (,)
Period (.)
Space ( )
Any other character will be removed from toc-title. If, after
processing, an empty string results, no TOC title will be generated.
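The filtering rule can be sketched in Python as follows (the tool itself is C#; clean_toc_title is a hypothetical name). The sketch assumes that "alphanumeric" means ASCII letters and digits only, which the article does not state explicitly.

```python
# The punctuation characters listed above, including the space.
ALLOWED_PUNCTUATION = set("~!#$%^&*()_+`-=[]|;:>?,. ")

def clean_toc_title(title):
    """Drop every character that is not permitted in a toc-title."""
    return "".join(
        ch for ch in title
        # Assumption: only ASCII letters and digits count as alphanumeric.
        if (ch.isascii() and ch.isalnum()) or ch in ALLOWED_PUNCTUATION
    )
```

A caller would then generate no TOC title when the returned string is empty.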
3.2.4. toc-image
toc-image specifies the path to an image that will be placed in
the return link to the TOC in the text of the HTML heading tag. If
toc-image or the toc-image property value is missing,
the path
"/app_themes/codeproject/img/gototop16.png"
will be used. The value of the property may contain
any valid path character. No test is made to ensure that a valid path
is provided.
The default path is defined specifically for Code Project articles.
Documents for which a TOC is generated, but that will not be published at
Code Project, should have a toc-image specified. The image path
must be "visible" to the HTML document.
See the discussion in Return to TOC Image, below.
3.2.5. toc-image-width
toc-image-width specifies the width of the image that will be
placed in the return link to the TOC in the text of the HTML
heading tag. If toc-image-width or the toc-image-width
property value is missing, the toc-image-width defaults to 16
pixels. Note that units are not supplied. In HTML5, the width
attribute specifies the width of the image, in pixels.
3.2.6. toc-image-height
toc-image-height specifies the height of the image that will be
placed in the return link to the TOC in the text of the HTML heading
tag. If toc-image-height or the toc-image-height
property value is missing, the toc-image-height defaults to 16
pixels. Note that units are not supplied. In HTML5, the height
attribute specifies the height of the image, in pixels.
3.2.7. toc-header-level
toc-header-level specifies the HTML header level that
will be used to display the TOC title. If the toc-header-level
property is missing, the HTML header level "h2" will be used
for the TOC title element.
The value of the property may be any of the HTML header levels
"h2", "h3", "h4", "h5", or
"h6". Any other value will be ignored and the
toc-header-level value will become "h2".
3.2.8. toc-numbering
toc-numbering provides for the insertion of heading
numbering within the HTML document. If toc-numbering is
missing, heading numbering will not be inserted into the HTML
document.
If toc-numbering is present but the toc-numbering
property value is missing, heading numbering of all headers will be
inserted into the HTML document using a level-list of
"h21,h31,h41,h51,h61". This level-list will produce the following
heading numbering:
1. H2 heading
1.1. H3 heading
1.1.1. H4 heading
1.1.1.1. H5 heading
1.1.1.1.1. H6 heading
If a large HTML document is broken into separate HTML documents, by
using a level-list that differs from one HTML document to the next,
heading numbering can be made continuous across the separate HTML
documents.
For example, a portion of a large HTML document is:
<div class="toc"
style="toc-numbering;"></div>
<h2>Heading 1</h2>
:
large amount of HTML
:
<h2>Heading 2</h2>
:
large amount of HTML
:
<h2>Heading 3</h2>
:
large amount of HTML
:
Because the HTML between the individual h2 elements is
too large to fit into the desired page size, the HTML document will be
broken into smaller HTML documents at the h2 header levels. However,
heading numbering should be continuous across all pieces of the
document. By modifying the level-list of the toc-numbering property
for each of the smaller HTML documents, continuous heading numbering
can be achieved.
<div class="toc"
style="toc-numbering:h21;"></div>
<h2>Heading 1</h2>
:
large amount of HTML
:
<div class="toc"
style="toc-numbering:h22;"></div>
<h2>Heading 2</h2>
:
large amount of HTML
:
<div class="toc"
style="toc-numbering:h23;"></div>
<h2>Heading 3</h2>
:
large amount of HTML
:
Any level (h2 through h6) can have its starting level number
specified.
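The numbering scheme can be sketched with one counter per heading level. The Python below (the tool itself is C#) is only an illustration: number_headings is a hypothetical name, heading levels are passed as integers 2 through 6, and the assumption that a deeper level restarts at its own starting value whenever a shallower heading appears is mine, not the article's.

```python
def number_headings(levels, starts=None):
    """Return the numbering prefix for each heading level in document order.

    levels: heading levels (2..6) in the order they appear.
    starts: optional {level: first number}; unspecified levels start at 1.
    """
    first = {n: 1 for n in range(2, 7)}
    first.update(starts or {})
    counters = {}
    prefixes = []
    for level in levels:
        if level in counters:
            counters[level] += 1
        else:
            counters[level] = first[level]
        # A new heading ends the numbering of every deeper level.
        for deeper in range(level + 1, 7):
            counters.pop(deeper, None)
        # Assumption: a skipped ancestor level contributes its starting value.
        parts = [counters.setdefault(n, first[n]) for n in range(2, level + 1)]
        prefixes.append(".".join(str(p) for p in parts) + ".")
    return prefixes
```

With starts set to {2: 2}, which corresponds to the level-list "h22", the first h2 heading in the document receives the prefix "2." and an h3 heading beneath it receives "2.1.".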
4. Generated TOC
4.1. Generation
The following discussion assumes that the following TOC-div element is
found in an HTML document being submitted to the HTML TOC Generator:
<div class="toc">
</div>
The TOC-div element will be rewritten to display the properties used
during the processing of the HTML document. The toc-title is placed in the
default header level entry (in this case <h2>) in the contents
of the rewritten TOC-div element. The toc-title is assigned to the class
"toc-generated". Following the title will appear a generated
<div> that will contain the actual TOC. This <div> is
assigned to the class "toc-generated". The first entry in
this <div> will be the TOC bookmark
("toc_return_to_toc") used to return to the TOC from
locations within the HTML document. Immediately following will be the
opening tag for the unordered list that comprises the actual TOC.
So far, the generated TOC-div element would appear as
<div class="toc"
style="toc-headers:h2,h3,h4,h5,h6;
toc-return:true;
toc-title:Table of Contents;
toc-return-image:/app_themes/codeproject/img/gototop16.png;
toc-image_width:16;
toc-image_height:16;
toc-header-level:h2;" >
<h2 class="toc-generated">Table of Contents</h2>
<div class="toc-generated">
<a id="toc_return_to_toc"> </a>
<ul>
Depending upon the contents of the document, each generated TOC
entry is an <li> element embedded within a, possibly
nested, <ul> element. Nesting occurs when a subordinate heading
level is encountered. Given the following <h2> tag:
<h2>Introduction</h2>
the following entry will be generated in the TOC
<li><a href="#toc_bookmark_1">Introduction</a></li>
and the <h2> element will be modified to
<h2>Introduction
<a id="toc_bookmark_1"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h2>
Any existing bookmark or link with a class of
"toc-generated" will be removed. In the preceding example,
all that would remain before regeneration, would be:
<h2>Introduction</h2>
The bookmark ID attribute value (i.e., "toc_bookmark_1") is
generated by the HTML TOC Generation process and will be unique within
the HTML document (that is assuming that no pathological case exists
wherein the input HTML contains the generated value). In this example,
an image link is also generated that, when clicked, will return the
reader to the top of the TOC.
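The nesting of <ul> and <li> elements described in this section can be sketched with a stack of open heading levels. The Python below is an illustration only (the tool itself is C#, and its actual algorithm is not shown in this article); build_toc is a hypothetical name, and bookmarks are numbered from zero.

```python
def build_toc(headings):
    """Build a nested <ul> list from (level, text) pairs in document order."""
    lines = ["<ul>"]
    open_levels = []  # heading levels of the <li> elements still open
    for index, (level, text) in enumerate(headings):
        # Close every open entry that is deeper than the new heading.
        while open_levels and open_levels[-1] > level:
            open_levels.pop()
            lines.append("</li>")
            if open_levels:
                lines.append("</ul>")  # leave the nested list as well
        if open_levels and open_levels[-1] == level:
            open_levels.pop()
            lines.append("</li>")  # sibling entry in the same list
        elif open_levels:
            lines.append("<ul>")  # subordinate heading: nest a new list
        lines.append('<li><a href="#toc_bookmark_%d">%s</a>' % (index, text))
        open_levels.append(level)
    # Close whatever is still open at the end of the document.
    while open_levels:
        open_levels.pop()
        lines.append("</li>")
        if open_levels:
            lines.append("</ul>")
    lines.append("</ul>")
    return "\n".join(lines)
```

Passing [(2, "A"), (3, "B"), (2, "C")] produces an entry for A that contains a nested list holding B, followed by a sibling entry for C.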
4.2. Numbering
If toc-numbering is specified, as in:
<div class="toc"
style="toc-numbering;">
</div>
then the initial part of the TOC-div element will be generated as:
<div class="toc"
style="toc-headers:h2,h3,h4,h5,h6;
toc-return:true;
toc-title:Table of Contents;
toc-return-image:/app_themes/codeproject/img/gototop16.png;
toc-image_width:16;
toc-image_height:16;
toc-header-level:h2;
toc-numbering:h21,h31,h41,h51,h61;" >
<h2 class="toc-generated">Table of Contents</h2>
<div class="toc-generated">
<a id="toc_return_to_toc"> </a>
<ul>
and if the second <h2> element is
<h2>Introduction</h2>
the following entry will be generated in the TOC
<li><a href="#toc_bookmark_1">2. Introduction</a></li>
and the <h2> element contents will be modified to
<h2><span class="toc-generated" >2. </span>Introduction
<a id="toc_bookmark_1"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h2>
4.3. Example
If the following HTML document is submitted to the HTML TOC
Generator:
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<title>Test Auto TOC Generation</title>
<link type="text/css"
rel="stylesheet"
href="http://s.codeproject.com/App_Themes/CodeProject/Css/Main.min.css?dt=2.6.130426.1" />
<style type="text/css">
.toc
{
}
.toc-generated
{
}
</style>
</head>
<body style="margin: 20px;">
<div class="toc">
</div>
<h2>Header Level <b>2</b> - <i>Number 1</i></h2>
<p>H2 1</p>
<h3>Header Level 3 - Number 1</h3>
<p>H3 1</p>
<h4>Header Level 4 - Number 1</h4>
<p>H4 1</p>
<h4>Header Level 4 - Number 2</h4>
<p>H4 2</p>
<h5>Header Level 5 - Number 1</h5>
<p>H5 1</p>
<h6>Header Level 6 - Number 1</h6>
<p>H6 1</p>
<h4>Header Level 4 - Number 3</h4>
<p>H4 3</p>
<h3>Header Level 3 - Number 2</h3>
<p>H3 2</p>
<h2>Header Level 2 - Number 2</h2>
<p>H2 2</p>
</body>
</html>
the HTML document that would be generated is:
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<title>Test Auto TOC Generation</title>
<link type="text/css"
rel="stylesheet"
href="http://s.codeproject.com/App_Themes/CodeProject/Css/Main.min.css?dt=2.6.130426.1" />
<style type="text/css">
.toc
{
}
.toc-generated
{
}
</style>
</head>
<body style="margin: 20px;">
<div class="toc"
style="toc-headers:h2,h3,h4,h5,h6;
toc-return:true;
toc-title:Table of Contents;
toc-return-image:/app_themes/codeproject/img/gototop16.png;
toc-image_width:16;
toc-image_height:16;
toc-header-level:h2;" >
<h2 class="toc-generated">Table of Contents</h2>
<div class="toc-generated">
<a id="toc_return_to_toc"> </a>
<ul>
<li><a href="#toc_bookmark_0">Header Level <b>2</b> - <i>Number 1</i></a>
<ul>
<li><a href="#toc_bookmark_1">Header Level 3 - Number 1</a>
<ul>
<li><a href="#toc_bookmark_2">Header Level 4 - Number 1</a></li>
<li><a href="#toc_bookmark_3">Header Level 4 - Number 2</a>
<ul>
<li><a href="#toc_bookmark_4">Header Level 5 - Number 1</a>
<ul>
<li><a href="#toc_bookmark_5">Header Level 6 - Number 1</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#toc_bookmark_6">Header Level 4 - Number 3</a></li>
</ul>
</li>
<li><a href="#toc_bookmark_7">Header Level 3 - Number 2</a></li>
</ul>
</li>
<li><a href="#toc_bookmark_8">Header Level 2 - Number 2</a></li>
</ul>
<p>
The symbol
<a href="#toc_return_to_toc">
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
returns the reader to the top of the Table of Contents.
</p>
</div>
</div>
<h2>Header Level <b>2</b> - <i>Number 1</i>
<a id="toc_bookmark_0"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h2>
<p>H2 1</p>
<h3>Header Level 3 - Number 1
<a id="toc_bookmark_1"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h3>
<p>H3 1</p>
<h4>Header Level 4 - Number 1
<a id="toc_bookmark_2"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h4>
<p>H4 1</p>
<h4>Header Level 4 - Number 2
<a id="toc_bookmark_3"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h4>
<p>H4 2</p>
<h5>Header Level 5 - Number 1
<a id="toc_bookmark_4"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h5>
<p>H5 1</p>
<h6>Header Level 6 - Number 1
<a id="toc_bookmark_5"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h6>
<p>H6 1</p>
<h4>Header Level 4 - Number 3
<a id="toc_bookmark_6"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h4>
<p>H4 3</p>
<h3>Header Level 3 - Number 2
<a id="toc_bookmark_7"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h3>
<p>H3 2</p>
<h2>Header Level 2 - Number 2
<a id="toc_bookmark_8"
class="toc-generated" >
</a>
<a href="#toc_return_to_toc"
class="toc-generated" >
<img alt="Table of Contents"
title="Table of Contents"
src="/app_themes/codeproject/img/gototop16.png"
width="16"
height="16" />
</a>
</h2>
<p>H2 2</p>
</body>
</html>
5. TOC Removal
When the TOC removal process is invoked against an HTML document that
was processed by the HTML TOC Generator, all generated HTML will be
removed. This includes the TOC-div element as well as all elements with the
class "toc-generated".
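As an illustration of what removal means for a single heading, the Python sketch below strips generated <a> and <span> elements from the content of a heading. It is not the tool's code (the tool uses its own C# parser rather than regular expressions); strip_generated is a hypothetical name, and the pattern assumes that the class attribute is written exactly as class="toc-generated" and that generated elements are not nested inside one another.

```python
import re

# An <a> or <span> whose opening tag carries class="toc-generated",
# together with everything up to its matching closing tag.
GENERATED = re.compile(
    r'<(a|span)\b[^>]*\bclass="toc-generated"[^>]*>.*?</\1>',
    re.DOTALL | re.IGNORECASE,
)

def strip_generated(content):
    """Remove generated bookmarks, return links, and numbering spans."""
    return GENERATED.sub("", content).strip()
```

Applied to the content of the numbered heading shown in the Numbering section, the sketch returns the original heading text, "Introduction".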
6. Implementation
The HTML TOC Generator is encapsulated in the HTMLTOCGenerator.cs
file. Two entry points provide the services:
add_TOC_to_html and
remove_TOC_from_html. Both
take a single string argument that is the HTML upon which to execute.
Both return a string that contains the possibly revised HTML.
The using directives for HTMLTOCGenerator.cs are:
using System;
using System.Collections.Generic;
using System.Text;
using CONST = HTMLTOCGenerator.Constants;
using DATA = HTMLTOCGenerator.Data;
using ELEMENT = HTMLTOCGenerator.Element;
using HTMLPARSER = HTMLTOCGenerator.HTMLParser;
using NUMBERING = HTMLTOCGenerator.TOCNumbering;
using TOC = HTMLTOCGenerator.TOCDIV;
using TYPE = HTMLTOCGenerator.Constants.Element_Type;
The two methods are:
public static string add_TOC_to_html ( string html )
{
    HTMLPARSER HTML_parser = new HTMLPARSER ( );
    string rewriten_html = html;

    HTML_parser.collect_all_desired_elements ( html );
    // the source HTML is rewritten only if a TOC-div element was found
    if ( TOC.HaveTOCDIV )
    {
        HTML_parser.revise_element_content ( );
        HTML_parser.eliminate_unwanted_elements ( );
        rewriten_html = rewrite_html ( html );
    }

    return ( rewriten_html );
}

public static string remove_TOC_from_html ( string html )
{
    HTMLPARSER HTML_parser = new HTMLPARSER ( );
    int html_start = 0;
    int html_to_copy = 0;
    StringBuilder sb = new StringBuilder ( );

    HTML_parser.collect_all_desired_elements ( html );
    HTML_parser.revise_element_content ( );
    foreach ( ELEMENT element in DATA.Elements )
    {
        // copy the HTML that precedes this element
        html_to_copy = element.ElementStartsAt -
                       html_start - 1;
        if ( html_to_copy > 0 )
        {
            sb.Append ( html, html_start, html_to_copy );
        }
        if ( element.ElementType == TYPE.TOCDIV )
        {
            // the TOC-div element is not copied to the output
        }
        else
        {
            // rewrite the heading using its revised content
            sb.AppendFormat ( "\n<{0}>{1}</{0}>\n",
                              element.TagName,
                              element.Content );
        }
        html_start = element.ElementEndsAt + 1;
    }
    // copy the HTML that follows the last element
    html_to_copy = html.Length - html_start;
    sb.Append ( html, html_start, html_to_copy );

    return ( sb.ToString ( ) );
}
The HTML parser is encapsulated in the class HTMLParser. The parser
was originally developed by Jeff Heaton and is available as a
C# Parser. Major modifications were made to the
parser so that it was self-limiting to <h2>, <h3>,
<h4>, <h5>, <h6>, and <div> elements.
The generation, numbering, and removal processes
make a single pass through the HTML. When all of the desired headers
and <div>s have been identified, the HTML is copied to the
output.
For the generation and numbering modes to modify source
HTML, a
TOC-div element must be located within the source HTML. If that
element is not found, then the source HTML is returned unmodified.
An invoking program can determine if this occurred by testing the
lengths of the source and the returned HTML. If the lengths are the
same, a TOC-div element was not found and no modifications were made.
The removal process does not require that a TOC-div
element exist. It seeks all header elements containing
<a> or <span> elements with the class of "toc-generated".
It then removes these elements. It also removes any existing TOC-div
element that it finds.
The copy process jumps through the HTML, guided by the collected
header and <div> data. This is demonstrated by the
remove_TOC_from_html source code, above.
I am considering replacing the HTML parser, which uses an
indexed buffer, with one that uses
StringReader. The advantage of StringReader is its
look-ahead capability (i.e., the Peek method). The disadvantage is the
time needed to implement the revision.
7. HTML TOC Generator Tool
Although the HTML TOC Generator Tool was designed to test the two
methods add_TOC_to_html and remove_TOC_from_html,
because it produces useful HTML, it is included in the downloads for
this article.
7.1. HTML TOC Generator Tool Startup
Input to the tool is made through the RichTextBox in the HTML Input
tab. There are two ways in which input can be provided:
1. Directly copying HTML into the RichTextBox.
2. Choosing an HTML file by using the Browse button.
7.2. HTML TOC Generator Tool Input Phase
When the HTML Input tab RichTextBox contents have been supplied, the
Generate button appears. In the event that the HTML contains the
string "toc-generated", a Remove button will also appear.
In the example above, the Browse button was used to obtain the
HTML for this article. Note that near the bottom, a TOC-div element is
defined. Note too that the Remove button is visible even though TOC
generation has not occurred. This happened because this article
contains the string "toc-generated".
The contents of the HTML Input tab RichTextBox may be modified before
the Generate button is clicked.
7.3. HTML TOC Generator Tool Creating TOC
The tool is not re-entrant. So once the Generate button is clicked,
its visibility will be set to false. To apply the tool against another
HTML file, it is necessary to re-execute the tool.
When the Generate button is clicked, the add_TOC_to_html method
is invoked against the contents of the HTML Input tab RichTextBox. The
results of its execution are placed in the Revised HTML tab
RichTextBox.
In the example above, all headers have been modified. In
addition (although not visible), the TOC-div element has been replaced
as described above.
Navigation between the two tabs is supported.
If desired, the revised HTML may be saved. This is achieved by
clicking on the Save button and completing the Save File Dialog. For
ease of use, a filename is proposed for the save operation. It is
constructed from the original input filename, with ".TOC" inserted
after the input filename and before the extension.
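The proposed filename can be derived as in the following Python sketch (the tool itself is C#; proposed_save_name is a hypothetical name, and the sketch assumes the extension is whatever follows the last period in the filename).

```python
import os.path

def proposed_save_name(input_path):
    """Insert ".TOC" between the filename and its extension."""
    root, extension = os.path.splitext(input_path)
    return root + ".TOC" + extension
```

For an input file named article.html, the proposed name is article.TOC.html.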
7.4. HTML TOC Generator Tool TOC Removal
The removal process operates in much the same way as generation
and numbering. An HTML file is chosen and, if "toc-generated" is
found in the document, a "Remove" button is displayed. When clicked,
all HTML TOC generated elements are removed. Also the TOC-div element
is removed.
8. Return to TOC Image
For an HTML document that will not be published by Code Project, a
graphic, named ReturnToToc.png, that could be used for the toc-image
is included in the download, in the HTMLTOCGeneratorDialogProject ZIP.
The image is 31 x 31 pixels. I recommend that the width and height be
set to 16. There are no
copyright restrictions on the image.
9. Conclusion
This article has presented revisions to an HTML authoring tool that
generates a Table of Contents for an HTML document. Additionally, the
tool can be directed to produce numbered HTML headers.
10. References
11. Development Environment
The HTML TOC Generator was developed in the following environment:
Microsoft Windows 7 Professional Service Pack 1
Microsoft Visual Studio 2008 Professional
Microsoft .Net Framework Version 3.5 SP1
Microsoft Visual C# 2008
12. History
08/22/2017 | HTML TOC Generator V4.1
04/10/2015 | Original article