
HtmlHelp library and example viewer

11 Aug 2004
A class library for reading compiled HTML help (chm) files and a sample viewer application using this library.

Building the table of contents directly from the CHM file and displaying it with a user control from the class library.

Table of contents

Building the help keyword index from text-based hhk-index files or internally stored binary index formats. The class library also contains a user control for displaying the keyword index.

Keyword index

Implementing a full-text search engine which uses the CHMs' internally stored full-text index. As above, the class library contains a user control for easy full-text search integration.

Fulltext search

Support for CHMs compiled for international languages such as Russian, Hebrew, etc.


Contents

  1. Contents
  2. Introduction
  3. Reading CHM-Files
  4. Files of the HtmlHelp system
  5. The internal "file system" and its files
  6. Using the class library
  7. Useful links
  8. Conclusion
  9. History

2. Introduction

First of all, sorry for my bad English. It's not my native language :)

This article is about a class library for reading CHM (Microsoft Compiled HTML Help 1.0/1.1) files. With this library, you can easily integrate a help system into your application without using the Microsoft UI. The demo project shows how to embed the Microsoft Web Browser control and how to interact with the library.

I created this library during my last project, where I had to build a Windows application with a fully integrated help system (a help browser window embedded in the application, with table of contents, index and search panes like in Visual Studio). Since the help tools shipped with VS.NET only work with the Microsoft UI (the default Microsoft HTML Help viewer), I had to find another way to give the users of my application a familiar help system. Searching the net was rather frustrating, because there was no managed library or sample code available for handling CHM files and their contents. But I found other resources and articles which made it possible for me to implement my own managed library (see the useful links at the end of this article).

Note: The class library is not optimized for performance. I'm using a lot of .NET classes which are not very efficient but easy to use (ArrayList, Hashtable, HybridDictionary, Regex, ...). You shouldn't run into trouble with small help files (small in terms of table of contents and/or index size).

3. Reading CHM-Files

Basically, a CHM file contains its own file system. You can handle read/write streaming using the IStorage interface, which supports the creation and management of structured storage objects. I won't go deeper into the usage of IStorage and the wrapper creation because there is already an article called Decompiling CHM (help) files with C# available. Have a look at it if you need more information about IStorage and how to use it.

4. Files of the HtmlHelp system

In some cases, especially in bigger help systems, CHM is not the only file extension we are interested in. If your HtmlHelp file grows too big, the HTML Help Workshop (usually your help authoring tool) lets you split the help system into multiple files with different kinds of content. These are mainly:

File extension: file contents
CHM: The main help file. It contains all help content files (HTML files, images, etc.) and can contain all the contents of the other files as well.
CHI: This file contains internal system files. The CHM help system contains system files (e.g. a string table file, a URL table file, ...), which may be stored separately in a file with the extension CHI (e.g. if the help system contains a binary table of contents that needs a lot of space).
CHQ: This file contains the full-text search index of the help system. If this file doesn't exist, the full-text search index is found in the CHM file, or full-text searching is disabled.
CHW: This file contains the help index (see the index pane of your HtmlHelp viewer) in binary format. If this file doesn't exist, the binary index can be found in the CHM file, if enabled. Besides the binary format, the help system supports a second format for storing the index: a small help index may also be stored in sitemap format in a file with the extension .hhk. Such a text-based index is always stored as an internal content file in the corresponding CHM file storage.

The help index always consists of two different types of links:

  • Associative links (ALinks)

    This index maps, for example, control IDs to the appropriate help topic.

  • Keyword links (KLinks)

    This index maps keywords taken from the HTML documents to the appropriate help topic.

All these different files contain their own "file system" which can be read/written using the IStorage interface.

5. The internal "file system" and its files

As I already mentioned, the internal "file system" contains content files and system files. The system files are what we are interested in. I'll only talk about the necessary system files, because there are some system files which the library doesn't decode during data extraction (no interesting content).

The following list of internal files gives a short overview of their names, formats, where they can be found and their contents. The class library implements classes named similarly to the system files to make this more transparent. These classes are responsible for decoding the binary files or parsing text-based contents.

For a more detailed description of the different files, their contents and how to decode them, see Pabs' unofficial HTML Help specification.

So here we go:

5.1 The #SYSTEM file

Internal file name: #SYSTEM
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMSystem

This file contains the main information about the help system, such as: the name of the sitemap contents file, the name of the sitemap index file, the default help topic, a flag indicating whether full-text searching is supported, flags indicating whether the help system contains ALinks and/or KLinks, flags indicating whether the system has a binary table of contents and/or a binary index, the compiler version, and many more. All of this information is stored in binary format and must be decoded directly from a binary stream.
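To give an idea of what decoding #SYSTEM from a binary stream involves, here is a minimal C# sketch. It assumes the record layout as I understand it from Pabs' unofficial specification: a 4-byte version number followed by records made of a 2-byte code, a 2-byte length and that many data bytes (according to that specification, e.g. code 0 holds the contents file name, 2 the default topic and 3 the title). The class and method names are made up for illustration and are not the library's CHMSystem API; please verify the layout against the specification before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Sketch only: assumes the #SYSTEM layout from Pabs' unofficial specification
// (DWORD version, then records of WORD code, WORD length, BYTE[length] data).
public static class SystemFileSketch
{
    public sealed class SystemRecord
    {
        public ushort Code;
        public byte[] Data;
    }

    // Reads every record that follows the 4-byte version header.
    public static List<SystemRecord> ReadRecords(byte[] systemFile, out uint version)
    {
        var records = new List<SystemRecord>();
        using (var reader = new BinaryReader(new MemoryStream(systemFile)))
        {
            version = reader.ReadUInt32(); // BinaryReader is little-endian
            // A record needs at least 4 bytes for its code and length fields.
            while (reader.BaseStream.Length - reader.BaseStream.Position >= 4)
            {
                ushort code = reader.ReadUInt16();
                ushort length = reader.ReadUInt16();
                byte[] data = reader.ReadBytes(length);
                if (data.Length < length) break; // truncated record: stop
                records.Add(new SystemRecord { Code = code, Data = data });
            }
        }
        return records;
    }

    // String-valued records are assumed to be NUL-terminated.
    public static string AsString(SystemRecord record, Encoding encoding)
    {
        int end = Array.IndexOf(record.Data, (byte)0);
        if (end < 0) end = record.Data.Length;
        return encoding.GetString(record.Data, 0, end);
    }
}
```

The flags mentioned above (full-text search, ALinks/KLinks, binary TOC/index) live in other, non-string records whose exact layout this sketch does not reproduce.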

5.2 The #IDXHDR file

Internal file name: #IDXHDR
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMIdxhdr

This file mainly contains links (offsets) to the #STRINGS file such as: offset to the frame name, offset to the window name, offset to the image list, offset to merged files, number of topic nodes including the contents and index files, and some other flags.

This file has a fixed size of 4096 bytes and must be read in binary mode.

5.3 The #STRINGS file

Internal file name: #STRINGS
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMStrings

This file is a list of NUL-terminated (NT) ANSI/UTF-8 strings. It contains all topic names, window names and other strings. The very first entry is just a NUL character, which allows the help system to specify a zero offset and still get a valid (empty) string. The strings are split into blocks of 4096 bytes.
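A minimal sketch of an offset lookup, assuming the whole #STRINGS file has already been read into a byte array (the library's CHMStrings class may well work block by block instead). The leading NUL byte is what makes offset 0 resolve to an empty string. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: resolves an offset into a #STRINGS-style blob
// of NUL-terminated strings.
public static class StringsSketch
{
    public static string GetString(byte[] strings, int offset, Encoding encoding)
    {
        if (offset < 0 || offset >= strings.Length)
            return null; // offset lies outside the file
        // The string ends at the next NUL byte (or at the end of the data).
        int end = Array.IndexOf(strings, (byte)0, offset);
        if (end < 0) end = strings.Length;
        return encoding.GetString(strings, offset, end - offset);
    }
}
```

Pass Encoding.UTF8 or the code page matching the CHM's language. On .NET Core, ANSI code pages obtained through Encoding.GetEncoding(int) additionally require registering CodePagesEncodingProvider.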

5.4 The #TOCIDX file

Internal file name: #TOCIDX
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMTocidx

This file only exists in files with a non-empty contents file, binary table of contents = true and compatibility = 1.1. It contains the binary table of contents for the help file in a tree format.

5.5 The #TOPICS file

Internal file name: #TOPICS
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMTopics, HtmlHelp.CHMDecoding.TopicEntry

This file contains information on the topics present. It mainly stores offsets into the #TOCIDX (if binary table of contents is enabled), the #STRINGS and the #URLTBL files.

5.6 The #URLSTR file

Internal file name: #URLSTR
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMUrlstr

This file contains URL strings and frame names. The URL string is always relative to the storage root.

5.7 The #URLTBL file

Internal file name: #URLTBL
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMUrltable, UrlTableEntry

This file contains a URL table mapping topics to URLs. It mainly stores offsets into the #URLSTR file and an index into the #TOPICS file.

5.8 The $FIftiMain file

Internal file name: $FIftiMain
Format: binary
Found in: CHM, CHQ
Library class: HtmlHelp.CHMDecoding.FullTextEngine

This file stores information for the full-text search, so you do not have to search all content files and topics for words and phrases on your own. This speeds up searching considerably, since the index in this file records which word occurs in which files and at which locations.

The file starts with a header, followed by index nodes, leaf nodes and word location codes (WLCs). The index and leaf nodes have a fixed size (set in the header), while the WLC entries have a variable size (set in the leaf nodes).

The class library implements a FullTextEngine class which uses the following search algorithm:

  1. Read the header and seek to the root index node.
  2. Search the index node for the first word greater than or equal to the desired word, and descend to the next index level.
  3. Repeat step 2 as many times as the tree is deep.
  4. Search the resulting leaf node until the desired word is found.
  5. Read the correct part of the WLCs for that leaf node and extract the topic numbers for that word.
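These steps can be sketched with a hypothetical in-memory model of the tree. Real $FIftiMain nodes are compressed binary blocks, and the topic numbers have to be decoded from the word location codes; this sketch skips both. It assumes that each index-node key is the greatest word stored below its child, so the first key greater than or equal to the search word selects the child to descend into. It is not the library's FullTextEngine code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of the $FIftiMain tree; real nodes are compressed
// binary blocks and topic numbers come from decoding word location codes.
public abstract class FtsNode { }

public sealed class FtsIndexNode : FtsNode
{
    // Sorted keys; Keys[i] is assumed to be the greatest word below Children[i].
    public readonly List<string> Keys = new List<string>();
    public readonly List<FtsNode> Children = new List<FtsNode>();
}

public sealed class FtsLeafNode : FtsNode
{
    // Words sorted ordinally, with the topic numbers found for each word.
    public readonly List<string> Words = new List<string>();
    public readonly List<int[]> Topics = new List<int[]>();
}

public static class FullTextSketch
{
    public static int[] Find(FtsNode root, string word)
    {
        FtsNode node = root;
        // Descend one index level per iteration until a leaf node is reached.
        while (node is FtsIndexNode index)
        {
            int i = 0;
            // The first key greater than or equal to the search word selects the child.
            while (i < index.Keys.Count && string.CompareOrdinal(index.Keys[i], word) < 0)
                i++;
            if (i == index.Keys.Count)
                return new int[0]; // word sorts after every key: not in the index
            node = index.Children[i];
        }
        // Search the leaf node for the word itself.
        var leaf = (FtsLeafNode)node;
        int pos = leaf.Words.BinarySearch(word, StringComparer.Ordinal);
        return pos >= 0 ? leaf.Topics[pos] : new int[0];
    }
}
```

A linear scan is used inside index nodes because it mirrors the description; a binary search would work just as well on the sorted keys.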

See Pabs' unofficial HTML Help specification for detailed information about the full-text index file.

5.9 The $WWAssociativeLinks\BTree and $WWKeywordLinks\BTree files

Internal file name: BTree
(located in sub-storages $WWAssociativeLinks and/or $WWKeywordLinks)
Format: binary
Found in: CHM, CHI, CHW
Library class: HtmlHelp.CHMDecoding.CHMBtree

This file stores the binary index of the help system. Depending on the sub-storage, it contains the ALinks or KLinks of the help system. The file format is the same for both index types. The file contains two different types of entries (besides a header): Listing blocks and Index blocks. Decoding the listing blocks forms an index tree where sub-keywords are ", " separated (e.g. main item keyword "Dialog", sub items keywords "Dialog, About" or "Dialog, Find and Replace"). Each listing block has at least one Index block entry. More than one index block entry means this keyword can be found in multiple topics. "See Also" keywords are also stored in this file.
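Because the parent keyword is always part of the keyword text, the tree level and the parent of an entry can be derived from the ", " separator alone. A minimal sketch of that idea, using hypothetical helper names (this is not the CHMBtree code):

```csharp
using System;

// Hypothetical helpers: derive tree depth and parent from a ", "-separated keyword.
public static class KeywordSketch
{
    private static readonly string[] Separator = { ", " };

    // "Dialog" -> 0, "Dialog, Find and Replace" -> 1
    public static int GetIndent(string keyword)
    {
        return keyword.Split(Separator, StringSplitOptions.None).Length - 1;
    }

    // "Dialog, About" -> "Dialog"; returns null for a top-level keyword.
    public static string GetParent(string keyword)
    {
        int pos = keyword.LastIndexOf(", ", StringComparison.Ordinal);
        return pos < 0 ? null : keyword.Substring(0, pos);
    }
}
```

Keywords whose own text contains ", " would be misread by this simple approach, so treat it as an illustration only.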

5.10 The Table of Contents file

Internal file name: specified in #SYSTEM
(if not available, search for "Table of contents.hhc" or "<chmname>.hhc" in
the storage's content files)
Format: text-based sitemap
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.HHCParser

This file contains the text-based table of contents of the help system. The library class uses regular expression parsing to build the table of contents tree.

A sample HHC file:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->
</HEAD><BODY>
<UL>
        <LI> <OBJECT type="text/sitemap">
                <param name="Name" value="Introduction to GraphEdit">
                <param name="Local" value="graphedit_help.htm">
                </OBJECT>
        <LI> <OBJECT type="text/sitemap">
                <param name="Name" value="Building Filter Graphs">
             </OBJECT>
        <UL>
             <LI> <OBJECT type="text/sitemap">
                     <param name="Name" value="Build a File Playback Graph">
                     <param name="Local" value="build_graph.htm">
                  </OBJECT>
        </UL>
</UL>
</BODY></HTML>
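Here is a reduced sketch of regex-based sitemap parsing, which can be run against a sample like the one above. It only picks up the Name and Local parameters and uses <UL>/</UL> to track nesting. It is my own illustration, not the library's HHCParser, and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Sketch only: a simplified regex-based .hhc reader, not the library's HHCParser.
public sealed class SitemapNode
{
    public string Name = "";
    public string Local = "";
    public readonly List<SitemapNode> Children = new List<SitemapNode>();
}

public static class HhcSketch
{
    // Matches <UL>, </UL> or a complete <OBJECT type="text/sitemap"> ... </OBJECT> block.
    private static readonly Regex Token = new Regex(
        @"<UL>|</UL>|<OBJECT\s+type=""text/sitemap""\s*>(?<body>.*?)</OBJECT>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches <param name="..." value="..."> inside an OBJECT block.
    private static readonly Regex Param = new Regex(
        @"<param\s+name=""(?<name>[^""]*)""\s+value=""(?<value>[^""]*)""\s*/?>",
        RegexOptions.IgnoreCase);

    public static List<SitemapNode> Parse(string hhc)
    {
        var root = new SitemapNode();
        var parents = new Stack<SitemapNode>();
        SitemapNode current = root; // receives the items of the current <UL>
        SitemapNode last = null;    // item that a nested <UL> belongs to

        foreach (Match m in Token.Matches(hhc))
        {
            string token = m.Value;
            if (token.Equals("<UL>", StringComparison.OrdinalIgnoreCase))
            {
                parents.Push(current);
                if (last != null) current = last; // nested list: children of the last item
            }
            else if (token.Equals("</UL>", StringComparison.OrdinalIgnoreCase))
            {
                if (parents.Count > 0) current = parents.Pop();
                last = null;
            }
            else
            {
                var node = new SitemapNode();
                foreach (Match p in Param.Matches(m.Groups["body"].Value))
                {
                    string name = p.Groups["name"].Value;
                    string value = WebUtility.HtmlDecode(p.Groups["value"].Value);
                    if (name.Equals("Name", StringComparison.OrdinalIgnoreCase) && node.Name.Length == 0)
                        node.Name = value;
                    else if (name.Equals("Local", StringComparison.OrdinalIgnoreCase))
                        node.Local = value;
                }
                current.Children.Add(node);
                last = node;
            }
        }
        return root.Children;
    }
}
```

Because the token regex only matches text/sitemap objects, a "text/site properties" object at the top of an .hhc file, if present, is simply skipped.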

5.11 The Index file

Internal file name: specified in #SYSTEM
(if not available, search for "Index.hhk" or "<chmname>.hhk" in the storage's
content files)
Format: text-based sitemap
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.HHKParser

This file contains the text-based index of the help system. The format is the same as in HHC files except that there is only one level allowed in the site map and the first Name entry specifies the keyword which is followed by Name, Local pairs.
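The Name/Local pairing rule can be sketched for the parameters of a single .hhk <OBJECT>. This is a hypothetical helper, not the library's HHKParser. Treating a Local that directly follows the keyword as a topic titled with the keyword itself is my assumption for single-topic entries.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: interprets the <param> tags of one .hhk <OBJECT> block.
public sealed class HhkEntry
{
    public string Keyword;
    // (topic title, local URL) pairs
    public readonly List<KeyValuePair<string, string>> Topics =
        new List<KeyValuePair<string, string>>();
}

public static class HhkSketch
{
    private static readonly Regex Param = new Regex(
        @"<param\s+name=""(?<name>[^""]*)""\s+value=""(?<value>[^""]*)""\s*/?>",
        RegexOptions.IgnoreCase);

    public static HhkEntry ParseEntry(string objectBody)
    {
        var entry = new HhkEntry();
        string pendingTitle = null;
        foreach (Match m in Param.Matches(objectBody))
        {
            string name = m.Groups["name"].Value;
            string value = WebUtility.HtmlDecode(m.Groups["value"].Value);
            if (name.Equals("Name", StringComparison.OrdinalIgnoreCase))
            {
                if (entry.Keyword == null)
                    entry.Keyword = value; // first Name: the keyword
                else
                    pendingTitle = value;  // later Names: topic titles
            }
            else if (name.Equals("Local", StringComparison.OrdinalIgnoreCase))
            {
                // Assumption: a Local without its own title uses the keyword as title.
                entry.Topics.Add(new KeyValuePair<string, string>(pendingTitle ?? entry.Keyword, value));
                pendingTitle = null;
            }
        }
        return entry;
    }
}
```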

5.12 Information types and categories

Information types and categories are only supported for CHMs using a text-based TOC or index (as created with HTML Help Workshop). You can define information types in HTML Help Workshop and assign them to table of contents nodes and index entries. This allows a viewer to filter the contents displayed to the user.
For example, you can define the information types "SDK Reference", "FAQ", "HOWTOs" etc., and the viewer can filter the help contents depending on the user's selection. You can also define categories and assign one or more information types to each category. For example, define the categories "Beginner", "Intermediate" and "Advanced", assign the previously created information types to them, and the viewer can filter the help contents depending on the user's skill level.
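A small sketch of such filtering, under my own assumptions: selected categories are expanded into their information types, and an item stays visible if it has no information types at all or shares at least one with the selection. The library's InfoTypeCategoryFilter is not described here and may apply different rules; all names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical filter rules for illustration; not the library's InfoTypeCategoryFilter.
public static class InfoTypeFilterSketch
{
    // Expands the selected categories into the set of information types assigned to them.
    public static HashSet<string> TypesForCategories(
        IDictionary<string, string[]> categories, IEnumerable<string> selectedCategories)
    {
        var types = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string category in selectedCategories)
        {
            string[] assigned;
            if (categories.TryGetValue(category, out assigned))
                types.UnionWith(assigned);
        }
        return types;
    }

    // Assumed rule: untyped items are always shown; typed items need one selected type.
    public static bool IsVisible(IEnumerable<string> itemTypes, ISet<string> selectedTypes)
    {
        bool hasAnyType = false;
        foreach (string type in itemTypes)
        {
            hasAnyType = true;
            if (selectedTypes.Contains(type))
                return true;
        }
        return !hasAnyType;
    }
}
```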

Information types and categories are stored in the .hhc and .hhk files.

For details on how information types and categories are stored in HHC/HHK files, see Table 5.56/5.59 in Pabs' unofficial HTML Help specification.

6. Using the class library

About 90% of the HtmlHelp class library classes are marked as internal, so getting started takes only a little effort.

6.1 HtmlHelp library/viewer UML


HtmlHelp namespace UML

This is the main namespace for you to work with. It contains the main class for all operations, HtmlHelp.HtmlHelpSystem.

Use this class to load files, merge files, access the table of contents, access the index and perform full-text searches.


HtmlHelpViewer UML

This UML diagram visualizes the HtmlHelp Viewer demo application and how it uses the library.

In the center of this diagram, you can see the Viewer class. This class represents the main Viewer form class.

It instantiates the three built-in user controls of the library (HtmlHelp.UIComponents.helpIndex, HtmlHelp.UIComponents.helpSearch, HtmlHelp.UIComponents.TocTree) and the main class HtmlHelp.HtmlHelpSystem.

6.2 The data dumping

I introduced this feature in version 0.3. Without it, the library takes a huge amount of time to open CHM files with very large text-based table of contents files, because parsing big .hhc files with regular expressions is slow.

Using the dumping feature saves about 90% or more of the loading time in such scenarios. The first time you load such a CHM, the internal CHMFile class creates a data dump according to the DumpingInfo instance you have passed to the OpenFile() or MergeFile() method.

If you load the same file a second time, the data dump is used to load some of the CHM data (depending on your preferences).

Using the DumpingInfo class you can specify:

  • The output/input directory of the dump file,
  • The compression level of the dump file (uses ic#code's zip library),
  • And what kind of data should be dumped using the flags:
    • DumpingFlags.DumpTextTOC ... text-based TOCs should be dumped
    • DumpingFlags.DumpBinaryTOC ... binary TOCs should be dumped
    • DumpingFlags.DumpTextIndex ... text-based index should be dumped
    • DumpingFlags.DumpBinaryIndex ... binary index should be dumped
    • DumpingFlags.DumpStrings ... the contents of the #STRINGS file should be dumped
    • DumpingFlags.DumpUrlStr ... the contents of the #URLSTR file should be dumped
    • DumpingFlags.DumpUrlTbl ... the contents of the #URLTBL file should be dumped
    • DumpingFlags.DumpTopics ... the contents of the #TOPICS file should be dumped
    • DumpingFlags.DumpFullText ... the full-text index should be dumped
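The load-or-dump idea can be sketched as follows. This is not the library's implementation: it swaps in GZipStream from the .NET Framework for ic#code's zip library, caches a plain list of strings instead of decoded CHM structures, and invents its own dump file naming (CHM file name plus last-write time) so that a rebuilt CHM does not reuse a stale dump. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Sketch of "parse once, reload from a compressed dump afterwards".
public static class DumpCacheSketch
{
    public static List<string> LoadOrParse(string chmPath, string dumpDir,
                                           Func<string, List<string>> parse)
    {
        // Hypothetical dump name: CHM file name plus its last-write time.
        long stamp = File.GetLastWriteTimeUtc(chmPath).Ticks;
        string dumpPath = Path.Combine(dumpDir,
            Path.GetFileName(chmPath) + "." + stamp + ".dump");

        if (File.Exists(dumpPath))
        {
            // Fast path: read the strings back from the compressed dump.
            var cached = new List<string>();
            using (var gz = new GZipStream(File.OpenRead(dumpPath), CompressionMode.Decompress))
            using (var reader = new BinaryReader(gz, Encoding.UTF8))
            {
                int count = reader.ReadInt32();
                cached.Capacity = count;
                for (int i = 0; i < count; i++)
                    cached.Add(reader.ReadString()); // length-prefixed string
            }
            return cached;
        }

        // Slow path: parse the CHM (e.g. regex parsing), then write the dump.
        List<string> parsed = parse(chmPath);
        using (var gz = new GZipStream(File.Create(dumpPath), CompressionLevel.Optimal))
        using (var writer = new BinaryWriter(gz, Encoding.UTF8))
        {
            writer.Write(parsed.Count);
            foreach (string s in parsed)
                writer.Write(s);
        }
        return parsed;
    }
}
```

BinaryWriter.Write(string) stores each string with a length prefix, so ReadString knows the size of every string before reading it.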

See the code snippets in the following sections of this chapter.

To see how efficient the dumping feature is, I ran a test with the following results:

Test done on: Intel Xeon 3.06 GHz HT, 1 GB RAM, Windows XP, SCSI hard disks
DumpingFlags set: DumpingFlags.DumpBinaryTOC | DumpingFlags.DumpTextTOC | DumpingFlags.DumpTextIndex | DumpingFlags.DumpBinaryIndex | DumpingFlags.DumpUrlStr | DumpingFlags.DumpStrings
Dump compression: DumpCompression.Medium

DirectX9 SDK CHM (binary index and binary TOC):
Read time without dump: --- HtmlHelp file read in 00:00:02.1874784
Write time of dump data*: --- Dump written in 00:00:01.0937360 (dump file size: ~780KB)

Read time with dump: --- HtmlHelp file read in 00:00:00.7499904 (dump file size: ~780KB)
Net read time of dump: --- Dump read in 00:00:00.5781176 (dump file size: ~780KB)

CHM with a ~900KB sitemap TOC (binary index and text-based TOC):
Read time without dump: --- HtmlHelp file read in 00:00:05.7811760 (slow RegEx parsing :( )
Write time of dump data*: --- Dump written in 00:00:00.5156184 (dump file size: ~300KB)

Read time with dump: --- HtmlHelp file read in 00:00:00.3437456 (dump file size: ~300KB)
Net read time of dump: --- Dump read in 00:00:00.2187472 (dump file size: ~300KB)

* The write time of the dump is not included in the "Read time without dump" timespan.

I think the two examples above show how much data dumping can speed up the loading process.
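Worked out from the read times above (dump write time excluded), the relative savings of the second load are:

```latex
\begin{aligned}
\text{DirectX9 SDK CHM:}\quad & 1 - \frac{0.750\ \text{s}}{2.187\ \text{s}} \approx 0.657 \quad (\approx 66\%\ \text{saved},\ \approx 2.9\times\ \text{faster})\\
\text{CHM with}\ {\sim}900\ \text{KB text TOC:}\quad & 1 - \frac{0.344\ \text{s}}{5.781\ \text{s}} \approx 0.941 \quad (\approx 94\%\ \text{saved},\ \approx 16.8\times\ \text{faster})
\end{aligned}
```

So the "90%+" figure applies to the text-based TOC case, while the CHM with a binary TOC gains about two thirds.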

Another advantage of using dump files is that you save a few MB of memory, because only the necessary fields are stored in and loaded from the dump file. Also, the size of each string is known when reading the dump, so the .NET Framework can create the string instances with the right initial size.

6.3 Initializing the library

The main class for using this library is HtmlHelp.HtmlHelpSystem. Use this class to open or merge CHM files, get the table of contents or index, etc.

First of all, you need to create an instance of the HtmlHelp.HtmlHelpSystem class. The sample viewer application instantiates the class in the constructor of the main form and saves the instance in a class variable (see Viewer.cs, line 110 (constructor)).

private string LM_Key = @"Software\Klaus Weisser\HtmlHelpViewer\";

// The main HtmlHelpSystem class used for handling CHMs
HtmlHelpSystem _reader = null;
// a dumping info class managed by the viewer to specify the data-dumping
// of the class library
DumpingInfo _dmpInfo = null;
// a default information-type/category filter
InfoTypeCategoryFilter _filter = new InfoTypeCategoryFilter();
// stores the preferred dumping output directory
string _prefDumpOutput = "";
// preferred dump data compression
DumpCompression _prefDumpCompression = DumpCompression.Medium;
// preferred dump contents
DumpingFlags _prefDumpFlags = DumpingFlags.DumpBinaryTOC |
    DumpingFlags.DumpTextTOC | DumpingFlags.DumpTextIndex |
    DumpingFlags.DumpBinaryIndex | DumpingFlags.DumpUrlStr | DumpingFlags.DumpStrings;
// preferred CHM-URL prefix
string _prefURLPrefix = "mk:@MSITStore:";
// preferred image list
bool _prefUseHH2TreePics = false;

/// <summary>
/// Constructor of the class
/// </summary>
public Viewer()
{
    // create a new instance of the class library's main class
    _reader = new HtmlHelpSystem();
    // if you don't set this option, the default prefix will be
    // "mk-its:" which should work too
    HtmlHelpSystem.UrlPrefix = "mk:@MSITStore:";

    // use the temporary folder for data dumping
    // (GetEnvironmentVariable returns null if the variable is not set)
    string sTemp = System.Environment.GetEnvironmentVariable("TEMP");
    if (string.IsNullOrEmpty(sTemp))
        sTemp = System.Environment.GetEnvironmentVariable("TMP");
    _prefDumpOutput = sTemp;

    // create a default dump info instance used for dumping data
    _dmpInfo = new DumpingInfo(DumpingFlags.DumpBinaryTOC |
        DumpingFlags.DumpTextTOC | DumpingFlags.DumpTextIndex |
        DumpingFlags.DumpBinaryIndex | DumpingFlags.DumpUrlStr | DumpingFlags.DumpStrings,
        sTemp, DumpCompression.Medium);

    LoadRegistryPreferences(); // loads the preferences from the system's registry
    // overwrite the current CHM-URL prefix with the one from the registry
    HtmlHelpSystem.UrlPrefix = _prefURLPrefix;
    HtmlHelpSystem.UseHH2TreePics = _prefUseHH2TreePics;

    InitializeComponent();
}

/// <summary>
/// Loads viewer preferences from registry. This is viewer specific!
/// </summary>
// requires: using Microsoft.Win32; (Registry, RegistryKey)
private void LoadRegistryPreferences()
{
    RegistryKey regKey = Registry.LocalMachine.CreateSubKey(LM_Key);

    bool bEnable = bool.Parse(regKey.GetValue("EnableDumping", true).ToString());
    _prefDumpOutput = (string) regKey.GetValue("DumpOutputDir", _prefDumpOutput);
    _prefDumpCompression = (DumpCompression)
        ((int) regKey.GetValue("CompressionLevel", _prefDumpCompression));
    _prefDumpFlags = (DumpingFlags)
        ((int) regKey.GetValue("DumpingFlags", _prefDumpFlags));

    if (bEnable)
        _dmpInfo = new DumpingInfo(_prefDumpFlags, _prefDumpOutput, _prefDumpCompression);
    else
        _dmpInfo = null;

    _prefURLPrefix = (string) regKey.GetValue("ITSUrlPrefix", _prefURLPrefix);
    _prefUseHH2TreePics = bool.Parse(regKey.GetValue("UseHH2TreePics",
        _prefUseHH2TreePics).ToString());
}

6.4 Open a CHM file

To open a file, use the following lines of code (see Viewer.cs line 1088):

// clear current items
tocTree1.ClearContents();
helpIndex1.ClearContents();
helpSearch2.ClearContents();

// open the chm-file selected in the OpenFileDialog
// if _dmpInfo == null, the dumping feature will be disabled
_reader.OpenFile( openFileDialog1.FileName, _dmpInfo );

// Enable the toc-tree pane if the opened file has a table of contents
tocTree1.Enabled = _reader.HasTableOfContents;
// Enable the index pane if the opened file has an index
helpIndex1.Enabled = _reader.HasIndex;
// Enable the full-text search pane if the
// opened file supports full-text searching
helpSearch2.Enabled = _reader.FullTextSearch;

// ...

// Build the table of contents tree view in the class library control
// _filter is the information-type/category filter instantiated in the
// constructor
tocTree1.BuildTOC( _reader.TableOfContents, _filter );

// Build the index entries in the class library control
if( _reader.HasKLinks )
    helpIndex1.BuildIndex( _reader.Index, IndexType.KeywordLink, _filter );
else if( _reader.HasALinks )
    helpIndex1.BuildIndex( _reader.Index, IndexType.AssiciativeLinks, _filter );

// Navigate the embedded browser to the default help topic
NavigateBrowser( _reader.DefaultTopic );

// set the window text
this.Text = _reader.FileList[0].FileInfo.HelpWindowTitle + " - HtmlHelp - Viewer";

// Enable the "customize" menu item if the CHM has information types and/or
// categories
miCustomize.Enabled = ( _reader.HasInformationTypes || _reader.HasCategories );

// Force garbage collection to free memory
GC.Collect();

The variables tocTree1, helpIndex1 and helpSearch2 are instances of the user controls provided in the class library. With the single line _reader.OpenFile( openFileDialog1.FileName, _dmpInfo ), all the internal file decoding is done. If you run your application in Debug mode, watch the Output window of VS.NET to see what the library is currently doing.

After these few lines of code and some additional UI updates, the HtmlHelp.HtmlHelpSystem class has loaded the table of contents, index, full-text search and all the other internal system files.

6.5 Merging additional CHM files

Once you have opened a CHM file, the HtmlHelp.HtmlHelpSystem class offers the feature to merge additional files. This will result in one single table of contents tree and one index tree (see Viewer.cs line 1168).

// clear current items
tocTree1.ClearContents();
helpIndex1.ClearContents();
helpSearch2.ClearContents();

// merge the chm file selected in the OpenFileDialog into the existing one
// in the HtmlHelpSystem class
// if _dmpInfo == null, the dumping feature will be disabled
_reader.MergeFile( openFileDialog1.FileName, _dmpInfo );

// Enable the toc-tree pane if the opened file has a table of contents
tocTree1.Enabled = _reader.HasTableOfContents;
// Enable the index pane if the opened file has an index
helpIndex1.Enabled = _reader.HasIndex;
// Enable the full-text search pane if the
// opened file supports full-text searching
helpSearch2.Enabled = _reader.FullTextSearch;

// ...

// Rebuild the table of contents tree view in the class library control
// using the new merged table of contents
// _filter is the information-type/category filter instantiated in the
// constructor
tocTree1.BuildTOC( _reader.TableOfContents, _filter );

// Rebuild the index entries in the class library control
// using the new merged index
if( _reader.HasKLinks )
    helpIndex1.BuildIndex( _reader.Index, IndexType.KeywordLink, _filter );
else if( _reader.HasALinks )
    helpIndex1.BuildIndex( _reader.Index, IndexType.AssiciativeLinks, _filter );

// Navigate the embedded browser to the default help topic
NavigateBrowser( _reader.DefaultTopic );

// Enable the "customize" menu item if the CHM has information types and/or
// categories
miCustomize.Enabled = ( _reader.HasInformationTypes || _reader.HasCategories );

// Force garbage collection to free memory
GC.Collect();

Notes: Don't forget to rebuild the TOC and index after a merge, so that the new entries appear in the user controls. If you use the _reader.OpenFile() method, the internal data of HtmlHelp.HtmlHelpSystem is deleted and recreated from the new file, so calling _reader.OpenFile() always resets the help system.

6.6 Accessing the table of contents programmatically

If you use the provided user controls of the library, you don't have to care about the table of contents structure since the tree view will be filled by the control itself. In some cases, e.g. implementing your own HelpProvider for the class library, you may want to access the table of contents tree programmatically.

The table of contents tree can be accessed using the HtmlHelp.HtmlHelpSystem's property TableOfContents. The property returns an instance of the class HtmlHelp.TableOfContents. This class offers some search methods and a property to access the tree itself. Use the property TOC of the HtmlHelp.TableOfContents class to get an ArrayList of HtmlHelp.TOCItem instances. Each of these items stores the topic name, the location, the URL, and an ArrayList of HtmlHelp.TOCItem instances which represents the children of the item.

The following sample code demonstrates how a TreeNodeCollection of a TreeView control can be filled up:

// non-existent method in the class library,
// only for demonstration
private void Main()
{
    // get the current table of contents
    TableOfContents currentToc = _reader.TableOfContents;

    // clear the tree nodes of an existing tree view
    tocTreeView.Nodes.Clear();
    // recursively build the tree nodes
    BuildTOC(currentToc.TOC, tocTreeView.Nodes);
    // update the control
    tocTreeView.Update();
}

/// <summary>
/// Recursively builds the toc tree and fills the treeview
/// </summary>
/// <param name="tocItems">list of toc-items</param>
/// <param name="col">treenode collection of the current level</param>
private void BuildTOC(ArrayList tocItems, TreeNodeCollection col)
{
    foreach( TOCItem curItem in tocItems )
    {
        TreeNode newNode = new TreeNode( curItem.Name,
                  curItem.ImageIndex, curItem.ImageIndex );
        newNode.Tag = curItem;

        if(curItem.Children.Count > 0)
        {
            BuildTOC(curItem.Children, newNode.Nodes);
        }
        col.Add(newNode);
    }
}

6.7 Accessing the index programmatically

The index can be accessed using the HtmlHelp.HtmlHelpSystem's property Index. The property returns an instance of the class HtmlHelp.Index. This index class stores two ArrayLists with entries of type HtmlHelp.IndexItem: one represents the KLinks and the other the ALinks. The index also builds a tree, but differently from the table of contents, because the keyword text always contains the parent keyword (", "-separated list). Each HtmlHelp.IndexItem contains an ArrayList with the associated help topics, which can be accessed using the property Topics. Each item is of type HtmlHelp.IndexTopic. If an HtmlHelp.IndexItem contains more than one HtmlHelp.IndexTopic entry, the user has to choose the topic to display (see 2nd screenshot).

The following sample code demonstrates how to fill a ListBox control with the KeywordLinks index:

// non-existent method in the class library,
// only for demonstration
private void Main()
{
    // get the current index
    Index currentIndex = _reader.Index;

    // fill the listbox with the index items
    BuildIndex( currentIndex, IndexType.KeywordLink );
}

/// <summary>
/// Call this method to build the help-index and fill the internal list box
/// </summary>
/// <param name="index">Index instance extracted from the chm file(s)</param>
/// <param name="typeOfIndex">type of index to display</param>
public void BuildIndex(Index index, IndexType typeOfIndex)
{
    ArrayList _arrIndex = null;

    // get the ArrayList of the requested index type
    switch(typeOfIndex)
    {
        case IndexType.AssiciativeLinks: _arrIndex = index.ALinks; break;
        case IndexType.KeywordLink: _arrIndex = index.KLinks; break;
    }

    // clear the current items in the list box
    lbIndex.Items.Clear();

    // nothing to display for an unknown index type
    if(_arrIndex == null)
        return;

    // sort the index
    _arrIndex.Sort();

    foreach(IndexItem curItem in _arrIndex)
    {
        // add the index entry to the listbox, indented by its level
        lbIndex.Items.Add( GetIndent(curItem.Indent) + curItem.KeyWord );
    }
}

/// <summary>
/// Demonstration helper (not part of the library): returns leading
/// spaces for the given indent level; assumes Indent is an int
/// </summary>
private string GetIndent(int indent)
{
    return new string(' ', indent * 4);
}

6.8 Accessing file contents programmatically

With version 0.4 of the library, I've introduced two new properties called ContentFile and FileContents in the following classes:
HtmlHelp.TOCItem, HtmlHelp.IndexTopic, HtmlHelp.CHMDecoding.TopicEntry

The property FileContents directly returns the file content as string. It automatically applies the correct file encoding.

// curEntry is any instance of TopicEntry, TOCItem or IndexTopic !!
private string GetTopicText(TopicEntry curEntry)
{
    // returns the contents of the file as string
    // (an empty string if the file can't be read)
    return curEntry.FileContents;
}

Using the property FileContents doesn't allow you to implement error tracking. If the content file can't be opened or read, you simply get an empty string without any exception, which you cannot distinguish from reading an empty file!

The property ContentFile returns an instance of HtmlHelp.Storage.FileObject or null if not accessible.
You can use this property to access the native file contents of the associated content files.

Note: If you read native contents from a text file like htm, hhc, hhk etc., you have to make sure to use the appropriate text encoding!
Always make sure you close the returned FileObject instance once you no longer need it.

The following code snippet shows you how to read the text content of a topic file.

// curEntry is any instance of TopicEntry, TOCItem or IndexTopic !!
private void ShowTopicText(TopicEntry curEntry)
{
    // Get the FileObject instance
    FileObject fo = curEntry.ContentFile;

    // Check if you've got an instance
    if(fo == null)
    {
        // if not, the content of this file can not be accessed
        MessageBox.Show("Couldn't get file object for " + curEntry.Locale);
        return;
    }

    try
    {
        // Check if you can read from the file
        if(!fo.CanRead)
        {
            MessageBox.Show("File " + curEntry.Locale + " not readable!");
            return;
        }

        // read the file contents
        byte[] fileData = new byte[fo.Length];
        fo.Read(fileData, 0, (int)fo.Length);

        // Get the content as string using the correct text-encoding
        string sContent = curEntry.TextEncoding.GetString(fileData);
        // ... use sContent here
    }
    finally
    {
        // CLOSE !! the file object in every case (important!)
        fo.Close();
    }
}

This snippet doesn't check the file type. It assumes that the content file is a text file!
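Stream.Read is allowed to return fewer bytes than requested, so a more defensive way to fill the buffer is a read loop. The sketch below is written against System.IO.Stream; whether FileObject derives from Stream is an assumption on my part, but the same loop works with any Read(byte[], int, int) method that returns the number of bytes read. The helper name is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper: keeps calling Read until the buffer is full or the stream ends.
public static class StreamSketch
{
    public static byte[] ReadFully(Stream stream, int length)
    {
        var buffer = new byte[length];
        int total = 0;
        while (total < length)
        {
            int read = stream.Read(buffer, total, length - total);
            if (read <= 0) break; // end of stream before 'length' bytes arrived
            total += read;
        }
        // Trim the buffer if the stream was shorter than expected.
        if (total < length) Array.Resize(ref buffer, total);
        return buffer;
    }
}
```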

6.9 Use the class library's table of contents user control

As mentioned earlier, the class library has built-in user controls for displaying the three main help panes: Table of contents, Index and Search. These controls let you integrate the HtmlHelp system in minutes with a few drag-and-drop operations and a few lines of code.

The user control for the table of contents is implemented in the class HtmlHelp.UIComponents.TocTree. You can use the VS.NET Forms designer to place the control on your application's form and adjust its properties.

The most important method is BuildTOC( HtmlHelp.TableOfContents tocInstance ) (see 6.4). If you call this method, the internal TreeView control will be filled with the table of contents items (see 6.6).

For interacting with your UI, the control implements an event called TocSelected. This will be raised whenever the user selects a new topic in the table of contents, notifying the main UI that a new topic should be displayed.

//
// tocTree1
//
this.tocTree1.Dock = System.Windows.Forms.DockStyle.Fill;
this.tocTree1.DockPadding.All = 2;
this.tocTree1.Location = new System.Drawing.Point(0, 0);
this.tocTree1.Name = "tocTree1";
this.tocTree1.Size = new System.Drawing.Size(292, 484);
this.tocTree1.TabIndex = 0;

// subscribe to the event
this.tocTree1.TocSelected +=
  new TocSelectedEventHandler(this.tocTree1_TocSelected);

// ...

/// <summary>
/// Called if the user selects a new table of contents item
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void tocTree1_TocSelected(object sender, TocEventArgs e)
{
    // if the selected item contains an url
    if( e.Item.Local.Length > 0)
    {
        // navigate to the url
        NavigateBrowser(e.Item.Url);
    }
}
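The wiring between the control and your form is the standard .NET event pattern. The console-only sketch below mimics the publisher side (the control) and the subscriber side (your form) so that it runs without WinForms. All type names here (`TocItemInfo`, `TocItemEventArgs`, `TocSelectorSketch`, `HostFormSketch`) are hypothetical stand-ins, not the library's `TocTree` implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a table of contents item.
public class TocItemInfo
{
    public string Local = "";   // locale of the content file ("" = no topic)
    public string Url = "";     // URL for the browser control
}

// Hypothetical event arguments carrying the selected item.
public class TocItemEventArgs : EventArgs
{
    public TocItemEventArgs(TocItemInfo item) { Item = item; }
    public TocItemInfo Item { get; private set; }
}

// Publisher: raises an event when an item is "selected",
// the way a tree control notifies the hosting form.
public class TocSelectorSketch
{
    public event EventHandler<TocItemEventArgs> TocSelected;

    public void Select(TocItemInfo item)
    {
        EventHandler<TocItemEventArgs> handler = TocSelected;
        if (handler != null)
            handler(this, new TocItemEventArgs(item));
    }
}

// Subscriber: navigates only when the item points at a topic.
public class HostFormSketch
{
    public readonly List<string> Navigated = new List<string>();

    public HostFormSketch(TocSelectorSketch toc)
    {
        toc.TocSelected += OnTocSelected;
    }

    private void OnTocSelected(object sender, TocItemEventArgs e)
    {
        if (e.Item.Local.Length > 0)
            Navigated.Add(e.Item.Url);   // stands in for NavigateBrowser(url)
    }
}
```

Selecting a pure "folder" item (empty `Local`) raises the event but causes no navigation, which matches the check in the handler above.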

6.10 Use the class library's index user control

The user control for displaying the index pane is located in the class HtmlHelp.UIComponents.helpIndex. You can use the VS.NET Forms designer to place the control and adjust its properties.

The most important method is BuildIndex( HtmlHelp.Index indexInstance, IndexType typeOfIndex ) (see 6.4). If you call this method, the internal ListBox will be filled with the index tree (see 6.7).

For interacting with your UI, this control implements two events. The first one is named IndexSelected. This event is raised if the user selects a topic related to an index entry, notifying the main UI that a new topic should be displayed.

The second event is named TopicsFound. This event is raised if the user selects an index entry with multiple related topics. If you do not handle this event, the class library will pop up a built-in dialog (see second screenshot). Handle the event if you want to create a different UI than the default one.

//
// helpIndex1
//
this.helpIndex1.Dock = System.Windows.Forms.DockStyle.Fill;
this.helpIndex1.Location = new System.Drawing.Point(0, 0);
this.helpIndex1.Name = "helpIndex1";
this.helpIndex1.Size = new System.Drawing.Size(292, 470);
this.helpIndex1.TabIndex = 0;
this.helpIndex1.IndexSelected +=
  new IndexSelectedEventHandler(this.helpIndex1_IndexSelected);
this.helpIndex1.TopicsFound +=
  new TopicsFoundEventHandler(this.helpIndex1_TopicsFound);

// ...

/// <summary>
/// Called if the user selects an index topic
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void helpIndex1_IndexSelected(object sender, IndexEventArgs e)
{
    if(e.URL.Length > 0)
        NavigateBrowser(e.URL);
}

/// <summary>
/// Called if the user selects an index entry with more than one related topic.
/// If you do not handle this event,
/// the HtmlHelp library will show a standard dialog.
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void helpIndex1_TopicsFound(object sender, TopicsFoundEventArgs e)
{
    // display a UI to the user and let him select one of the found topics
    // you can get the list of topics found using
    //
    // e.Topics which is an ArrayList of HtmlHelp.IndexTopic instances
}
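If you build your own chooser instead of the default dialog, you mainly need two steps: turn the found topics into display lines and map the user's selection back to a URL. The sketch below uses a hypothetical `FoundTopic` class; in a real handler you would fill it from the `IndexTopic` instances in `e.Topics`, and the property names used here (title, location, URL) are assumptions rather than the library's exact API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the data a custom "Topics found" dialog needs.
public class FoundTopic
{
    public string Title;
    public string Location;   // CHM the topic comes from (merged files)
    public string Url;
}

public static class TopicChooser
{
    // Builds the lines shown in the chooser, e.g. "Setup (guide.chm)".
    public static List<string> BuildChoices(IList<FoundTopic> topics)
    {
        var lines = new List<string>(topics.Count);
        foreach (FoundTopic t in topics)
        {
            lines.Add(string.IsNullOrEmpty(t.Location)
                ? t.Title
                : t.Title + " (" + t.Location + ")");
        }
        return lines;
    }

    // Maps the user's selection back to a URL; null if nothing usable was picked.
    public static string UrlForSelection(IList<FoundTopic> topics, int selectedIndex)
    {
        if (selectedIndex < 0 || selectedIndex >= topics.Count)
            return null;
        string url = topics[selectedIndex].Url;
        return string.IsNullOrEmpty(url) ? null : url;
    }
}
```

Showing the location next to the title helps in merged environments, where the same keyword can point to topics in several CHMs.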

6.11 Use the class library's full-text search user control

The user control for displaying the full-text search pane can be found in the class HtmlHelp.UIComponents.helpSearch. As with the two other controls, you can use VS.NET Forms designer to place the control and adjust its properties.

Unlike the other two controls, this one does not need its content initialized, since the user has to enter a search string before any search can be done.

To support this, the control offers two events. First, the FTSearch event is raised to notify the main UI that the user wants to perform a full-text search; the search parameters are available through the event arguments. Your handler then initiates the full-text search by calling the PerformSearch() method of the HtmlHelp.HtmlHelpSystem instance (variable _reader in the examples above). This method returns a DataTable whose DataRows hold the found entries.

The DataTable results contain the following fields:

  • Rating - a calculated rating of the topic.
  • Title - the title of the topic
  • Locale - the locale string of the topic (virtual link of the content file in the CHM store)
  • Location - the location of the topic (useful if searching in a merged environment)
  • URL - the URL which can be used by the web browser control
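To make the shape of the result concrete, here is a small sketch that builds a table with the same column names and picks the best-rated hit with a sorted `DataView`. The column types (`double` for Rating, `string` for the rest) are assumptions, and `SearchResultsSketch` is a hypothetical helper, not a library class.

```csharp
using System;
using System.Data;

// Sketch: a table shaped like the search results listed above and a
// helper that picks the best-rated hit. Column types are assumptions.
public static class SearchResultsSketch
{
    public static DataTable CreateResultsTable()
    {
        var table = new DataTable("SearchResults");
        table.Columns.Add("Rating", typeof(double));
        table.Columns.Add("Title", typeof(string));
        table.Columns.Add("Locale", typeof(string));
        table.Columns.Add("Location", typeof(string));
        table.Columns.Add("URL", typeof(string));
        return table;
    }

    // Returns the URL of the highest-rated row, or null for an empty result.
    public static string BestHitUrl(DataTable results)
    {
        var view = new DataView(results) { Sort = "Rating DESC" };
        return view.Count > 0 ? (string)view[0]["URL"] : null;
    }
}
```

Sorting through a `DataView` leaves the original table untouched, so the same table can still be passed on for display.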

Once you have received the results, you have to "send" them back to the user control using the method SetResults().

If the user selects a topic from the search results, the event HitSelected will be raised, notifying the main UI that a new topic should be displayed.

//
// helpSearch2
//
this.helpSearch2.Dock = System.Windows.Forms.DockStyle.Fill;
this.helpSearch2.Location = new System.Drawing.Point(0, 0);
this.helpSearch2.Name = "helpSearch2";
this.helpSearch2.Size = new System.Drawing.Size(292, 470);
this.helpSearch2.TabIndex = 0;
this.helpSearch2.HitSelected +=
  new HitSelectedEventHandler(this.helpSearch2_HitSelected);
this.helpSearch2.FTSearch +=
  new FTSearchEventHandler(this.helpSearch2_FTSearch);

// ...

/// <summary>
/// Called if the user hits the "Search" button on the full-text search pane
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void helpSearch2_FTSearch(object sender, SearchEventArgs e)
{
    // display a wait cursor
    this.Cursor = Cursors.WaitCursor;
    try
    {
        // initiate the full-text search ( 500 = maximum hits )
        DataTable dtResults = _reader.PerformSearch( e.Words,
                     500, e.PartialWords, e.TitlesOnly);

        // "send" the results back to the full-text search pane
        // and display them in the listview
        helpSearch2.SetResults(dtResults);
    }
    finally
    {
        // display the arrow cursor
        this.Cursor = Cursors.Arrow;
    }
}

/// <summary>
/// Called if the user selects an entry from the search results.
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void helpSearch2_HitSelected(object sender, HitEventArgs e)
{
    // if the selected topic has an URL
    if( e.URL.Length > 0)
    {
        // Navigate the browser to this URL
        NavigateBrowser(e.URL);
    }
}

6.12 The HelpProviderEx component

New in version 0.3 is an extended HelpProvider component.
If you create an application with an integrated help system (viewer, table of contents, index, etc.), you can use the HelpProviderEx class to provide standard help functionality for your dialogs or views.

If you are not working with the HtmlHelpSystem class and have not initialized HelpProviderEx with a viewer, the component should behave like the standard .NET HelpProvider component ("should" because I haven't fully tested it yet).

Since you will use HelpProviderEx to get rid of the Microsoft CHM-Viewer UI, you have to initialize the HelpProviderEx class with a viewer application.
This viewer has to implement the interface HtmlHelp.UIComponents.IHelpViewer:

/// <summary>
/// The interface <c>IHelpViewer</c> defines methods/properties for a
/// help-viewing window.
/// </summary>
public interface IHelpViewer
{
    /// <summary>
    /// Navigates the helpviewer to a specific help url
    /// </summary>
    /// <param name="url">url</param>
    void NavigateTo(string url);

    /// <summary>
    /// Shows help for a specific url
    /// </summary>
    /// <param name="namespaceFilter">namespace filter (used for merged files)</param>
    /// <param name="hlpNavigator">navigator value</param>
    /// <param name="keyword">keyword</param>
    void ShowHelp(string namespaceFilter, HelpNavigator hlpNavigator, string keyword);

    /// <summary>
    /// Shows help for a specific keyword
    /// </summary>
    /// <param name="namespaceFilter">namespace filter (used for merged files)</param>
    /// <param name="hlpNavigator">navigator value</param>
    /// <param name="keyword">keyword</param>
    /// <param name="url">url</param>
    void ShowHelp(string namespaceFilter, HelpNavigator hlpNavigator, string keyword, string url);

    /// <summary>
    /// Shows the help index
    /// </summary>
    /// <param name="url">url</param>
    void ShowHelpIndex(string url);

    /// <summary>
    /// Shows a help popup window
    /// </summary>
    /// <param name="parent">the parent control for the popup window</param>
    /// <param name="text">help text</param>
    /// <param name="location">display location</param>
    void ShowPopup(Control parent, string text, Point location);
}

The provided sample viewer application implements this interface, so have a look there to see how it can be done.

If you want to use the HelpProviderEx component like the standard HelpProvider, you have to

  • Set the HelpNamespace property of the HelpProviderEx to a valid value
  • Set the Viewer property to null

If you want to use the HelpProviderEx component in an integrated help environment, you have to

  • Set the Viewer property to an object instance implementing the IHelpViewer interface
    (the HelpNamespace property will then be ignored)
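The two modes boil down to a simple routing rule: if a viewer is set, every request goes to it and HelpNamespace is ignored; otherwise HelpNamespace must point to a CHM for the standard viewer. The sketch below models only that rule. `IMiniHelpViewer`, `HelpRouterSketch` and `RecordingViewer` are hypothetical, trimmed-down stand-ins (the real IHelpViewer has more members), and the string returned by `Route` exists only to make the decision visible.

```csharp
using System;

// Hypothetical, trimmed-down stand-in for IHelpViewer
// (the real interface also has ShowHelp, ShowHelpIndex and ShowPopup).
public interface IMiniHelpViewer
{
    void NavigateTo(string url);
}

// Models the rule described above; not the HelpProviderEx implementation.
public class HelpRouterSketch
{
    public IMiniHelpViewer Viewer;    // null = standard HelpProvider behaviour
    public string HelpNamespace;      // CHM used when no viewer is set

    // Returns where the request went (for illustration only).
    public string Route(string url)
    {
        if (Viewer != null)
        {
            Viewer.NavigateTo(url);   // HelpNamespace is ignored here
            return "viewer";
        }
        if (string.IsNullOrEmpty(HelpNamespace))
            throw new InvalidOperationException("Set HelpNamespace or Viewer first.");
        return "standard:" + HelpNamespace;
    }
}

// Test double that records the last URL it was asked to show.
public class RecordingViewer : IMiniHelpViewer
{
    public string LastUrl;
    public void NavigateTo(string url) { LastUrl = url; }
}
```

Throwing when neither property is set makes a misconfigured provider fail loudly instead of silently showing nothing.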

Since HelpProviderEx implements the IExtenderProvider interface it extends your controls with additional properties:

  • HelpNamespace (only HelpProviderEx)
  • HelpKeyword (standard HelpProvider)
  • HelpNavigator (standard HelpProvider)
  • HelpString (standard HelpProvider)
  • ShowHelp (standard HelpProvider)

You can use the HelpNamespace property in merged CHM environments to reduce the amount of data to search (specify the name of the CHM where the searched keyword/topic/etc. is located).
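The benefit of such a filter is that the other CHMs are skipped entirely. The sketch below illustrates the idea with a hypothetical keyword table grouped per CHM file; it is not how the library stores its index internally.

```csharp
using System;
using System.Collections.Generic;

// Sketch: restricting a keyword lookup to one CHM in a merged
// environment, the idea behind HelpNamespace. Data layout is hypothetical.
public static class NamespaceFilterSketch
{
    // keywordsByChm maps a CHM file name to the keywords it contains.
    // Returns the names of the CHMs in which the keyword was found.
    public static List<string> FindKeyword(
        IDictionary<string, List<string>> keywordsByChm,
        string keyword, string namespaceFilter)
    {
        var hits = new List<string>();
        foreach (KeyValuePair<string, List<string>> pair in keywordsByChm)
        {
            // With a filter, skip every other CHM without scanning its keywords.
            if (!string.IsNullOrEmpty(namespaceFilter) &&
                !string.Equals(pair.Key, namespaceFilter, StringComparison.OrdinalIgnoreCase))
                continue;

            foreach (string k in pair.Value)
            {
                if (string.Equals(k, keyword, StringComparison.OrdinalIgnoreCase))
                {
                    hits.Add(pair.Key);
                    break;
                }
            }
        }
        return hits;
    }
}
```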

There are a few tutorials online on how to use the HelpProvider component so I won't go deeper here.

7. Useful links

8. Conclusion

The class library holds all extracted data in memory. This may speed up some tasks like searching, but costs memory (e.g. ~10MB for the DirectX 9.0 SDK help file). Since some of the internal file formats are not completely understood (garbage spaces, unknown values, ...), a binary merge of CHM files would be difficult. That's why I load the TOC and index into memory and merge them using internal classes.

I haven't had enough time to fully test the newly introduced HelpProviderEx component, so bug reports are welcome.

Special thanks to Nick Butler, who gave a lot of bug and feature input for version 0.4 and helped me improve the library ;)

9. History

Version 0.4 2004 Aug. 06
  • Fixed an issue with reading hhc/hhk files containing brackets ().
  • Fixed an issue with identifying the master hhc in CHMs containing multiple hhc files. In some circumstances this requires a storage enumeration, which is slow on CHMs with a lot of content files.
  • Fixed an issue with reading see-also items from text-based hhk-index files
  • Fixed an issue with "See Also" index links. The shipped index user control now handles these special index items (IndexEventArgs now holds additional see also information)
  • Fixed an issue with decoding the urlstr file.
  • Fixed an issue with parsing text-based index files. Items now have the correct indent and deeper levels are supported !
  • Changed accessibility of some internal CHM-decoding classes. This allows users of the lib to access more native CHM information/data structures.
  • Added the properties ContentFile and FileContents to the classes TOCItem, IndexTopic and TopicEntry. The ContentFile property opens the associated content file and returns a FileObject instance if it succeeds. You can use this instance to programmatically access the native file contents! The FileContents property directly reads the contents of the associated file and returns them as a string. This property automatically applies the correct CHM encoding to support multiple languages!
Version 0.3 2004 May 17
  • Fixed the number of hits displayed in the search pane. When working with merged files, the maximum hits were applied per file, not for the whole system.
  • Fixed an issue during data decoding of binary table of contents. The system reads binary TOCs now much faster and doesn't have problems with building the tree in deeper TOC levels.
  • Optimized index merging. Index merging no longer has O(n²) complexity; the insertion point of each item is now found with a binary search (O(log n) per lookup) before inserting it.
  • Optimized memory usage, especially the memory usage of table of contents and index items. The title and locale strings are no longer stored as strings in every item (only in CHMs with binary toc and/or index). The item just holds offsets into the loaded system file data. After opening/merging a file into the system and updating the UI you should call GC.Collect() to force a garbage collection (this will free a few MB depending on the amount of CHM data).
  • Added the image list of the standard CHM viewer
  • Added the method ClearContents() to the three internal user controls (toc-tree, index and search) which allows the user to reset the control's contents if opening new files.
  • Added support for compressed dumping of data (speeds up CHM reloading).
  • Added support for CHM-Merged file list (see #IDXHDR file). The CHM with the master TOC MUST be opened first ! The MS standard CHM-viewer can also create the TOC correctly if you open one of the slave CHMs. This is not supported by the library (TOC tree will contain all topics, but not in a correct tree) !
  • Added support for merged TOC-Items in hhc files ("Merge" parameter)
  • Added support for information types and categories.
  • Added an extended HelpProvider component for interacting with the HtmlHelpSystem
  • Added native HelpToolTipWindow for adding Popup-Help support for applications using HelpProviderEx class.
Version 0.2 2004 April 25
  • Fixed an issue which prevented the class library from loading the correct table of contents file (text-based hhc file). This happened in CHMs where the Contents file option of the HHP file (HTML Help project) is not set and the HHC file name is not the default one (<chmfile>.hhc or table of contents.hhc). Same for text-based index files. (found in MSDN Magazine CHMs)
  • Added international support. The library detects the LCID (language code id) and the used codepage and adjusts the encoding used for converting binary arrays to strings.
  • Fixed the fulltext-search algorithm. Still not 100% working for international languages (didn't find the error till now)
  • Added a class library CHM to the source zip.
  • Added a ChmFileInfo class for easily getting system information of CHM files (see loaded files in the about dialog of the example viewer).
  • Fixed a URL issue which prevented IE (with the newest IE patch) from displaying linked content files correctly.
Version 0.1 2004 April 20 Article release

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
