Decompiling CHM (help) files with C#

11 Nov 2003
An introduction to the IStorage interface and the MS Help file format, including a sample C# decompilation DLL for CHM files.

Introduction

This article demonstrates the use of the IStorage interface in managed C# code, based on a simple CHM (MSHelp 1.0) decompiler.

Decompiling CHM (help) files with C#

Recently I came across a very interesting problem related to the IStorage interface: I had to manipulate an IStorage container from managed code. Since my knowledge of COM is limited, I thought I would simply find a snippet of code using Google, hoping the example would let me read and write files in compound storage structures. I found a couple of incomplete snippets posted in newsgroups, but that was not enough for me to be lazy and enjoy my favorite programming technique of CTRL+C, CTRL+V. I had to do some actual work on my own. I needed a wrapper that would allow me to easily access the internal structure of any compound storage object. By the way, don't forget to check for the latest version of this wrapper.

According to Microsoft, the IStorage interface supports the creation and management of structured storage objects. Structured storage allows hierarchical storage of information within a single file, and is often referred to as "a file system within a file". Yes, it does sound interesting, but a bit complicated. How about a real-life example?

Well, the simplest and most powerful example of a compound storage object would be good old CHM files. The compound file implementation of IStorage allows you to create and manage sub-storages and streams within a storage object residing in a compound file object. You can pack your entire collection of help documents, HTML files, images etc. into a single IStorage object to save space and to provide your users with a standard file that can be viewed with your trusted help viewer. Typically you would use tools provided by Microsoft, such as HTML Help Workshop, to manipulate a collection of help-related information. You can also take a look at the Microsoft HTML Help 1.4 SDK to get a complete picture of what a CHM help file is and why you need it.

How about the reverse process? HTML Help Workshop supports decompiling as well, but it is a standalone application after all, and you can't use it if you want to automate certain processes. Let's say you want to access content from a CHM file in managed code without the "hassle" of the Microsoft UI. Naturally we need some kind of wrapper that will simplify read/write operations for us.

Let's try to access the content of an IStorage structure using the standard COM interfaces provided to us by Microsoft. Microsoft did not provide us with managed classes to manipulate IStorage objects directly, so we'll have to rely on System.Runtime.InteropServices to import the interfaces that we'll need in our managed code.

Just to give you an idea of which elements are present in our IStorage solution, take a look at the snapshot of the Visual Studio project above.

We'll organize our collection of classes into a single RelatedObjects.Storage namespace. We'll create an IBaseStorageWrapper class that will help us to enumerate all objects packed into an IStorage file.

Ideally we would like to be able to access any stream in a file separately, so we'll create IStorageWrapper and ITStorageWrapper classes that inherit from IBaseStorageWrapper. These classes will help us to handle different types of objects stored in the main compound storage object.

The only difference between ITStorageWrapper and IStorageWrapper is the way they access internally stored objects.

IStorageWrapper uses the StgOpenStorage function exported by Ole32.dll.

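using System;
using System.Runtime.InteropServices;

public class Ole32
{
    // Opens an existing storage object; the return value is an HRESULT (0 = S_OK).
    [DllImport("Ole32.dll")]
    public static extern int StgOpenStorage(
        [MarshalAs(UnmanagedType.LPWStr)] string wcsName, // path of the storage file
        IStorage pstgPriority,     // previous opening of the storage, usually null
        int grfMode,               // access mode (STGM flags)
        IntPtr snbExclude,         // must be NULL (IntPtr.Zero)
        int reserved,              // reserved, must be 0
        out IStorage storage       // returned storage
        );
}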
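The grfMode argument takes a combination of STGM flags. As a rough sketch (not the article's own code), the constants below use the values from the Windows SDK headers, and the hypothetical OpenReadOnly helper receives the storage as a raw IntPtr so that the snippet does not depend on an IStorage declaration. The names StgConstants and StorageOpener are assumptions made for illustration, and the call itself only works on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

// STGM access and sharing flags, values as defined in the Windows SDK.
public static class StgConstants
{
    public const int STGM_READ             = 0x00000000;
    public const int STGM_WRITE            = 0x00000001;
    public const int STGM_READWRITE        = 0x00000002;
    public const int STGM_SHARE_EXCLUSIVE  = 0x00000010;
    public const int STGM_SHARE_DENY_WRITE = 0x00000020;

    // A common combination for read-only access in direct mode.
    public const int ReadOnlyDirect = STGM_READ | STGM_SHARE_DENY_WRITE;
}

// Hypothetical helper, not part of RelatedObjects.Storage.
public static class StorageOpener
{
    [DllImport("ole32.dll")]
    private static extern int StgOpenStorage(
        [MarshalAs(UnmanagedType.LPWStr)] string pwcsName,
        IntPtr pstgPriority,
        int grfMode,
        IntPtr snbExclude,
        int reserved,
        out IntPtr ppstgOpen);

    // Returns a raw IStorage pointer; the caller must call Marshal.Release on it.
    public static IntPtr OpenReadOnly(string path)
    {
        IntPtr storage;
        int hr = StgOpenStorage(path, IntPtr.Zero, StgConstants.ReadOnlyDirect,
                                IntPtr.Zero, 0, out storage);
        Marshal.ThrowExceptionForHR(hr); // turns a failure HRESULT into an exception
        return storage;
    }
}
```

Checking the HRESULT right away keeps a failed open from surfacing later as a null storage reference.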

ITStorageWrapper uses the StgOpenStorage method of the ITStorage COM interface.

/// <summary>
/// .NET interface wrapper for InfoTech interface for ITStorage COM object
/// </summary>
[ComImport,
Guid("88CC31DE-27AB-11D0-9DF9-00A0C922E6EC"),
InterfaceType(ComInterfaceType.InterfaceIsIUnknown),
SuppressUnmanagedCodeSecurity]
public interface ITStorage 
{    
    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgCreateDocfile([In, 
            MarshalAs(UnmanagedType.BStr)] string pwcsName, 
            int    grfMode, 
            int reserved);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgCreateDocfileOnILockBytes(ILockBytes plkbyt, 
                                int grfMode, int reserved);

    int StgIsStorageFile([In, 
         MarshalAs(UnmanagedType.BStr)] string pwcsName);
    int StgIsStorageILockBytes(ILockBytes plkbyt);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgOpenStorage([In, 
                MarshalAs(UnmanagedType.BStr)] string pwcsName, 
                IntPtr pstgPriority,
                [In, MarshalAs(UnmanagedType.U4)] int grfMode, 
                IntPtr snbExclude, 
                [In, MarshalAs(UnmanagedType.U4)] int reserved);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgOpenStorageOnILockBytes(ILockBytes plkbyt, 
                IStorage pStgPriority, 
                int grfMode, 
                IntPtr snbExclude, 
                int reserved);

    int StgSetTimes([In, MarshalAs(UnmanagedType.BStr)] string lpszName, 
                    FILETIME pctime, 
                    FILETIME patime, 
                    FILETIME pmtime);
    int    SetControlData(ITS_Control_Data pControlData);
    int    DefaultControlData(ITS_Control_Data ppControlData);
    int    Compact([In, MarshalAs(UnmanagedType.BStr)] string pwcsName, 
                        ECompactionLev iLev);
}
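One way to see why two wrappers are useful is to look at the first bytes of a file. OLE compound files start with the signature D0 CF 11 E0 A1 B1 1A E1, while CHM files start with the ASCII characters ITSF, which is presumably why the article reaches for ITStorage when opening them. The sketch below only inspects that signature. ContainerKind and ContainerSniffer are hypothetical names, not part of RelatedObjects.Storage.

```csharp
using System;
using System.IO;

public enum ContainerKind { Unknown, OleCompoundFile, ItsfHelpFile }

// Hypothetical helper that classifies a file by its leading signature bytes.
public static class ContainerSniffer
{
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    private static readonly byte[] ItsfSignature =
        { (byte)'I', (byte)'T', (byte)'S', (byte)'F' };

    // Classifies a buffer holding the first bytes of a file.
    public static ContainerKind Detect(byte[] header)
    {
        if (StartsWith(header, OleSignature)) return ContainerKind.OleCompoundFile;
        if (StartsWith(header, ItsfSignature)) return ContainerKind.ItsfHelpFile;
        return ContainerKind.Unknown;
    }

    // Reads up to eight bytes from the file and classifies them.
    public static ContainerKind Detect(string path)
    {
        byte[] header = new byte[8];
        int total = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            int read;
            while (total < header.Length &&
                   (read = fs.Read(header, total, header.Length - total)) > 0)
            {
                total += read;
            }
        }
        Array.Resize(ref header, total);
        return Detect(header);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller could use the result to pick IStorageWrapper for OleCompoundFile and ITStorageWrapper for ItsfHelpFile, assuming that mapping matches how the wrappers are meant to be used.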

Let's follow the flow of our cute little wrapper by looking at the TEST application included in this project. Our goal in this example is to access content elements stored inside a CHM file. As I mentioned above, a CHM file is a compound storage archive that contains a collection of separate files. We will use our RelatedObjects.Storage DLL (compiled separately) to demonstrate how easy it is to access stream objects stored in an IStorage archive from managed code.

// Requires: using System; using System.IO; using RelatedObjects.Storage;
[STAThread]
static void Main(string[] args)
{
    // Create an instance of ITStorageWrapper. During initialization the
    // constructor processes the CHM file and builds a collection of the
    // file objects stored inside it.
    ITStorageWrapper iw = new ITStorageWrapper(@"I:\apps\abslog.chm");

    // Loop through the collection of objects stored inside the IStorage
    foreach (IBaseStorageWrapper.FileObjects.FileObject fileObject in iw.foCollection)
    {
        // Make sure we can read the stream of this individual file object
        if (fileObject.CanRead)
        {
            // fileObject is our representation of an internal file stored
            // in the IStorage; in this example we only extract HTM files
            if (fileObject.FileName.EndsWith(".htm"))
            {
                Console.WriteLine("Path: " + fileObject.FilePath);
                Console.WriteLine("File: " + fileObject.FileName);

                // FileUrl is an external reference to the internal object.
                // It allows you to display the content of a single file in
                // Internet Explorer without extracting it from the archive.
                Console.WriteLine("Url: " + fileObject.FileUrl);

                string fileString = fileObject.ReadFromFile();
                Console.WriteLine("Text: " + fileString);

                // Direct extraction sample
                fileObject.Save(@"i:\apps\test1\" + fileObject.FileName);

                // Read first, then save later; "using" closes the writer
                // even if the write throws
                using (StreamWriter sw = File.CreateText(@"i:\apps\" + fileObject.FileName))
                {
                    sw.WriteLine(fileString);
                }

                Console.ReadLine(); // pause after each extracted file
            }
        }
    }
    Console.ReadLine();
}
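The test application builds its output paths by plain string concatenation. When the internal names come from a CHM file you did not create, it may be worth checking that the resulting path stays inside the chosen output folder. The following is a minimal sketch under the assumption that internal paths use '/' separators. ExtractionPaths, BuildOutputPath and IsHtmlFile are hypothetical names, not part of RelatedObjects.Storage.

```csharp
using System;
using System.IO;

// Hypothetical helper for turning internal archive paths into safe output paths.
public static class ExtractionPaths
{
    // True for ".htm" and ".html", ignoring case.
    public static bool IsHtmlFile(string fileName)
    {
        string ext = Path.GetExtension(fileName);
        return string.Equals(ext, ".htm", StringComparison.OrdinalIgnoreCase)
            || string.Equals(ext, ".html", StringComparison.OrdinalIgnoreCase);
    }

    // Combines outputRoot with an internal path and rejects any result
    // that would land outside outputRoot (for example via "..").
    public static string BuildOutputPath(string outputRoot, string internalPath)
    {
        string root = Path.GetFullPath(outputRoot);
        string relative = internalPath
            .TrimStart('/', '\\')
            .Replace('/', Path.DirectorySeparatorChar);
        string full = Path.GetFullPath(Path.Combine(root, relative));

        string separator = Path.DirectorySeparatorChar.ToString();
        string rootWithSeparator = root.EndsWith(separator) ? root : root + separator;
        if (!full.StartsWith(rootWithSeparator, StringComparison.Ordinal))
        {
            throw new InvalidOperationException(
                "Internal path escapes the output folder: " + internalPath);
        }
        return full;
    }
}
```

In the loop above, the result of BuildOutputPath could be passed to fileObject.Save in place of the concatenated string, after creating the target directory with Directory.CreateDirectory.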

Conclusion

As you can see, this example demonstrates several useful methods for manipulating internal files, and I'm sure you could use others as well. Our company will continue enhancing the functionality to make it even easier to manage IStorage objects. You can visit our site to get the latest and greatest version and to download updated documentation.

If you have any questions or need the latest source code, e-mail me at support@asprelated.com.

Have a wonderful day!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.