Introduction
Have you ever spent time looking for a good compression library for one of your projects? Have you ever considered that, starting with Windows 98 and Windows NT 4.0, Microsoft has provided a very good compression library for free? In fact, it's even usable in earlier versions if you don't mind redistributing a DLL. I am talking about the Microsoft Cabinet SDK, which is freely downloadable.
Microsoft Cabinet SDK
At the time of writing, the SDK can be downloaded here. The SDK consists of a couple of things:
- Several binaries that work with cabinet files.
- Source code and header files to work with the library.
- Samples on how to use the source code and headers.
- Documentation on:
- How to use the binaries
- The file formats
- How to use the API
- The algorithms
This article is just about the API. Currently, I have only created classes for extracting cabinet files, but I might extend the classes in future to support existing and new cabinet files as well.
Background
First of all, let me tell you why I decided to create these classes. The API provided by Microsoft is very powerful, but it is written in C and quite hard to use. It requires a lot of repetitive code for tasks that are common when working with cabinet files. That led me to the idea of creating wrappers that implement these common tasks so I don't have to write the same code over and over again. There are already a couple of articles around that provide excellent classes to work with cabinet files, but unfortunately, they don't meet my needs. I want a wrapper with the following properties:
- It has to be a lightweight and fast wrapper
- It has to take all repetitive tasks out of your hands
- It has to be easy to use
- It has to be easily extendable so it can handle extractions through other means than files (for instance, through resources or pipes)
It is the last property that none of the implementations I have found so far provides. After all, it is bothersome and inefficient to store a cabinet on disk, extract it, and delete the file again when you could have done the same in memory. With these templates, it is relatively easy to extract a cabinet from any medium you desire.
The base template: CCabinetT
I have chosen to use templates for their efficiency: this way I don't need virtual functions. Templates even allow static functions to be overridden, which is not possible with virtual functions, as you can see in the source code. Another advantage of 'virtual' functions through templates over normal virtual functions is that calls are bound at compile time instead of at run time, which means the compiler can optimize them. However, this is an article about cabinet files, not about templates, so I'll try to stay focused.
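To make the idea of 'virtual' functions through templates concrete, here is a minimal sketch of the pattern. The names NotifierT, Quiet, and Loud are hypothetical and are not part of the cabinet classes. The sketch only shows a base template calling into its derived class, including a static function, with every call resolved at compile time.

```cpp
#include <cassert>
#include <string>

// Hypothetical base template (not part of the cabinet classes).
// T is the class that derives from it.
template <typename T>
class NotifierT
{
public:
    // Calls the most derived OnEvent without a virtual table.
    std::string Fire()
    {
        return static_cast<T*>(this)->OnEvent();
    }

    // Static dispatch works the same way, which virtual functions cannot do.
    static int BufferSize()
    {
        return T::PreferredSize();
    }

protected:
    // Defaults, used when the derived class does not provide its own.
    std::string OnEvent() { return "default"; }
    static int PreferredSize() { return 512; }
};

// Overrides nothing, so both defaults apply.
class Quiet : public NotifierT<Quiet> { };

// Hides both defaults; the base needs friend access to the private members.
class Loud : public NotifierT<Loud>
{
private:
    std::string OnEvent() { return "custom"; }
    static int PreferredSize() { return 4096; }
    friend class NotifierT<Loud>;
};
```

Calling Fire on a Quiet object runs the default, while the same call on a Loud object runs the replacement, and no virtual table is involved in either case.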
The base template, CCabinetT, does most of the work and wraps the Cabinet API. Using it is fairly easy. It cannot be instantiated directly, so you will have to derive a subclass from it. There is an empty subclass, CCabinet, but I do not recommend using that one, because the only thing it can do is extract every file in the cabinet (the default behavior of CCabinetT). For this article, I will define a new class named CCabinetTutorial. For now, I will declare it like this:
class CCabinetTutorial: public CCabinetT<CCabinetTutorial> { };
Is that all, you wonder? Yes, that is all. This is exactly how CCabinet is defined. CCabinetT provides all the functionality needed to extract files from a cabinet file. So, let's take a look at how it should be used. The following fragment shows how to extract all files from the file tutorial.cab to the directory Extract:
CCabinetTutorial cabTutorial;
if (!cabTutorial.CreateFDIContext())
    return;
if (!cabTutorial.IsCabinet("tutorial.cab"))
    return;
cabTutorial.Copy("tutorial.cab", "Extract");
cabTutorial.DestroyFDIContext();
Those of you who have worked with the Cabinet API before will immediately recognize the four functions used here. Those who haven't will probably wonder what on earth an FDI context is. Let me try to explain. The Cabinet API consists of two parts: the FCI and the FDI. The FCI (File Compression Interface) is responsible for creating and compressing cabinet files, and the FDI (File Decompression Interface) is responsible for decompressing them. The FCI and FDI each need their own context to function. As these templates are only capable of decompressing cabinet files, we will only use the FDI. If you would like to know more about the workings of the FCI and FDI, I refer you to the documentation of the Cabinet SDK.
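Since a context that has been created also has to be destroyed, every early return in a fragment like the one above needs a matching DestroyFDIContext call. The following is only a sketch of one way to automate that. FdiContextGuard and TryExtract are hypothetical helpers that are not part of the classes. They rely only on the cabinet object having the CreateFDIContext and DestroyFDIContext members used above. FakeCabinet is a stand-in so the sketch compiles without the Cabinet SDK.

```cpp
#include <cassert>

// Hypothetical scope guard (not part of the cabinet classes): creates the
// context on construction and destroys it when the guard leaves scope.
template <typename TCabinet>
class FdiContextGuard
{
public:
    explicit FdiContextGuard(TCabinet& cab)
        : m_cab(cab), m_created(cab.CreateFDIContext() ? true : false)
    {
    }

    ~FdiContextGuard()
    {
        if (m_created)
            m_cab.DestroyFDIContext();
    }

    // True when the context was created and the cabinet can be used.
    bool IsValid() const { return m_created; }

private:
    FdiContextGuard(const FdiContextGuard&);            // not copyable
    FdiContextGuard& operator=(const FdiContextGuard&); // not assignable

    TCabinet& m_cab;
    bool m_created;
};

// Stand-in for a cabinet class; 'live' counts contexts not yet destroyed.
struct FakeCabinet
{
    int live;
    FakeCabinet() : live(0) { }
    bool CreateFDIContext() { ++live; return true; }
    void DestroyFDIContext() { --live; }
};

// Returns early without leaking the context.
inline bool TryExtract(FakeCabinet& cab, bool looksLikeCabinet)
{
    FdiContextGuard<FakeCabinet> guard(cab);
    if (!guard.IsValid() || !looksLikeCabinet)
        return false; // the guard destroys the context here
    return true;      // a real caller would invoke Copy before returning
}
```

With a guard like this, the explicit DestroyFDIContext call at the end of the fragment would no longer be needed.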
Apart from the FDI context, this small example should be pretty self-explanatory. It creates the context, checks whether the target file is a valid cabinet file, and extracts it to the specified folder. After that, it cleans up the resources we used. But wait a minute! What if we don't want to extract all files from a cabinet? Simple, we only have to override some functions. CCabinetT allows subclasses to override the following functions (as if they were virtual):
OnCabinetInfo
OnCopyFile
OnCopyFileComplete
OnNextCabinet
Alloc
Free
Open
Read
Write
Close
Seek
I will start by explaining the first four functions in the list, the notification functions. The OnCabinetInfo function gets called when a new cabinet file is opened. The parameters contain some basic information about the contents of the cabinet. OnCopyFile gets called when a file is about to be copied, and you can allow or disallow the copy through the return value. OnCopyFileComplete gets called when a file has been extracted, and OnNextCabinet is called when the contents span multiple cabinet files. You will have to make sure the next cabinet file is accessible before returning, for instance, by giving the user the chance to insert another disk. So, let's redo CCabinetTutorial to implement these functions.
class CCabinetTutorial : public CCabinetT<CCabinetTutorial>
{
private:
    // Called when a new cabinet file is opened.
    void OnCabinetInfo(CABINETINFO& ci, LPVOID)
    {
        printf("Cabinet Info\n"
               " next cabinet: %s\n"
               " next disk: %s\n"
               " cabinet path: %s\n"
               " cabinet set ID: %d\n"
               " cabinet # in set: %d\n\n",
               ci.szNextCabinet,
               ci.szNextDisk,
               ci.szPath,
               ci.uSetID,
               ci.uCabinet
        );
    }

    // Called before a file is copied; return false to skip the file.
    bool OnCopyFile(CABINETFILEINFO& cfi, LPCSTR szPath, LPVOID)
    {
        printf("Extracting '%s' to '%s'...", cfi.szFile, szPath);
        return true;
    }

    // Called after a file has been extracted.
    void OnCopyFileComplete(LPCSTR, LPVOID)
    {
        printf("...DONE\n\n");
    }

    // Called when the cabinet set continues in another cabinet file.
    void OnNextCabinet(CABINETINFO& ci, FDIERROR, LPVOID)
    {
        printf("\n\nPlease insert the disk containing '%s' before "
               "pressing a button to continue\n", ci.szNextCabinet);
        getc(stdin);
    }

    // The base template needs access to the private overrides.
    friend class CCabinetT<CCabinetTutorial>;
};
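Since the return value of OnCopyFile decides whether a file is copied, it is a natural place for a filter. The helper below is a sketch of such a filter. HasExtension is a hypothetical function, not part of the classes or the Cabinet SDK. It only assumes that the file name is available as a plain C string, as cfi.szFile is in the class above.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical helper (not part of the cabinet classes): true when 'file'
// ends with 'ext' (for example ".txt"), compared without regard to case.
inline bool HasExtension(const char* file, const char* ext)
{
    const std::size_t fileLen = std::strlen(file);
    const std::size_t extLen = std::strlen(ext);
    if (extLen > fileLen)
        return false;

    // Compare the tail of the file name with the extension.
    const char* tail = file + (fileLen - extLen);
    for (std::size_t i = 0; i < extLen; ++i)
    {
        const int a = std::tolower(static_cast<unsigned char>(tail[i]));
        const int b = std::tolower(static_cast<unsigned char>(ext[i]));
        if (a != b)
            return false;
    }
    return true;
}
```

Inside OnCopyFile, returning HasExtension(cfi.szFile, ".txt") instead of true would then extract only the text files.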
It's as simple as that! For more information on these overridables, I refer you to the documentation of the Cabinet SDK. I will now discuss the next two overridables in the list, Alloc and Free.
As you might have already guessed, these two functions provide memory management. The default implementation just calls the standard C++ new and delete operators. You can override them like this:
class CCabinetTutorial : public CCabinetT<CCabinetTutorial>
{
private:
    // malloc and free are declared in <stdlib.h>.
    static void * Alloc(size_t size)
    {
        return malloc(size);
    }

    static void Free(void * memblock)
    {
        free(memblock);
    }

    // The base template needs access to the private overrides.
    friend class CCabinetT<CCabinetTutorial>;
};
Notice that these two functions are static. Aren't templates beauties to allow 'virtual' static functions :) ? Anyway, these functions are pretty self-explanatory and boring, so let's move on to the real deal!
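One practical use for these two hooks is tracking allocations while debugging. The sketch below is a hypothetical helper, not part of the classes. CountingHeap exposes functions with the same signatures as the Alloc and Free overridables shown above and counts the blocks that have not been freed yet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not part of the cabinet classes) with the same
// signatures as the Alloc and Free overridables. It counts the blocks
// that are still allocated.
struct CountingHeap
{
    static long& Outstanding()
    {
        static long count = 0; // shared by all callers, not thread safe
        return count;
    }

    static void* Alloc(std::size_t size)
    {
        void* block = std::malloc(size);
        if (block != NULL)
            ++Outstanding();
        return block;
    }

    static void Free(void* memblock)
    {
        if (memblock != NULL)
            --Outstanding();
        std::free(memblock);
    }
};
```

The overrides in the tutorial class could then forward to CountingHeap::Alloc and CountingHeap::Free, and a nonzero CountingHeap::Outstanding() after the context has been destroyed would point to a leak.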
The extension class: CCabinetExT
You are probably wondering if I forgot to tell you about the other overridables. Well, I didn't, because this template is all about them. The default implementations of those overridables simply wrap the CRT I/O functions with the same names. You can read up on them in the MSDN. If you just wish to use a different I/O API than the standard CRT, you can override these functions in a class derived from CCabinetT. However, to make the most of overriding these functions, you'll want to use CCabinetExT. This class provides everything you need to change from a file based medium to a different medium. You want to access cabinets in pipes? No problem! Just derive a class from this template and fill in the missing pieces of the puzzle.
How does it work? By overriding the remaining overridables. The problem is that the Cabinet library expects these functions to handle files, so I had to find a way to use these functions for both files and user defined media. I achieved this in the following way. The first parameter of Open is a string. Usually, this is a string containing a filename. However, by starting this name with a marker that is specific to the extension, I can separate file requests from user defined requests. Say I choose the marker '//Resources\\'. Now, every time I make a Copy request that starts with the marker '//Resources\\', it will call the user defined functions instead of the file I/O functions. To achieve this, the template provides the following overridables:
GetIdentifier
OpenEx
ReadEx
WriteEx
CloseEx
SeekEx
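The marker test described above comes down to a prefix comparison. The sketch below shows one way it could look. It does not show how the template itself implements the test, and kMarker, IsExtensionName, and StripMarker are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical marker, using the same text as the example above.
static const char kMarker[] = "//Resources\\";

// True when 'name' starts with the marker, meaning the request should go
// to the user defined functions instead of the file I/O functions.
inline bool IsExtensionName(const char* name)
{
    return std::strncmp(name, kMarker, sizeof(kMarker) - 1) == 0;
}

// Returns the part of 'name' after the marker, or 'name' itself when the
// marker is absent.
inline const char* StripMarker(const char* name)
{
    return IsExtensionName(name) ? name + (sizeof(kMarker) - 1) : name;
}
```

An Open implementation could use a test like this to decide whether a name refers to a user defined medium or to an ordinary file.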
The function GetIdentifier merely returns a pointer to a string containing the marker. This has to be overridden for the extension to work correctly. The '~Ex' functions should be overridden to provide an implementation for different cabinet handling. The functions should act the same as the CRT I/O functions, so I advise you to take a good look at the MSDN and the CResourceCabinetT template.
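To give an idea of what acting the same as the CRT I/O functions means for a medium that is not a file, here is a sketch of a read-only stream over a block of memory. MemoryStream is a hypothetical class that does not come from the templates. It follows the CRT conventions that a read returns the number of bytes copied (0 at the end of the data) and that a seek returns the new position or -1 on failure. Unlike the CRT, it rejects positions beyond the end of the data.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical in-memory stream (not part of the cabinet classes).
// 'size' is assumed to be the non-negative length of 'data' in bytes.
class MemoryStream
{
public:
    MemoryStream(const unsigned char* data, long size)
        : m_data(data), m_size(size), m_pos(0)
    {
    }

    // Copies up to 'count' bytes and returns how many were copied.
    unsigned int Read(void* buffer, unsigned int count)
    {
        const unsigned long remaining =
            static_cast<unsigned long>(m_size - m_pos);
        const unsigned long n = (count < remaining) ? count : remaining;
        std::memcpy(buffer, m_data + m_pos, n);
        m_pos += static_cast<long>(n);
        return static_cast<unsigned int>(n);
    }

    // Moves the position relative to SEEK_SET, SEEK_CUR, or SEEK_END.
    // Returns the new position, or -1 when it falls outside the data.
    long Seek(long offset, int origin)
    {
        long base;
        switch (origin)
        {
        case SEEK_SET: base = 0; break;
        case SEEK_CUR: base = m_pos; break;
        case SEEK_END: base = m_size; break;
        default: return -1;
        }
        const long target = base + offset;
        if (target < 0 || target > m_size)
            return -1;
        m_pos = target;
        return m_pos;
    }

private:
    const unsigned char* m_data;
    long m_size;
    long m_pos;
};
```

A ReadEx or SeekEx override could forward to an object like this once it has mapped the handle it receives to the right stream.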
Other functions
There are a couple of functions I have not yet discussed, but they are important nonetheless. This section is dedicated to those functions. For CCabinetT, these functions are:
AbortOperation
GetLastError
The function AbortOperation can be called from any of the notification functions (OnCabinetInfo, OnCopyFile, OnCopyFileComplete, and OnNextCabinet). It immediately aborts the Copy operation: files that have not been extracted yet are skipped, and files that were already extracted remain extracted. The GetLastError function can be used to check the internal error structure. Look at the Cabinet SDK documentation for more information on that. It is possible that a function fails, yet GetLastError does not report an error. In that case, try using the Win32 GetLastError function. If that doesn't give a valid error either, chances are good that you have used the class incorrectly or that an out-of-memory situation occurred. For the CCabinetExT template, there is one function to describe:
This function guarantees you will get a string which can be passed to the Copy function so it will detect your extension. The function is straightforward to use, and the comments in the source code explain the details if anything is unclear.
Last but not least: CResourceCabinetT
This template is based on the CCabinetExT template. It provides the means to extract cabinet files directly from the resource section of a module. It's very simple to use:
class CTutorialCabinet : public CResourceCabinetT<CTutorialCabinet> { };
That should look familiar! The usage is identical to that of the CCabinetTutorial class I described above. In fact, the implementation shown for CCabinetTutorial would compile for this class as well! However, to make it a little easier to program and to take away the need to build the Copy strings, there are three helper functions that do this for you:
BuildResourceName
CopyFromResource
IsResourceCabinet
The BuildResourceName function builds a compatible Copy string from a resource identifier and a resource type. The CopyFromResource and IsResourceCabinet functions even take this job away from you by building the compatible Copy string automatically. The demo project provides a working sample and executable which make use of the CResourceCabinetT template.
Final notes
- To compile the demo project, you will need to download the Cabinet SDK from the provided link at the top of this article. The demo project will not be compilable unless you link the project with the Cabinet SDK or include the source files. I have not included them intentionally, because I do not know if I'm allowed to redistribute them.
- This implementation makes use of the stdext::hash_set class. If you don't have Visual Studio .NET 2003, you can fairly easily change the implementation of CCabinetExT to make use of std::set instead.
- These classes are not thread safe. However, every instance uses a context of its own, so it is perfectly safe to use a separate instance of the class for each thread. Just make sure you don't share an instance between threads.
- These templates are freely usable by everyone who wishes to use them, even in commercial applications. Please note though that there still might be some bugs. If you come across any bugs, please let me know and I'll try to fix them as fast as possible.
Revision history
- 25-07-2004: Initial release.