Introduction
I was reading Jignesh Patel's article
here on CP and I found someone asking for a text version of the ClassView
information. Actually, I needed that too for a project where I wanted to rename
some classes and source files, so I took the challenge and tried to decode
ClassView binary data.
This is for you, Uwe! ;-)
Binary formats and Hex editors
Well, since the ".opt" file is a Compound Document (or Structured
Storage) file I opened it with the DocFile Viewer utility that comes
with Visual Studio and I used the internal Hex viewer to try to decode the
"ClassView Window" stream.
The first thing I noticed was that the stream length reported in the window's
title bar is larger than what seemed to be the actual data: some strings of
text and a few other bytes of header. The rest of the stream is just garbage,
so it's sufficient to save the stream length and fill the remainder with zeroes
while loading.
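The zero-filling step can be sketched as follows. This is only a minimal illustration, not the add-in's code: the helper name padStream and the use of std::vector are my assumptions, and the only thing taken from the text above is that the meaningful bytes are kept and the tail is filled with zeroes up to the saved length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: rebuild a stream image of the original size from the
// meaningful bytes only, filling the unused tail with zeroes.
std::vector<std::uint8_t> padStream(const std::vector<std::uint8_t>& payload,
                                    std::size_t originalLength)
{
    // The meaningful data cannot be longer than the stream it came from.
    assert(payload.size() <= originalLength);

    std::vector<std::uint8_t> stream(payload);
    stream.resize(originalLength, 0);   // appended bytes are all zero
    return stream;
}
```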
The next thing to do when you find strings in a block of binary data is to
see if they're zero terminated, like C strings, or prefixed with their length,
like Pascal or Basic strings, or maybe fixed length. I found all strings were
prefixed with a BYTE or a WORD matching the string's length.
The other fields all seemed like WORD or DWORD values or flags. Some could be
identified as the number of projects in the workspace, the number of folders in
a project, or the number of subfolders and classes in a folder.
A strange thing was the presence of two unknown class names before the first
project in the workspace and before the first folder of the first project:
CClsFldSlob and CClassSubfolderSlob. I suspected these classes were
owned by the Visual Studio IDE, so I used Nick Hodapp's OpenVC add-in and
looked for them.
Once I found them, I wondered whether I could use this information for my
purposes, but my research into those two classes ended there (see the
Addendum). Maybe their presence among the other data is due to the process used
to serialize data into and out of the stream, maybe it's just MFC's
serialization support, but I couldn't tell because I had never used it.
No problem, this extraneous data can be easily identified by a WORD prefix of
0xFFFF. Then comes another WORD of value 0x0001 or 0x0002 (that I called "slob
level", but it actually is the object schema of MFC), the class name and then
the first project or folder of the whole workspace. The other projects have a
WORD prefix of 0x8001, while folders have a WORD prefix of 0x8003 (both are
special index values, see the Addendum).
All these could be flags or integer numbers. I chose not to explicitly write
to XML the first WORD prefix, which is implicit in the type of item
considered (project, folder or "slob" - dropped in last version, see the Addendum).
For each project there's a DWORD count of the project's folders, while for
each folder there are two DWORDs counting the number of subfolders and of
contained classes. The tree is serialized with a pre-order traversal, that is,
subfolders are nested right after the subfolder count.
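To make the visiting order concrete, here is a small sketch that walks an in-memory folder tree and records what would be emitted, in order. It is not the real byte layout: the Folder struct, the traceFolder name and the position of the folder name relative to the counts are assumptions made for illustration. The only point it shows is that each subfolder is fully written out right after the subfolder count, before the class count of its parent.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory model of a ClassView folder.
struct Folder {
    std::string name;
    std::vector<Folder> subfolders;
    std::vector<std::string> classes;
};

// Appends a readable trace of the visiting order (not the real bytes).
void traceFolder(const Folder& f, std::vector<std::string>& out)
{
    out.push_back("folder " + f.name);
    out.push_back("subfolders=" + std::to_string(f.subfolders.size()));
    for (const Folder& sub : f.subfolders)
        traceFolder(sub, out);          // nested right after the count
    out.push_back("classes=" + std::to_string(f.classes.size()));
    for (const std::string& c : f.classes)
        out.push_back("class " + c);
}
```

For a folder "UI" that holds the class CMainFrame and one subfolder "Dialogs" with the class CAboutDlg, the trace lists everything about "Dialogs" before the class count of "UI".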
All the details can be found in the CCLVParser class, but don't expect
documentation for the ClassView data format, just commented source code.
Addendum: MFC Serialization
I was trying to investigate the problems reported by some users of this
add-in and decided to have a look at the way MFC handles object serialization. I
was right, the binary data in the "ClassView Window" stream is compatible with
the serialization support of MFC, that almost certainly was used to read and
write the stream.
Let's skip the first DWORD of the stream, that I consider like a
signature because it never changes, but that may have a deeper meaning. The next
field is a DWORD count of the projects in the workspace, followed
by a representation of each project. This is very close to how MFC collections
are stored during serialization. Take for example the Serialize method of
CObArray or CObList, that could be conveniently used to store the list of
projects in a workspace:
void CObArray::Serialize(CArchive& ar)
{
    ASSERT_VALID(this);
    CObject::Serialize(ar);
    if (ar.IsStoring())
    {
        // first the element count, then each element in turn
        ar.WriteCount(m_nSize);
        for (int i = 0; i < m_nSize; i++)
            ar << m_pData[i];
    }
    else
    {
        ...
    }
}
Obviously this code cannot be the source of our stream, because the object
itself is first serialized and there's no trace of MFC collection classes in the
data stream before the initial count, but I hope you got the picture. First the
object count is stored in the stream, then each object in the array or list gets
stored in turn. This same pattern is used for the list of folders in a project
and the classes they contain.
Other things I discovered about ClassView, using the OpenVC Add-in and
information stored in the stream, are:
- A project is represented by an object of class CClsFldSlob
- A folder is represented by an object of class CClassSubfolderSlob
- The other container items (classes and the Globals folder) are
represented by objects of class CClassViewSlob, from which the
above two are derived
Class items and the Globals folder are both implemented as containers
of variables and functions, but as objects they are not serialized in our
stream. They are probably generated from the source code each time we open the
workspace and then associated with folders as specified by the ClassView
stream.
Project and folder items, instead, are stored exactly the way MFC
implements serialization. When an object is serialized through
CArchive::WriteObject, its RUNTIME_CLASS is serialized too, and both are
written to the stream only the first time; subsequent times an index is used to
reference them. This way MFC avoids duplicating RUNTIME_CLASS information and
handles multiple references to the same object. The RUNTIME_CLASS is placed
before the object so that the CRuntimeClass::CreateObject function can be
located when the object is read back from the stream. Here follows a stripped
down version of the WriteObject function:
void CArchive::WriteObject(const CObject* pOb)
{
    DWORD nObIndex;
    if ((nObIndex = (DWORD)(*m_pStoreMap)[(void*)pOb]) != 0)
    {
        // object already written: store only its index
        *this << (WORD)nObIndex;
    }
    else
    {
        // first time: write the class, register the object, then its data
        CRuntimeClass* pClassRef = pOb->GetRuntimeClass();
        WriteClass(pClassRef);
        (*m_pStoreMap)[(void*)pOb] = (void*)m_nMapCount++;
        ((CObject*)pOb)->Serialize(*this);
    }
}
The objects in our stream are unique, so let's consider the else branch of
the condition. The WriteClass function writes a WORD prefix defined as 0xFFFF,
then it writes enough information to identify the RUNTIME_CLASS. In the same
file (ARCOBJ.CPP), which is part of the MFC 6.0 sources, we find the
definitions of some constants that also appear in the ClassView stream:
#define wNewClassTag ((WORD)0xFFFF)
#define wClassTag ((WORD)0x8000)
The first is the special prefix we have just seen, that precedes any new
RUNTIME_CLASS definition. After this prefix we find the object schema as a
WORD value (which is 0x0001 for project items and 0x0002 for folders), and the
class name as a counted string with the character count also stored as a
WORD. Take a look at the source, where inessential code has been removed:
void CArchive::WriteClass(const CRuntimeClass* pClassRef)
{
    DWORD nClassIndex;
    if ((nClassIndex = (DWORD)(*m_pStoreMap)[(void*)pClassRef]) != 0)
    {
        // class already written: store its index OR'ed with wClassTag
        *this << (WORD)(wClassTag | nClassIndex);
    }
    else
    {
        // first time: new-class tag, then schema and name
        *this << wNewClassTag;
        pClassRef->Store(*this);
        (*m_pStoreMap)[(void*)pClassRef] = (void*)m_nMapCount++;
    }
}
void CRuntimeClass::Store(CArchive& ar) const
{
    // schema, name length and name, all with WORD-sized fields
    WORD nLen = (WORD)lstrlenA(m_lpszClassName);
    ar << (WORD)m_wSchema << nLen;
    ar.Write(m_lpszClassName, nLen*sizeof(char));
}
As you can see, RUNTIME_CLASS information is stored only the first time, as
with multiply referenced objects, while subsequent times only a WORD index is
stored, OR'ed with the special value 0x8000, which is what we see in the
ClassView stream for projects and folders following the first.
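Reading this part of the stream back can be sketched like this. It is a minimal illustration under a few assumptions of mine: the buffer is a std::vector of bytes, values are little-endian (as on the x86 machines Visual Studio 6 runs on), and the names ClassInfo, readWord and readClassInfo are hypothetical. The two constants mirror wNewClassTag and wClassTag from ARCOBJ.CPP.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

const std::uint16_t kNewClassTag = 0xFFFF;  // wNewClassTag in ARCOBJ.CPP
const std::uint16_t kClassTag    = 0x8000;  // wClassTag in ARCOBJ.CPP

// Hypothetical result type for this sketch.
struct ClassInfo {
    bool isNew;            // true: schema and name were read from the stream
    std::uint16_t schema;  // valid only when isNew
    std::string name;      // valid only when isNew
    std::uint16_t index;   // valid only when !isNew
};

// Reads a little-endian WORD and advances pos.
std::uint16_t readWord(const std::vector<std::uint8_t>& buf, std::size_t& pos)
{
    if (pos + 2 > buf.size())
        throw std::runtime_error("unexpected end of stream");
    std::uint16_t v = static_cast<std::uint16_t>(buf[pos] | (buf[pos + 1] << 8));
    pos += 2;
    return v;
}

// Decodes either a new class definition or an indexed class reference.
ClassInfo readClassInfo(const std::vector<std::uint8_t>& buf, std::size_t& pos)
{
    ClassInfo info{false, 0, "", 0};
    std::uint16_t tag = readWord(buf, pos);
    if (tag == kNewClassTag) {
        info.isNew = true;
        info.schema = readWord(buf, pos);
        std::uint16_t len = readWord(buf, pos);   // WORD character count
        if (len > buf.size() - pos)
            throw std::runtime_error("class name runs past end of stream");
        info.name.assign(reinterpret_cast<const char*>(buf.data() + pos), len);
        pos += len;
    } else if (tag & kClassTag) {
        info.index = static_cast<std::uint16_t>(tag & 0x7FFF);  // strip the tag bit
    } else {
        throw std::runtime_error("not a class tag");
    }
    return info;
}
```

Fed the bytes FF FF 01 00 0B 00 followed by "CClsFldSlob", the function reports a new class with schema 1. Fed 01 80, it reports a reference to class index 1.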
After the class information, either complete or just an indexed reference,
comes a list of folders and, for a folder item, also a list of classes.
Serialization of folders, which can be nested, follows the same pattern as
projects in a workspace: a DWORD count followed by the serialization of each
folder object in turn. Classes are stored more simply, with their DWORD count
followed by the name of each class as a counted string with a BYTE length. I
suspect that generic strings like these (unlike the RUNTIME_CLASS names seen
above) are also stored using an MFC-provided function, namely:
CArchive& operator<<(CArchive& ar, const CString& string)
{
    if (string.GetData()->nDataLength < 255)
    {
        ar << (BYTE)string.GetData()->nDataLength;
    }
    else if (string.GetData()->nDataLength < 0xfffe)
    {
        ar << (BYTE)0xff;
        ar << (WORD)string.GetData()->nDataLength;
    }
    else
    {
        ar << (BYTE)0xff;
        ar << (WORD)0xffff;
        ar << (DWORD)string.GetData()->nDataLength;
    }
    ar.Write(string.m_pchData, string.GetData()->nDataLength);
    return ar;
}
Unfortunately, I could not verify this hypothesis since my class names were
always shorter than 255 characters and I also doubt the compiler
would accept identifiers any longer than that, but the data in the stream is
compatible with this code and it is another clue that MFC serialization support
has been used for ClassView data.
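If the hypothesis holds, a list of classes could be read back with something like the sketch below. It simply inverts the three length encodings of the operator above and wraps them in the DWORD-count pattern. The names readLE, readCountedString and readClassList are hypothetical, little-endian byte order is assumed, and only the BYTE-length case has actually been observed in the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Reads an n-byte little-endian unsigned value (n = 1..4) and advances pos.
std::uint32_t readLE(const std::vector<std::uint8_t>& buf, std::size_t& pos,
                     std::size_t n)
{
    assert(n >= 1 && n <= 4);
    if (pos + n > buf.size())
        throw std::runtime_error("unexpected end of stream");
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
        v |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
    pos += n;
    return v;
}

// Inverse of the length encoding used by the CString operator<<.
std::string readCountedString(const std::vector<std::uint8_t>& buf,
                              std::size_t& pos)
{
    std::uint32_t len = readLE(buf, pos, 1);      // BYTE length
    if (len == 0xFF) {
        len = readLE(buf, pos, 2);                // escaped to a WORD length
        if (len == 0xFFFF)
            len = readLE(buf, pos, 4);            // escaped to a DWORD length
    }
    if (len > buf.size() - pos)
        throw std::runtime_error("string runs past end of stream");
    std::string s(reinterpret_cast<const char*>(buf.data() + pos), len);
    pos += len;
    return s;
}

// DWORD count followed by that many counted strings.
std::vector<std::string> readClassList(const std::vector<std::uint8_t>& buf,
                                       std::size_t& pos)
{
    std::uint32_t count = readLE(buf, pos, 4);
    std::vector<std::string> names;
    for (std::uint32_t i = 0; i < count; ++i)
        names.push_back(readCountedString(buf, pos));
    return names;
}
```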
As a last note, we can see that empty lists of folders and classes are stored
as just the initial count, set to zero. Indexed references to serialized classes
use an integer variable that is incremented each time a new class or a new
object is written to the stream, and the new version of this Add-in reflects
this consideration, removing the unneeded <SLOB> tag from the
XML format I used for the stream (old files are still interpreted
correctly).
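The shared counter also explains why the references seen in the stream are 0x8001 and 0x8003 rather than 0x8001 and 0x8002. The toy model below reproduces the numbering. It is not MFC code: IndexModel and writeObjectOfClass are hypothetical names, and I am assuming the counter starts at 1, which is what makes the numbers come out the same as in the stream.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model of the single counter shared by classes and objects.
class IndexModel {
public:
    // Returns the WORD tag written before an object of the given class,
    // then consumes one more index for the object itself.
    std::uint16_t writeObjectOfClass(const std::string& className)
    {
        assert(next_ < 0x7FFF);                 // index must fit below the tag bit
        std::uint16_t tag;
        std::map<std::string, std::uint16_t>::iterator it =
            classIndex_.find(className);
        if (it == classIndex_.end()) {
            tag = 0xFFFF;                       // new class: wNewClassTag
            classIndex_[className] = next_++;   // the class takes an index
        } else {
            tag = static_cast<std::uint16_t>(0x8000 | it->second);
        }
        ++next_;                                // the object takes an index too
        return tag;
    }

private:
    std::map<std::string, std::uint16_t> classIndex_;
    std::uint16_t next_ = 1;                    // assumption: index 0 is reserved
};
```

Writing the first project and then its first folder assigns index 1 to CClsFldSlob, 2 to the project object, 3 to CClassSubfolderSlob and 4 to the folder object, so later folders are tagged 0x8003 and later projects 0x8001.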
Add-In Usage
Using the add-in is pretty simple. See this page for installation
instructions if you don't already know how to install add-ins. Once
installed, the add-in is ready to use. Its toolbar has three buttons:
- About RestoreClassView
- Restore ClassView folders
- Save ClassView folders
They should all be self-explanatory. You may choose between binary (default)
and XML format, and your choice is recorded in the registry under the key
"HKCU\Software\The Code Project\RestoreClassViewAddin".
If you rename a class in your project you may lose all your workspace's
folders. You can then use the XML file format to recover your folders: just
replace the class name in the XML file and reload it with the RestoreClassView
Add-in.
After all, that's what all this is meant for.
Updates
19 Jan 2004
- Changed parsing routines and XML format, with a more robust restore
operation from XML.
15 Apr 2002
- Fixed a bug: the Restore procedure was always looking for the last saved
file, regardless of the user's choice.
- I forgot a sentence in the article.
Acknowledgements
This project is based on the following people's work:
My changes for Version 2.0 of the add-in:
- Three separate toolbar buttons to access the add-in's functions
- XML conversion of ClassView data
Any comment or suggestion is appreciated.