Introduction
BlackSabbath
What does this application do? It takes a URL, crawls it to a specified depth,
and finds broken links in the web pages it visits. With a depth of 2, for
example, the tool opens every link on the parent page and then analyzes all the
links on each of those child pages too. This can be very helpful in web
authoring for finding incorrect, moved, or broken links. One thing I want to
admit is that the short algorithm in CHTTPThread::OpenNewLinkandGetData(),
which determines whether a link is broken, is very raw. It reports many links
as broken although they are not, because the association with the parent
directory is not resolved correctly. I could use help from some of you HTML
gurus to make it exact, as I don't have time to do it. The application is
really intended to bring out several aspects of threading and COM, and to show
how to create a multi-threaded COM client project effectively.
The architecture is as follows. We have two work queues.
- CThreadSafePtrArray arrLinksDataG, an array of links that have to be
analyzed.
- CThreadSafePtrArray arrHTMLDataG, an array of HTML data for pages whose
links have to be analyzed.
There are two thread arrays.
- CThreadSafePtrArray HTTPThreadsG is an array of threads that perform the
HTTP transactions to get the HTML for web pages. The links are taken from
arrLinksDataG and the resulting HTML, if any, is added to arrHTMLDataG. If a
link is determined to be broken, it is added to the display as well as to the
XML file BlackSabbath.xml. Any thread can be signaled to die through its
member kill event, hKillEventM. Otherwise the thread kills itself after a
maximum idle time, which starts counting when an empty arrLinksDataG
indicates there is no work.
- CThreadSafePtrArray HTMLParserThreadsG is an array of threads that analyze
the HTML from arrHTMLDataG, iterate through its links, and add them to
arrLinksDataG. Any thread can be signaled to die through its member kill
event, hKillEventM. Otherwise the thread kills itself after a maximum idle
time, which starts counting when an empty arrHTMLDataG indicates there is no
work.
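To make the flow between the two queues concrete, here is a minimal single-threaded sketch in standard C++. It is not the application's code: the in-memory fakeWeb map stands in for real HTTP requests, the names LinkItem, PageItem and findBrokenLinks are hypothetical, and the "seen" set that skips already queued URLs is an assumption added to keep the sketch finite.

```cpp
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

struct LinkItem { std::string url; int depth; };                 // entry of the links queue
struct PageItem { std::vector<std::string> links; int depth; };  // entry of the HTML queue

// fakeWeb maps a URL to the links found on that page (assumption: a URL
// that is missing from the map counts as a broken link).
std::vector<std::string> findBrokenLinks(
    const std::map<std::string, std::vector<std::string>>& fakeWeb,
    const std::string& startUrl, int maxDepth)
{
    std::deque<LinkItem> linksQueue;   // plays the role of arrLinksDataG
    std::deque<PageItem> htmlQueue;    // plays the role of arrHTMLDataG
    std::set<std::string> seen;        // assumption: never queue a URL twice
    std::vector<std::string> broken;

    linksQueue.push_back(LinkItem{startUrl, 0});
    seen.insert(startUrl);

    while (!linksQueue.empty() || !htmlQueue.empty()) {
        if (!linksQueue.empty()) {
            // "HTTP thread" step: fetch one link, queue its HTML for parsing.
            LinkItem link = linksQueue.front();
            linksQueue.pop_front();
            auto page = fakeWeb.find(link.url);
            if (page == fakeWeb.end()) {
                broken.push_back(link.url);
            } else if (link.depth < maxDepth) {
                htmlQueue.push_back(PageItem{page->second, link.depth});
            }
        } else {
            // "Parser thread" step: extract links, feed them back to the links queue.
            PageItem page = htmlQueue.front();
            htmlQueue.pop_front();
            for (const std::string& url : page.links) {
                if (seen.insert(url).second) {
                    linksQueue.push_back(LinkItem{url, page.depth + 1});
                }
            }
        }
    }
    return broken;
}
```

In the real application the two steps run on separate thread pools and the queues are thread-safe arrays; the sketch only shows how work moves from one queue to the other and how the depth limit stops the crawl.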
There is a worker thread that acts as the thread manager. The main application
thread creates this thread at the start of the job; the function is showBusy()
in utils.cpp. The URL to be analyzed is written into the link data array before
showBusy() is called. Thread creation follows this rule: if the workload is
more than twice the number of threads of a particular kind, and the maximum
allowable number of threads has not been reached, create a new thread. The
thread manager function is checkBusy() in utils.cpp. The thread manager creates
threads by calling checkLoadAndCreateThreadIfNeeded() in utils.cpp, and this
function also creates the first HTTP thread. Workload is measured by the item
count in the work queue. Newly created threads are added to the respective
thread array. The thread manager is killed in one of the following ways.
- When the user requests an abort from the GUI, the main thread initiates
termination of all threads.
- By self-idle counting, when there is no work left in the links or HTML data
arrays and there are no worker threads.
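The thread creation rule described above can be sketched as a small predicate. This is only an illustration of the rule as stated in the text, not the body of checkLoadAndCreateThreadIfNeeded(); the function name shouldCreateThread and its parameters are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: returns true when the manager should start one more
// worker thread for a given work queue.
bool shouldCreateThread(std::size_t queuedItems,
                        std::size_t liveThreads,
                        std::size_t maxThreads)
{
    if (liveThreads >= maxThreads) {
        return false;                      // maximum allowable limit reached
    }
    // The workload must exceed twice the number of threads of this kind.
    // With zero threads, any queued item qualifies, which is one way the
    // very first thread could get created.
    return queuedItems > 2 * liveThreads;
}
```

For example, with two live threads and a limit of four, a queue of four items does not trigger a new thread, but a queue of five does.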
Creating thread safe MFC classes
Here is an example of creating a thread-safe class derived from an MFC class.
MFC classes are not thread safe. Thread safety can be achieved easily by
deriving a class from the MFC class and exposing only the methods you plan to
use. N.B. Use the protected or private access specifier for the inheritance
unless you plan to wrap every method exposed by the base class. Otherwise an
unsuspecting client could call functions from the base class that are not
thread safe.
class CThreadSafePtrArray : protected CPtrArray
{
private:
CRITICAL_SECTION myLockM;
public:
CThreadSafePtrArray();
~CThreadSafePtrArray();
void Add(LPVOID newElementA);
LPVOID GetAt(int iIndexA);
int GetSize();
void Lock(){EnterCriticalSection(&myLockM);}
void UnLock(){LeaveCriticalSection(&myLockM);}
LPVOID RemoveHead();
LPVOID RemoveAt(int iIndex);
};
CThreadSafePtrArray::CThreadSafePtrArray()
{
InitializeCriticalSection(&myLockM);
}
CThreadSafePtrArray::~CThreadSafePtrArray()
{
DeleteCriticalSection(&myLockM);
}
LPVOID CThreadSafePtrArray::RemoveHead()
{
FUNCENTRY("CThreadSafePtrArray::RemoveHead");
EnterCriticalSection(&myLockM);
LPVOID pRet = CThreadSafePtrArray::RemoveAt(0);
LeaveCriticalSection(&myLockM);
FUNCRETURN(pRet);
}
LPVOID CThreadSafePtrArray::RemoveAt(int iIndex)
{
FUNCENTRY("CThreadSafePtrArray::RemoveAt");
LPVOID pRet = NULL;
EnterCriticalSection(&myLockM);
if( (iIndex < CPtrArray::GetSize()) &&
(iIndex >= 0))
{
pRet = CPtrArray::GetAt(iIndex);
CPtrArray::RemoveAt(iIndex);
}
LeaveCriticalSection(&myLockM);
FUNCRETURN(pRet);
}
You can see that the extra code is only a couple of lines, one at the beginning
and one at the end: EnterCriticalSection(&myLockM) and
LeaveCriticalSection(&myLockM). EnterCriticalSection acquires the lock on our
array for exclusive access, and all other threads trying to access it have to
wait until the current thread calls LeaveCriticalSection. This approach has
several advantages over using a plain CPtrArray and a separate synchronization
object to protect it.
- Reuse.
- It locks automatically and consistently whenever the underlying data is
used, and reduces the amount of code required to do so.
- It provides explicit Lock/UnLock for when a block of thread-safe operations
has to be done under one lock.
When would you use explicit Lock/UnLock? Say I call GetSize() and then start
operating on the array by getting elements one by one. Without a lock held
across the whole loop, this is not going to be kosher. Say that while Thread1
is in the loop, Thread2 gets control and removes an element from the array, and
Thread1 then operates on what used to be the last element. This will result in
an access violation. The following piece of code, which uses Lock/UnLock, will
stay strong.
CThreadSafePtrArray HTTPThreadsG;
HTTPThreadsG.Lock();
int iNoOfThreads = HTTPThreadsG.GetSize();
for(int iCurThread = 0; iCurThread < iNoOfThreads; iCurThread++)
{
CHTTPThread* pCurThread = (CHTTPThread*) HTTPThreadsG.GetAt(iCurThread);
.
.
}
HTTPThreadsG.UnLock();
Trying to acquire Lock from same thread twice
What might bother you here is that some member functions in turn call other
member functions, and all of them lock and unlock. How does this work? To avoid
deadlocks, a critical section (like a mutex) can be acquired again by the
thread that already owns it, as long as every acquisition is matched by a
release. CEvent is different: it stays in whatever state it was set to, because
its prime use is as a signaling object. Hence the above code is fine, although
RemoveHead() acquires the lock twice, once itself and once inside RemoveAt().
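For readers without MFC at hand, the same ideas (implicit locking in every method, a nested acquisition from the same thread, and explicit Lock/UnLock around a block) can be sketched in standard C++. This is a portable analog under stated assumptions, not the article's class: std::recursive_mutex stands in for CRITICAL_SECTION, and the names SafePtrArray and countNonNull are hypothetical.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

class SafePtrArray
{
public:
    void Add(void* item)
    {
        std::lock_guard<std::recursive_mutex> guard(lockM);
        itemsM.push_back(item);
    }
    std::size_t GetSize()
    {
        std::lock_guard<std::recursive_mutex> guard(lockM);
        return itemsM.size();
    }
    void* GetAt(std::size_t index)
    {
        std::lock_guard<std::recursive_mutex> guard(lockM);
        return index < itemsM.size() ? itemsM[index] : nullptr;
    }
    void* RemoveAt(std::size_t index)
    {
        std::lock_guard<std::recursive_mutex> guard(lockM);
        if (index >= itemsM.size()) {
            return nullptr;
        }
        void* item = itemsM[index];
        itemsM.erase(itemsM.begin() + static_cast<std::ptrdiff_t>(index));
        return item;
    }
    void* RemoveHead()
    {
        // Locks here and again inside RemoveAt(); a recursive mutex lets the
        // owning thread do that without deadlocking itself.
        std::lock_guard<std::recursive_mutex> guard(lockM);
        return RemoveAt(0);
    }
    void Lock()   { lockM.lock(); }     // explicit lock for multi-step work
    void UnLock() { lockM.unlock(); }

private:
    std::recursive_mutex lockM;
    std::vector<void*> itemsM;
};

// Walks the whole array under one explicit lock, so no other thread can
// shrink it between GetSize() and the last GetAt().
std::size_t countNonNull(SafePtrArray& arr)
{
    arr.Lock();
    std::size_t count = 0;
    const std::size_t size = arr.GetSize();   // re-enters the lock
    for (std::size_t i = 0; i < size; ++i) {
        if (arr.GetAt(i) != nullptr) {
            ++count;
        }
    }
    arr.UnLock();
    return count;
}
```

The sketch pairs Lock() and UnLock() by hand to mirror the article's interface; code that can throw between the two calls would want a guard object instead.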
Lock a resource only as long as you need
One important principle in multi-threading is to hold a lock on a resource for
as few instructions as you can. I can demonstrate this using the implicit
locking of CThreadSafePtrArray. Look at the following code.
CThreadSafePtrArray HTTPThreadsG;
CThreadSafePtrArray HTMLParserThreadsG;
CThreadSafePtrArray arrLinksDataG;
CThreadSafePtrArray arrHTMLDataG;
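// The operator joining the tests is missing in the source listing; || is
// assumed here from the thread manager's idle rule.
if( (0 < HTTPThreadsG.GetSize()) ||
(0 < HTMLParserThreadsG.GetSize()) ||
(0 < arrLinksDataG.GetSize()) ||
(0 < arrHTMLDataG.GetSize()))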
{
}
Each array is locked and released around the call that reads its size. No two
resources are locked at the same time. If we were to use plain MFC data types
with separate locks, we would usually lock all of them at once, as follows.
EnterCriticalSection(HTTPThreadsLockG);
EnterCriticalSection(HTMLParserThreadsLockG);
EnterCriticalSection(arrLinksDataLockG);
EnterCriticalSection(arrHTMLDataLockG);
This holds all four resources at once until the end of the if condition above.
Now look closely at these variables: every thread except the main thread
operates on them, and the thread manager has to evaluate this if condition to
determine the workload. This can be a major performance hit. If a thread switch
happens while the thread manager is inside the if condition, no other thread
can do its job, and control has to switch back to the thread manager so that it
can finish and release the locks. This was an effort to convince you to always
create thread-safe data type classes.
Declaring thread Local Variables for C style worker threads
If we want a variable that is shared by several functions used by a worker
thread, but that has a unique value per thread, we should declare it
_declspec(thread) static. This is needed only for plain C-style implementations
of worker threads. C++ style CWinThread-derived classes achieve the same thing
with member variables and member functions that access them, since there is one
object per thread. Here is an example from my application files utils.cpp and
utils.h. There is no real need for it here, as I have only one thread manager
thread; the purpose is to illustrate the usage.
_declspec(thread) static UINT lIdleCount = 0;
BOOL checkIdleMax()
{
BOOL bIdleMax = false;
if(MAXIDLECOUNT <= lIdleCount)
{
bIdleMax = true;
}
return bIdleMax;
}
void incrementIdle()
{
ASSERT(MAXIDLECOUNT > lIdleCount);
lIdleCount++;
}
void clearIdle()
{
lIdleCount = 0;
}
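The per-thread behavior is easier to see with more than one thread. The following is a sketch in standard C++, where the thread_local keyword plays the role that _declspec(thread) plays in the listing above; the names idleTicks, tickIdle, currentIdle and idleCountOnNewThread are hypothetical and not part of utils.cpp.

```cpp
#include <thread>

namespace {
    // One independent counter per thread, zero-initialized in each.
    thread_local unsigned idleTicks = 0;
}

void tickIdle() { ++idleTicks; }
unsigned currentIdle() { return idleTicks; }

// Starts a fresh thread, ticks its private counter `ticks` times, and
// returns the value that thread ended up with.
unsigned idleCountOnNewThread(unsigned ticks)
{
    unsigned result = 0;
    std::thread worker([&result, ticks] {
        for (unsigned i = 0; i < ticks; ++i) {
            tickIdle();
        }
        result = currentIdle();   // reads the worker's own copy
    });
    worker.join();
    return result;
}
```

Each new thread starts from zero regardless of what the calling thread has counted, and the calling thread's counter is untouched by the worker.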
Application Configuration file for Multi threaded project
<?xml version="1.0" encoding="UTF-8"?>
<ApplicationConfig>
<Debug>
<AppLogging>
<BrokenLinksXMLFileName>
C:\MyProjects\BlackSabbath\BlackSabbath.xml
</BrokenLinksXMLFileName>
<BrokenLinksXMLSchemaFileName>BlackSabbathSchema.xml
</BrokenLinksXMLSchemaFileName>
<BrokenLinksXMLXSLTFileName>BlackSabbath.xsl
</BrokenLinksXMLXSLTFileName>
<LogFileName>
C:\MyProjects\BlackSabbath\BlackSabbath.log
</LogFileName>
<CurrentLoggingLevel>1</CurrentLoggingLevel>
</AppLogging>
</Debug>
<Release>
<AppLogging>
<BrokenLinksXMLFileName>
C:\MyProjects\BlackSabbath\BlackSabbath.xml
</BrokenLinksXMLFileName>
<BrokenLinksXMLSchemaFileName>BlackSabbathSchema.xml
</BrokenLinksXMLSchemaFileName>
<BrokenLinksXMLXSLTFileName>BlackSabbath.xsl
</BrokenLinksXMLXSLTFileName>
<LogFileName>
C:\MyProjects\BlackSabbath\BlackSabbath.log
</LogFileName>
<CurrentLoggingLevel>1
</CurrentLoggingLevel>
</AppLogging>
</Release>
</ApplicationConfig>
The file above is BlackSabbathConfig.xml. It has configuration settings for
debug and release builds. The task now is to write a thread-safe reader class
for accessing the configuration values. Thread safety is implemented the same
way as before, using synchronization objects. The class is named
CAppConfigReader and it is a singleton, which has to be created and released
once per application. The first call to GetInstance should pass the name of the
configuration file.
CAppConfigReader* pConfig =
CAppConfigReader::GetInstance(
"C:\\MyProjects\\BlackSabbath\\BlackSabbathConfig.xml");
if(NULL != pConfig)
{
csLogFileName = pConfig->GetValue(
CString("/AppLogging/LogFileName"));
}
Subsequent calls should not provide the file name, as doing so will reopen and
reload the file, or switch to a different file. The implementation is a
singleton because there is one and only one application configuration file. If
you need to read more than one configuration file, change the implementation
from a singleton to normal object creation.
CAppConfigReader* pConfig = CAppConfigReader::GetInstance();
if(NULL != pConfig)
{
csLogLevel = pConfig->GetValue(
("/AppLogging/CurrentLoggingLevel"));
logLevelM = (eLoggingLevel)atoi(csLogLevel);
}
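The calling convention (file name on the first call, no argument afterwards) can be sketched in standard C++ as follows. This is not the CAppConfigReader implementation: ConfigReaderSketch is a hypothetical class that only remembers the file name, and the choice to return nullptr before any file has been named is an assumption.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

class ConfigReaderSketch
{
public:
    // Pass a file name on the first call; later calls pass nothing.
    // Returns nullptr if no file has been named yet (assumption).
    static ConfigReaderSketch* GetInstance(const std::string& fileName = "")
    {
        static std::mutex lock;
        static std::unique_ptr<ConfigReaderSketch> instance;
        std::lock_guard<std::mutex> guard(lock);   // safe from any thread
        if (!fileName.empty()) {
            // A name means "load (or reload) this file". Reloading replaces
            // the instance, so pointers handed out earlier become invalid.
            instance.reset(new ConfigReaderSketch(fileName));
        }
        return instance.get();
    }

    const std::string& FileName() const { return fileNameM; }

private:
    explicit ConfigReaderSketch(std::string fileName)
        : fileNameM(std::move(fileName)) {}

    std::string fileNameM;
};
```

The reload branch shows why later callers should leave the file name out: naming a file again would swap the configuration under every thread that already holds the pointer.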
When you pass /AppLogging/LogFileName to pConfig->GetValue, the function
internally prefixes Debug or Release according to the build, making it
//Debug/AppLogging/LogFileName or //Release/AppLogging/LogFileName.
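A minimal sketch of that prefixing step, assuming it is plain string concatenation; the function name qualifyConfigKey and the debugBuild flag are hypothetical (the real class presumably decides the build at compile time).

```cpp
#include <string>

// Turns "/AppLogging/LogFileName" into "//Debug/AppLogging/LogFileName"
// or "//Release/AppLogging/LogFileName".
std::string qualifyConfigKey(const std::string& key, bool debugBuild)
{
    return std::string("//") + (debugBuild ? "Debug" : "Release") + key;
}
```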
Log file for Multi threaded projects
One of the main debugging and troubleshooting tools for multi-threaded projects
is a log file of function entry/exit, diagnostic data, and errors. The log has
to be thread safe, because multiple threads can write to the log file
simultaneously. I have a couple of macros to make usage easy, and I will
illustrate them with examples. They create a text file at the path given in the
application configuration file BlackSabbathConfig.xml.
When the application starts, the logging object has to be created as below.
CLogFile::Create();
When the application exits, the logging object has to be deleted as follows.
CLogFile::Release();
The macros are declared in LoggingAPI.h and the usage is as follows.
FUNCENTRY
Usage: FUNCENTRY("CHTTPThread::InitInstance");
Use this at the start of the function.
FUNCEND
Usage: FUNCEND;
Use this at the end of the function.
FUNCRETURN
Usage: FUNCRETURN(FALSE);
Use this at the end of a function that returns a value.
INFOLOG
Usage: INFOLOG(("Thread ID: %d\n",pThread->m_nThreadID));
INFOLOG(("Thread Creation Succeeded\n"));
Note: The usage is like CString::Format(), with the arguments put in an extra
pair of parentheses. The message is prefixed with Data:
THREADDATALOG
Usage: Same as INFOLOG, just a different name for clarity.
LOGWARNING
Usage: Same as INFOLOG. However, the message is prefixed with Warning:
LOGERROR
Usage: Same as INFOLOG. However, the message is prefixed with Error:
LOGFATALERROR
Usage: Same as INFOLOG. However, the message is prefixed with Fatal Error:
Setting Error logging level
You can set the level of error logging in the configuration file. The key
appears in the configuration file as follows.
<CurrentLoggingLevel>1</CurrentLoggingLevel>
The values mean the following
- Function Level = 0
- Info Level = 1
- Warning Level = 2
- Error Level = 3
- Fatal Level = 4
CurrentLoggingLevel = 1 in the configuration means that all log messages at
level 1 and above are logged. That means Function Level logging is ignored.
Sample Log Entry
14:09:13.219 Thread(1540) Entering CMRUCombo::AddToMRUList
File(c:\myprojects\blacksabbath\mrucombo.cpp) Line(56)
14:09:13.219 Thread(1540) Exiting CMRUCombo::AddToMRUList
File(c:\myprojects\blacksabbath\mrucombo.cpp) Line(80)
The above means a normal function execution.
14:24:21.305 Thread(1516) Abnormally exiting CBlackSabbathApp:InitInstance
File(c:\myprojects\blacksabbath\blacksabbath.cpp)
This is not pleasant. It can happen for two reasons.
- You forgot to terminate the function with FUNCEND or FUNCRETURN.
- There was an unhandled exception and control left the function. This
demonstrates one more advantage of logging: detecting exceptions.
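One way such an "Abnormally exiting" entry can be produced is with a stack object whose destructor checks whether the function announced its normal end. The sketch below shows that technique in standard C++; it is an assumption about how macros like these could work, not the contents of LoggingAPI.h, and the names FunctionTrace and riskyWork are hypothetical. A std::vector of strings stands in for the log file.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class FunctionTrace
{
public:
    FunctionTrace(std::vector<std::string>& log, std::string name)
        : logM(log), nameM(std::move(name)), endedM(false)
    {
        logM.push_back("Entering " + nameM);
    }

    // Called where FUNCEND or FUNCRETURN would appear.
    void End()
    {
        endedM = true;
        logM.push_back("Exiting " + nameM);
    }

    // Runs on every exit path, including stack unwinding after a throw.
    ~FunctionTrace()
    {
        if (!endedM) {
            try {
                logM.push_back("Abnormally exiting " + nameM);
            } catch (...) {
                // Never let logging throw out of a destructor.
            }
        }
    }

private:
    std::vector<std::string>& logM;
    std::string nameM;
    bool endedM;
};

// Demo function: exits normally, or by exception when `fail` is true.
void riskyWork(std::vector<std::string>& log, bool fail)
{
    FunctionTrace trace(log, "riskyWork");
    if (fail) {
        throw std::runtime_error("simulated failure");
    }
    trace.End();
}
```

Both causes listed above end up on the same path: if End() is never reached, whether forgotten or skipped by an exception, the destructor writes the abnormal-exit line.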
Note: What you log is at your discretion, according to what you deem important
and what you don't. Functions to avoid logging are UpdateCommandUI,
PreTranslateMsg, OnPaint, OnDraw, OnTimer, and CArray::GetAt-derived handlers.
These functions are called frequently and will explode your log file and make
it less readable.
COM interfaces and Threading models
Initializing COM for threads
COM has to be initialized per thread; otherwise COM calls will return
RPC_E_THREAD_NOT_INIT. The threading model of a thread can be set only once: if
the thread is reinitialized with a different threading model, CoInitialize() or
CoInitializeEx() returns RPC_E_CHANGED_MODE. Another notable point is that
initialization works like a normal COM interface in that it is reference
counted. So you have to call CoUninitialize() as many times as you initialized.
If you don't, it won't be a major problem, as these are references to the
underlying COM library DLLs.
Initialization from a DLL
COM creation in a DLL has to assume one thing: the client of the DLL will
initialize COM, and the DLL has to run under that threading model. Otherwise
the initialization errors explained in the previous section will arise:
duplicate initialization, or reinitialization with a different threading model.
CoUninitialize() and application Lockup
The reason is that CoUninitialize() internally runs a modal message loop and
waits indefinitely until all the COM messages are cleared. This can create
problems when a DLL containing COM components is unloaded without checking
whether it is ready to be unloaded. It generally happens when a dynamically
loaded DLL hosts a COM interface. The DLL runs inside an application and, if
the model is apartment threading, uses the application's message queue for
synchronization. Please be advised not to access COM interfaces in dynamically
loaded DLLs. I don't have a better solution. But if you don't care anyway,
remove the CoUninitialize() call.
Marshalling across threads
COM interfaces have thread affinity. What does this mean? A COM interface
belongs to the thread that created it, and lives and dies with that thread.
What if other threads need to access this interface? The parent thread has to
marshal the interface to the other thread. What happens if we simply use the
interface through a thread-safe global variable? Everything appears fine, but
when we access a method or property of the interface, COM returns an
RPC_E_WRONG_THREAD error. The standard functions available for marshalling are
CoMarshalInterThreadInterfaceInStream() and CoGetInterfaceAndReleaseStream().
These are old functions; there is an easier way to do this:
IGlobalInterfaceTable, referred to as the GIT. This interface implements
marshalling between threads internally. Here is its usage in CBrokenLinksFile,
which is an XML file writer. Multiple threads use CBrokenLinksFile to write
broken links to disk in XML format.
Declaring and Creating GIT
The following code is run by thread manager thread.
#include "atlbase.h"
CComGITPtr<MSXML2::IXMLDOMDocument2> CBrokenLinksFile::xmlDocGITM;
MSXML2::IXMLDOMDocument2Ptr pXMLDoc = NULL;
HRESULT hrDoc = CreateInstance(pXMLDoc,CLSID_DOMDocument2);
if(SUCCEEDED(hrDoc))
{
xmlDocGITM = pXMLDoc.GetInterfacePtr();
}
Accessing interface from GIT
The following code is run by CHTTPThread
MSXML2::IXMLDOMDocument2* pXMLDoc = NULL;
HRESULT hrGITAccess = xmlDocGITM.CopyTo(&pXMLDoc);
if(SUCCEEDED(hrGITAccess))
{
}
Revoking interface from GIT
The following code is run by thread manager thread.
xmlDocGITM.Revoke();
Note: Only the thread that created and registered the interface can revoke it.
This implies that the creating thread has to stay alive until the interface is
revoked. Yes, that's true: when the thread that created the interface dies, the
interface also becomes invalid.
Which thread executes my COM interface and Dead Locks?
This is a very interesting aspect that is usually overlooked. In this question
often lies the answer to why your application sometimes deadlocks. The answer
is that it depends on your threading model. Let's discuss the threading models:
Apartment and Free threading.
How to determine the threading model of a component
You have to look under HKEY_CLASSES_ROOT\CLSID for either the class ID or the
name of the component's .dll or .exe. Here is an example of what the
WebBrowser2 component's registration says in Regedit; the binary for the
component is shdocvw.dll.
Looking under HKEY_CLASSES_ROOT\CLSID
{0A89A860-D7B1-11CE-8350-444553540000}
InprocServer32
Default - REG_EXPAND_SZ - %SystemRoot%\System32\shdocvw.dll
ThreadingModel - Apartment
COM Object creation failure and Incorrect threading model
Look carefully at the above registry entry. It means the WebBrowser2 control
can be created only from a thread initialized for apartment threading. I had to
find this out the hard way, and the procedure described above is how I found
its threading model. The problem I faced was that my InitDialog() function
failed in the CDialog::CreateDialogControls() function. This is a common scene
when a programmer forgets to call one of OleInitialize(), CoInitialize(), or
CoInitializeEx(COINIT_APARTMENTTHREADED). But my point is that
CoInitializeEx(COINIT_MULTITHREADED) will not fix this issue. It would have
worked if the component's registry entry said ThreadingModel - Free or Both.
Both means the object is safe for creation under all threading models. However,
you cannot change the registry value manually and get it working. That leads to
unpredictable results, although InitDialog() might temporarily succeed. I will
explain.
What happens if threading model is changed manually in registry?
When an interface is created under apartment threading, the COM call
CoInitialize() or any equivalent creates a message queue and uses it to
synchronize the COM calls to every COM object created in that thread. Calls are
executed in the order in which they are queued. Why is a message queue on the
component's parent thread used? It ensures that only one call executes at a
given time, so there is no need to make the COM object thread safe. The
apartment threading model is provided so that a COM component can be made
thread safe just by marking it ThreadingModel - Apartment in the registry.
Apartment threading fails to exploit the advantages of multi-threading, because
calls have to switch to the COM interface's parent thread, which is a potential
overhead for every thread using the interface. So complex COM objects usually
support free threading. This necessarily means that the underlying COM object
implementation is thread safe. The COM interface method call is executed on the
calling thread. In a multi-threaded project there will be contention for the
COM object's resources, but there is nothing to worry about, as the object is
thread safe. So if you change the threading model in the registry, making an
apartment-threaded component free threaded, and use it in a multi-threaded
project, what will happen? If there is thread contention for any shared
resource in the COM object, the results will be unpredictable, as the component
is not designed to be thread safe.
Dead Lock
In apartment threading, the thread that created the COM object executes the
calls to that interface. Say Thread1 created Component1, which is apartment
threaded. Thread1 calls a function that has to wait for Thread2 to finish. To
finish, Thread2 needs to call a method or property of Component1. This results
in a deadlock, as that call can execute only on Thread1. Both threads are
waiting on each other. What are the possible solutions?
- Initialize Thread1 with CoInitializeEx(COINIT_MULTITHREADED). This can be
done only if the component allows this threading model. Otherwise use the
second solution.
- Create the interface on another thread that has no potential deadlock
condition.
To see a situation like this, change the threading model of the main thread in
file BlackSabbathDlg.cpp to apartment threading. Click the Crawl button. Once
it starts writing links, click the Stop Crawling button. The application will
deadlock. This is because the main thread tries to kill all threads by calling
clearBusy(). The XMLDoc object is created by the main thread. The main thread
waits on the handle of CHTTPThread in CHTTPThread::KillThread() as part of the
clearBusy() call. Meanwhile CHTTPThread tries to access the XMLDoc object to
write a broken link. That call has to run on the main thread, so it is a
deadlock: the two threads wait for each other.
How to Detect Dead Locks and Application Locks/Infinite loops
Microsoft distributes WinDbg for free; it is a debugging tool with extensive
capabilities. A tutorial on this tool is available as Microsoft Knowledge Base
Article 311503. WinDbg can be downloaded from
http://www.microsoft.com/whdc/ddk/debugging/.
For symbols, you can either use the Microsoft symbol server (the article above
explains how to add the symbol path) or download the symbol files and install
them on your hard drive. Microsoft Knowledge Base Article 148659 explains how
to get the NT symbol files onto your hard disk. Start WinDbg, open the File
menu, and choose Symbol File Path .... Set the symbol file path, including the
directory of your application's PDB or DBG file. The System32 directory usually
contains symbols for many DLLs such as mfc42d.dll and msvcrtd.dll. Select the
Reload check box to reload from the new symbol files. Q121366, INFO: PDB and
DBG Files - What They Are and How They Work, describes symbol files. You can
also find interesting articles in the Microsoft Debugging Tools Knowledge Base
Articles.
To enable generation of debugging information/symbols (.pdb file) in a
release/debugging build in the Visual C++ 6.0 development environment, follow
these steps:
- On the Project menu, click Settings.
- Click the Win32 Release configuration.
- On the C/C++ tab, click General, and then set the following: set
Optimizations to Maximize Speed or to Minimize Size, and set Debug Info to
Program Database.
- On the Link tab, click General, and then set the following: make sure that
Generate debug info is selected, and make sure that the Link incrementally
check box is cleared.
- Edit the Project options directly, and add /opt:ref,icf.
To use Microsoft Format debugging information, select the Microsoft Format or
Both option button under Link/Debug Debug Info. If the Generate Debug Info
check box is not selected, this choice is unavailable. On the command line, if
/DEBUG is specified, the default type is /DEBUGTYPE:CV; if /DEBUG is not
specified, /DEBUGTYPE is ignored.
Now, when your application hangs, run an application called UserDump.exe. It
creates a dump file that captures the state of the application. UserDump.exe is
a standalone executable that works on a standard Windows installation. See the
article How to Use Userdump.exe to Capture the State of the Information Store
for how to obtain a dump file. Let's assume your dump file is named
C:\BlackSabbath.dmp.
Now start WinDbg. From the File menu select "Open Crash Dump", navigate to the
above file, and open it. From the View menu, select Processes and Threads and
Call Stack. This tells you where your program is, and if you double-click a
call stack line the corresponding source file opens. This works only if you
have set up the symbols correctly. For deadlock detection, select the menu item
View Command and the command window comes up. On the command line, type
!deadlock and it reports deadlocks, if any. If it reports none, checking the
call stack of each thread is your best bet; the application might be in an
infinite loop.
Some useful commands to type on the command line are:
- .reload - reloads the symbols
- !locks - reports all the locked synchronization objects
- !deadlock - detects deadlocks
- .time - time details for the process
- .sympath - sets the symbol file path
- !process - process information
- .thread <thread ID> - thread information
Go through the help to find more.