Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

A Fast Mostly User Mode Inter-Process Mutex

0.00/5 (No votes)
26 Feb 2003 1  
Win32 Mutexes are slow and cumbersome, this is a fast and nifty alternative!

Results from test program using our custom mutex class

Introduction

I recently was working on a project where several resources needed to be shared among many processes. Easy! I just made use of a Win32 Mutex. This seemed fine at first, but after running test scenarios where the various processes could be locking and releasing the mutex hundreds of thousands of times simultaneously, performance became an issue. I attributed the poor performance to the Win32 mutex (for reasons I get into soon) and thus decided to write my own mutex class. This article describes the mutex class I made (actually, it's a DLL, which is a requirement, but I'll get into that later), how it works, and how it stacks up against the Win32 mutex.

The Problem with Win32 Mutexes

Win32 Mutexes are notoriously slow. Every time a mutex is created the OS must switch from "user mode" to "kernel mode", which in itself can be responsible for hundreds of CPU instructions. If you're locking a shared resource every so often, it's really not a big deal. In that case (especially if there is a network connection involved), the rest of your code is probably using much more time. However if you are constantly hitting the resource as your main information source (such as my proprietary database file example above), all this extra work Win32 mutexes have to do can be a real time hog.

The Solution

I solved the problem by writing my own mutex. It uses a combination of inter-process shared memory, Win32 critical sections (which only switch to "kernel mode" where there is a contention), and the Windows InterlockedExchange function. The kind of inter-process shared memory I'm using is the kind where you create a data segment in a binary (executable or DLL), then flag it as shared memory. The only memory I share in this data segment are two small variables (a LONG and DWORD). I rejected using memory mapped files and other forms of inter-process communication because I didn't want the performance hit I would incur by using those methods. Choosing the shared memory data segment option comes with a price: a DLL must be created to host our mutex (or you can incorporate it into an existing DLL if you don't mind such a thing). Once you've created a DLL project, statements such as those listed below need to be added to one of the files (preferably a .cpp) in the project:

#pragma data_seg(".USRMUTEX_SHARED")
volatile DWORD g_dwDSMutexOwner = 0;
volatile LONG  g_lLocker = 0;
#pragma data_seg()

#pragma comment(linker, "/section:.USRMUTEX_SHARED,rws")

The mutex object will be implemented as a C++ class I'll refer to as "the mutex class". The class and it's methods will be exports of the DLL. The mutex class essentially has only a constructor and destructor for its methods along with three member variables, all three of which are static. The first static member variable is an unsigned long whose purpose is to keep track of the number of recurring times that the same thread of the same process gained exclusive access to the mutex (another way of saying "exclusive access" would be "a lock"; what I meant by "recurring" is "locked a second or third time or so on before releasing all of it's locks"). The second static member variable holds the ID of the process that created the instance of our mutex class we're manipulating. The third static member variable is a Win32 critical section. We initialize our first two static member variables to zero and our third using InitializeCriticalSection.

Now that we've taken care of our static member variables, lets get our class prototype mocked up. Remember that we've only got a default constructor and destructor. Before you get started with the implementation, it might be a good idea if you make the default copy constructor and default copy operator of the class unreachable (put the method prototypes for those methods in a "private" section of the class but don't actually implement them). Now that we've got the class prototype, lets implement the methods. The first bit of work we want to do in the constructor is ask if we've ever set the process ID of the current process into the second static member variable we mentioned above. If that second static member variable is still set to 0, then we haven't, so get the current process ID using GetCurrentProcessId and set it into this variable. Now lock our critical section static member variable using EnterCriticalSection. By doing this we're doing two things:

  • We're ensuring that the current thread in the current process will be the only thread available to gain exclusive access to the mutex.
  • By entering the critical section we've already got half of the way to gaining exclusive access to the mutex.
This is considered "half-way" because in order to gain access to some synchronization object on an inter-process level, you must know two things:
  • which process owns it and
  • which thread in that process owns it.
By entering the critical section, we have already been able to answer the second part of that question. Now we've got to answer that first part. The first thing we have to do to figure that out is ensure that we have exclusive access to the shared memory variables. One of the shared memory variables exists for this very purpose (g_lLocker). Using that variable, we create a loop that is only broken when an the result of an InterlockedExchange call given that variable (&g_lLocker) and 1 is 0. Here is an example of how the code might look:
   while (::InterlockedExchange(&g_lLocker, 1) != 0)
   {
      Sleep(0);
   }

This says: "If I set that variable to 1 but the value of that variable was 0 before I set it, then the shared memory is safe for me to use. By setting it to 1, I'm letting others know they need to "spin" (loop until the condition is true) until I'm done with it".

The trick in this is we must make sure we set that shared memory variable back to 0 when we're finished so spinning threads can gain excusive use. So, when finished we set it back to 0 using the InterlockedExchange function as demonstrated below:

   ::InterlockedExchange(&g_lLocker, 0);

Once we have exclusive access to the shared memory, we compare the other shared memory variable (g_dwDSMutexOwner) against both 0 and that static memory variable that holds our process ID. If that shared DWORD equals either of those values then we copy the current process ID from that static member variable into our shared DWORD. Regardless if the comparison passes or fails, we need to set the g_lLocker variable back to 0 (using InterlockedExchange as described above) to give other anxious threads a chance to compare. Once the compare passes and the resulting copy occurs, the mutex will then be considered "locked" by the calling thread of that calling thread's process. At that time all we've got left to do is increment the static member variable that keeps track of the number of times the same thread in the same process has locked the mutex. If that initial comparison had failed above instead of passing (that shared DWORD was not 0 or the current process id), then we call Sleep(1) here (even though using 0 gives up the rest of the current threads time slice, I've learned that using something other that 0 plays nicer). Next we test the comparison again by looping back to the point where we get exclusive access to the shared memory using the g_lLocker variable, but not as far back as where we entered the critical section (at this time we still haven't left the critical section). Keep this loop going forever until we get a passing comparison. The very fact that the comparison fails means that some other thread either in the same or in another process already owns the mutex.

The destructor is actually much more simple. First we decrement that static member variable that keeps track of the number of times the same thread in the same process has locked the mutex. Next we check to see if that variable is zero; if so we need to get exclusive access to the shared memory variables (using g_lLocker) employing the same method described above in the constructor. Once we have exclusive access, set our shared DWORD (g_dwDSMutexOwner) to 0. This completely unlocks the current process's lock on the mutex and allows other threads to then gain access. Now release our exclusive access to the shared memory (setting g_lLocker to 0 using InterlockedExchange as described above). Lastly, regardless of whether or not our first static member variable is now zero, leave the critical section we entered in the constructor (and stayed entered). That last step allows other threads in the current process to attempt to gain exclusive access to the mutex.

The CUsrMutex Class

Here is the class definition for the class we've just created (I've called it CUsrMutex). The full class implementation is available in the zip file attached to this article.

//This is a class we made for CUsrMutex to consume, it makes the win32

// critical sections easier to use

class CCritSec
{
protected:
   CCritSec(const CCritSec& src);
   const CCritSec& operator=(const CCritSec& src);

public:
   CCritSec();
   virtual ~CCritSec();
   operator LPCRITICAL_SECTION();
   bool Lock();
   bool Unlock();
   
protected:
   mutable CRITICAL_SECTION m_sect;
};

//This class makes for easy locking of our g_lLocker shared memory variable

class CUsrMutexLockerWkr
{
private:
   CUsrMutexLockerWkr(const CUsrMutexLockerWkr& src);
   const CUsrMutexLockerWkr& operator=(const CUsrMutexLockerWkr& src);

public:
   CUsrMutexLockerWkr();
   ~CUsrMutexLockerWkr();
};

class CUsrMutex
{
protected:
   CUsrMutex(const CUsrMutex& src);
   const CUsrMutex& operator=(const CUsrMutex& src);

public:
   CUsrMutex();
   ~CUsrMutex();

protected:
   static unsigned long m_ulLockCnt;
   static DWORD         m_dwProcessId;
   static CCritSec      m_dscs;
};

Here is the implementation of these three classes:

#pragma data_seg(".USRMUTEX_SHARED")
volatile DWORD g_dwDSMutexOwner = 0;
volatile LONG  g_lLocker = 0;
#pragma data_seg()

#pragma comment(linker, "/section:.USRMUTEX_SHARED,rws")

unsigned long CUsrMutex::m_ulLockCnt(0);
DWORD         CUsrMutex::m_dwProcessId(0);
CCritSec      CUsrMutex::m_dscs;

CCritSec::CCritSec() :
   m_sect()
{
   ::InitializeCriticalSection(&m_sect);
}

CCritSec::~CCritSec()
{
   ::DeleteCriticalSection(&m_sect);
}

CCritSec::operator LPCRITICAL_SECTION()
{
   return (LPCRITICAL_SECTION)&m_sect;
}

bool CCritSec::Lock()
{
   ::EnterCriticalSection(&m_sect);
   return TRUE;
}

bool CCritSec::Unlock()
{
   ::LeaveCriticalSection(&m_sect);
   return TRUE;
}

CUsrMutexLockerWkr::CUsrMutexLockerWkr()
{
   while (::InterlockedExchange(&g_lLocker, 1) != 0)
   {
      Sleep(0);
   }
}

CUsrMutexLockerWkr::~CUsrMutexLockerWkr()
{
   ::InterlockedExchange(&g_lLocker, 0);
}

CUsrMutex::CUsrMutex()
{
   if (m_dwProcessId == 0)
   {
      assert(sizeof(LONG) == sizeof(DWORD));
      m_dwProcessId = ::GetCurrentProcessId();
      assert(m_dwProcessId != 0);
   }

   m_dscs.Lock();

   for (;;)
   {
      {
         CUsrMutexLockerWkr locker;
         if ((g_dwDSMutexOwner == 0) ||
             (g_dwDSMutexOwner == m_dwProcessId))
         {
            g_dwDSMutexOwner = m_dwProcessId;
            break;
         }
      }
      Sleep(1);
   }

   ++m_ulLockCnt;
}

CUsrMutex::~CUsrMutex()
{
   if (--m_ulLockCnt == 0)
   {
      CUsrMutexLockerWkr locker;
      assert(g_dwDSMutexOwner == m_dwProcessId);
      g_dwDSMutexOwner = 0;
   }

   m_dscs.Unlock();
}

Proving Performance

Once I got the class I described above implemented, I wanted to test it to see if it was really faster than win32 mutex. I ran lots of tests, and my class always had the edge. The test where it really shined is where 6 separate processes were competing for the mutex. These 6 processes first attempted to lock and unlock a Win32 mutex 100,000 times, then attempted to lock and unlock my mutex 100,000 times. The total times were similar for each process, each something like the following numbers:

Win32 mutex: Took 6800 milliseconds to lock and unlock the mutex
My mutex: Took 240 milliseconds to lock and unlock the mutex

Along with this article a zip archive is included that includes my implementation of the mutex, a sample application, and several performance and comparison test applications.

Limitations

This design has two major limitations: 1) It's required to live in a DLL (I suppose you could put it directly into an executable if that met your specific needs), and 2) You only get one mutex you can share. The second limitation is only a limitation of my specific implementation. You can make changes in your implementation so you would have access to more than one of these mutexes.

Conclusion

This solution met my needs and vastly improved the performance of our product. Although this implementation of a custom mutex class might not meet your needs, I imagine (and hope) that this has been a good resource to help you get started in building your own custom synchronization objects. Good luck!

History

11/23/2002  -  Initial revision.
02/18/2003  -  Fixed possibility of race condition by adding g_lLocker.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here