Introduction
I have been using threads in .NET for a long time, and the topic genuinely interests me. As I got deeper into multithreading I started to use threading constructs other than the Monitor (lock) and Interlocked classes. Interlocked and volatile are user-mode constructs, meaning that they do not transition into the kernel. Kernel-level objects such as Mutex and Semaphore are created and managed by the operating system and, when named, can be shared system wide. If we use a Semaphore in our application we must leave user mode and transition into kernel mode just to create it. We pay a small performance cost for that transition because the kernel does not trust user code. For a single call the cost is negligible, but it adds up when semaphores are created or acquired very frequently.
Background
I like the functionality that the semaphore provides; however, in some cases I would like to use it within a single process without paying for the transition into kernel mode. I decided to write a very simple non-kernel semaphore.
What it has:
- Fast creation time, since it is just a regular .NET non-kernel object.
- It only transitions to kernel mode when the maximum capacity has been reached and more threads want to use it.
What it lacks:
- It is not system wide, which makes sense because it 'mostly' stays in user mode.
- It uses Monitor, which can transition into kernel mode; however, this only occurs when all the slots have been filled.
The code
using System.Threading;

public class NonKernelSemaphore
{
    private readonly object padlock = new object();
    private readonly int maxSlots;
    private int usedSlots;

    public NonKernelSemaphore(int maxSlots)
    {
        this.maxSlots = maxSlots;
    }

    public void Enter()
    {
        lock (padlock)
        {
            // Re-check after every wake-up: another thread may have taken the slot.
            while (usedSlots == maxSlots)
            {
                Monitor.Wait(padlock);
            }
            usedSlots++;
        }
    }

    public void Release()
    {
        lock (padlock)
        {
            if (usedSlots > 0)
            {
                usedSlots--;
                // Wake one waiting thread, if any.
                Monitor.Pulse(padlock);
            }
        }
    }
}
I wanted to keep this class as simple as possible without adding fluff so that I could cater for the basic usage scenario.
NonKernelSemaphore takes the maximum number of threads allowed at one time as the constructor parameter maxSlots. It also keeps a count of how many threads are currently using the semaphore in the usedSlots field.
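As a rough illustration of how the class might be used, here is a minimal sketch. The UsageSketch class, its Run method and the 10 ms sleep are hypothetical names and values made up for this example, and the compact copy of the semaphore is only included so that the snippet compiles on its own. Wrapping the work in try/finally is my suggestion rather than something the class enforces.

```csharp
using System;
using System.Threading;

// Compact copy of the class described above so that this sketch compiles on its own.
public class NonKernelSemaphore
{
    private readonly object padlock = new object();
    private readonly int maxSlots;
    private int usedSlots;

    public NonKernelSemaphore(int maxSlots) { this.maxSlots = maxSlots; }

    public void Enter()
    {
        lock (padlock)
        {
            while (usedSlots == maxSlots) Monitor.Wait(padlock);
            usedSlots++;
        }
    }

    public void Release()
    {
        lock (padlock)
        {
            if (usedSlots > 0) { usedSlots--; Monitor.Pulse(padlock); }
        }
    }
}

// Hypothetical demo: several workers share a semaphore with a few slots.
public static class UsageSketch
{
    // Returns the highest number of workers seen inside the guarded region at once.
    public static int Run(int workers, int slots)
    {
        var semaphore = new NonKernelSemaphore(slots);
        var gate = new object();          // protects inside and peak
        int inside = 0, peak = 0;
        var threads = new Thread[workers];

        for (int i = 0; i < workers; i++)
        {
            threads[i] = new Thread(() =>
            {
                semaphore.Enter();
                try
                {
                    lock (gate) { inside++; if (inside > peak) peak = inside; }
                    Thread.Sleep(10);     // simulate some work
                    lock (gate) { inside--; }
                }
                finally
                {
                    semaphore.Release();  // always give the slot back
                }
            });
            threads[i].Start();
        }

        foreach (var t in threads) t.Join();
        return peak;
    }
}
```

Calling UsageSketch.Run(10, 3) from a Main method should return a value between 1 and 3, since no more than three workers can hold a slot at the same time.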
Enter Method Explanation
When a consumer calls the Enter() method, a lock on padlock is first acquired. We then check whether there are any slots remaining in the semaphore. If usedSlots is equal to maxSlots then all the slots are occupied. In this case we wait until another thread frees a slot by calling the Release() method. Calling Monitor.Wait() releases the lock and blocks the thread, which is where we may transition into kernel mode and let the thread scheduler take care of things. Our scope at this point is still only process wide.
Release Method Explanation
Again in this method we must first lock on the shared padlock object. We then decrease the usedSlots count, which frees a slot in the semaphore, and wake one of the threads waiting for a slot by calling Monitor.Pulse(). When a thread wakes up it must check again whether all the slots are used, because another thread may have taken the free slot before it reacquired the lock; hence the while condition (usedSlots == maxSlots).
Optimization
One optimization I considered is calling Monitor.Pulse() only if there are threads waiting for an available slot. This avoids blindly calling the method on every release. To do this we keep track of how many threads are waiting by introducing a new variable, waitingCount, of type int, and incrementing it whenever a thread starts waiting. The Release() method decreases usedSlots as usual and then checks whether any threads are waiting. If there are, it calls Monitor.Pulse(); otherwise it does nothing more.
Here is the code:
using System.Threading;

public class NonKernelSemaphore
{
    private readonly object padlock = new object();
    private readonly int maxSlots;
    private int usedSlots;
    private int waitingCount;

    public NonKernelSemaphore(int maxSlots)
    {
        this.maxSlots = maxSlots;
    }

    public void Enter()
    {
        lock (padlock)
        {
            if (usedSlots == maxSlots)
            {
                // Register as a waiter so that Release() knows a pulse is needed.
                waitingCount++;
                do
                {
                    Monitor.Wait(padlock);
                } while (usedSlots == maxSlots);
                waitingCount--;
            }
            usedSlots++;
        }
    }

    public void Release()
    {
        lock (padlock)
        {
            if (usedSlots > 0)
            {
                usedSlots--;
                // Only pulse when at least one thread is actually waiting.
                if (waitingCount > 0)
                {
                    Monitor.Pulse(padlock);
                }
            }
        }
    }
}
Updates
Originally I used Monitor.PulseAll(); however, a Code Project reader ran some tests and found that Pulse() was a bit faster.
Tests
Machine: Windows 7 Professional (64-bit), Intel Xeon(R) 2.4 GHz, 12 GB RAM, .NET 4.0, Release mode
Test 1
Creation Time (Milliseconds) after 1 million iterations:
NonKernelSemaphore: 76
Semaphore: 2263
SemaphoreSlim: 623
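The exact benchmark code for this test is not shown, so the following is only a sketch of how such a creation-time measurement could be written. CreationTimer, Measure and Report are hypothetical names, and the slot count of 8 is an assumption. Disposable objects are disposed outside the timed region, at the price of starting and stopping the stopwatch on every iteration, so the numbers it produces are not directly comparable with the ones above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class CreationTimer
{
    // Times 'iterations' calls to 'create' and returns the elapsed milliseconds.
    // Disposable objects are disposed outside the timed region so that handle
    // cleanup is not counted as creation time.
    public static double Measure(Func<object> create, int iterations)
    {
        var stopwatch = new Stopwatch();
        for (int i = 0; i < iterations; i++)
        {
            stopwatch.Start();              // resumes accumulating elapsed time
            object instance = create();
            stopwatch.Stop();

            var disposable = instance as IDisposable;
            if (disposable != null) disposable.Dispose();
        }
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    // Hypothetical driver: prints one line per type being compared.
    public static void Report(int iterations)
    {
        Console.WriteLine("Semaphore: " + Measure(() => new Semaphore(8, 8), iterations));
        Console.WriteLine("SemaphoreSlim: " + Measure(() => new SemaphoreSlim(8), iterations));
        // NonKernelSemaphore can be measured the same way:
        // Measure(() => new NonKernelSemaphore(8), iterations)
    }
}
```

Calling CreationTimer.Report(1000000) from a Main method would print one timing per type; the absolute values depend heavily on the machine and runtime.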
Test 2
Enter/Wait/Release Time
Max concurrent slots: 8
Number of threads: 64
My test procedure included something like:
public static void DoNonKernel()
{
    nks.Enter();
    nksc++;
    nks.Release();
}

public static void DoSemaphore()
{
    s.WaitOne();
    sc++;
    s.Release(1);
}

public static void DoSemaphoreSlim()
{
    sl.Wait();
    slc++;
    sl.Release();
}
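The harness that drives these procedures is not shown, so here is a minimal sketch of what one could look like. ContentionTimer, SemaphoreSlimTest and the iterationsPerThread parameter are hypothetical; the number of iterations each thread performed in the original test is not stated. Because up to eight threads can hold a slot at once, a plain ++ on a shared counter can lose updates, so this sketch uses Interlocked.Increment, which is a change from the procedures above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ContentionTimer
{
    // Starts 'threadCount' threads, each calling 'body' 'iterationsPerThread'
    // times, and returns the elapsed milliseconds once all have finished.
    public static double Measure(Action body, int threadCount, int iterationsPerThread)
    {
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < iterationsPerThread; n++) body();
            });
        }

        var stopwatch = Stopwatch.StartNew();
        foreach (var t in threads) t.Start();
        foreach (var t in threads) t.Join();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}

// Hypothetical wrapper around the SemaphoreSlim procedure, with an atomic counter.
public static class SemaphoreSlimTest
{
    private static readonly SemaphoreSlim sl = new SemaphoreSlim(8); // 8 concurrent slots
    private static int slc;

    public static int Count { get { return slc; } }

    public static void DoSemaphoreSlim()
    {
        sl.Wait();
        try { Interlocked.Increment(ref slc); }  // atomic: several threads can be here
        finally { sl.Release(); }
    }
}
```

A call such as ContentionTimer.Measure(SemaphoreSlimTest.DoSemaphoreSlim, 64, 10000) would match the 64 threads and 8 slots listed above. The same Measure method can be pointed at the other two procedures.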
I ran this a few times. Here are the results in milliseconds:
Run   SemaphoreSlim   Semaphore   NonKernelSemaphore
1     453.0453        241.0241    249.0249
2     288.0288        249.0249    249.0249
3     278.0278        264.0264    285.0285
4     235.0235        273.0273    246.0246
5     226.0226        237.0237    217.0217
6     228.0228        224.0224    381.0381
Points Of Interest
Feedback is welcome!