
Memory Model, Memory Barrier and Singleton Pattern in .NET

Explains the .NET memory model, memory barriers, and ways to implement the singleton pattern

Introduction

This article explains some of the concepts that every developer working on multi-processor architectures should understand. These concepts are especially important for those working on IA64 (a weak memory model) or writing multi-core code with libraries such as the Task Parallel Library.

Concepts

Re-ordering

The compiler or the processor can optimize code segments and re-order them differently than the programmer intended. For example, given the following sequence of statements:

C#
x=a; 
y=1; 

the above could instead be executed in the following order:

C#
y=1; 
x=a; 

This does not create a problem on a single core, but on a multi-core/multi-processor machine it can be an issue, especially if another thread requires the code to execute in a particular sequence. Unless we use some sort of memory barrier, developers should not assume that the code will execute in the same order in which it was written.
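
As an illustration (a hedged sketch; these names are not from the article's examples), suppose one thread writes some data and then sets a flag, while another thread checks the flag before reading the data:

C#
using System;

static class ReorderDemo
{
    static int data = 0;
    static bool flag = false;

    static void Writer()
    {
        data = 99;    // (1)
        flag = true;  // (2) may be re-ordered before (1) on a weak memory model
    }

    static void Reader()
    {
        if (flag)                     // can observe (2) without (1)
            Console.WriteLine(data);  // may print 0 instead of 99
    }
}

If the two stores in Writer are re-ordered, Reader can see flag set to true while data still holds its old value.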

Memory Barrier or Fence

A memory barrier basically means a guaranteed flush, and it prevents re-ordering. For example, if we introduce a barrier between the above two statements...

C#
x = a;
Thread.MemoryBarrier(); // the barrier API in .NET
y = 1;

...then no statement above or below the barrier can cross it. In other words, it is guaranteed that any statement before the barrier will remain before the barrier, and similarly any statement after the barrier will remain after it; hence, no re-ordering.

The Thread.MemoryBarrier() API is a full fence by default, i.e., it has both release and acquire semantics. If you want only release or only acquire semantics, you can use the Thread.VolatileWrite or Thread.VolatileRead APIs in .NET.

A full-fence memory barrier ensures that the values are flushed to all CPU caches.
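
A minimal sketch of the half-fence APIs mentioned above (the variables and methods here are illustrative, not from the article):

C#
using System;
using System.Threading;

static class HalfFenceDemo
{
    static int data;
    static int ready;

    static void Producer()
    {
        data = 42;                          // plain store
        Thread.VolatileWrite(ref ready, 1); // release: "data" is published before "ready"
    }

    static void Consumer()
    {
        if (Thread.VolatileRead(ref ready) == 1) // acquire: later loads cannot move above this read
            Console.WriteLine(data);             // observes 42
    }
}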

Cache Coherency

x86 and x64 have a strong(er) memory model: any update/write to a cache line invalidates duplicate cached copies on the other available cores/processors. Therefore, the volatile keyword is redundant for visibility on x86 and x64 machines (unless you want to prevent re-ordering; see the volatile explanation below).

On IA64, cache coherency on write operations is not automatic, and explicit memory barriers are therefore needed for writes to be flushed to the other cores' caches (this is where volatile comes in handy).

Note: The .NET 2.0 memory model gives write operations release semantics on IA64, but this is not required for CLI compliance, as it is not mentioned in the ECMA specs, so I am not sure whether the same is implemented in Mono etc. (maybe someone can confirm this?). If you want to port code across different memory model implementations, do use memory barriers to be on the safe side.

Volatile and Locks

The volatile keyword's definition on MSDN is not exactly right. volatile (and locks, for that matter) are implemented using memory barriers; therefore, not only do they use release and acquire semantics (see http://msdn.microsoft.com/en-us/library/aa490209.aspx), they also prevent re-ordering. Volatile-declared fields are optimized to use read barriers for read operations and write barriers for write operations.
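
A minimal sketch of a volatile field in action (the class and names are illustrative):

C#
class Worker
{
    private volatile bool stopRequested;

    public void Stop()
    {
        stopRequested = true;   // volatile write: release semantics
    }

    public void Run()
    {
        // Volatile read (acquire semantics) on every iteration: the JIT cannot
        // cache the field in a register, so the Stop() call is eventually seen.
        while (!stopRequested) { /* do work */ }
    }
}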

Store

The following is an example of a store, where the value 5 is assigned to the variable (or placeholder) "x":

C#
x = 5; 

Load and Store

C#
lock(syncroot){x = new someObject();}

Here someObject is created and assigned to x. This can create a problem (though with a very small chance) when x is accessed concurrently: x might be observed uninitialized because of a race condition where the write operation is delayed. (See here for more explanation.) To fix this, use a memory barrier right after the assignment statement on a weak memory model (see the store re-ordering section for more explanation).

C#
lock(syncroot){x = new someObject(); Thread.MemoryBarrier();}

Store Reordering

This can be explained with a simple singleton implementation:

C#
public sealed class Singleton {
    private static readonly object slock = new object();
    private static Singleton instance;
    private static bool initialized;

    private Singleton() {}

    public static Singleton Instance {
        get {
            if (!initialized) {
                lock (slock) {
                    if (!initialized) {
                        instance = new Singleton();
                        initialized = true;
                    }
                }
            }
            return instance;
        }
    }
}

On x86 and x64, writes are not re-ordered, i.e., "initialized = true" cannot come before the "instance = new Singleton()" statement. If it could, there would be a race condition: another thread could see "initialized" set to true and access the uninitialized "instance" object.

On the IA64 memory model implementation, writes can be re-ordered, so it is possible for "initialized = true" to be executed before the "instance" initialization statement. To fix this, all we need is a memory barrier to prevent the re-ordering:

C#
instance = new Singleton();
Thread.MemoryBarrier();
initialized = true;

In the load and store example above, it is possible that another CPU/thread sees an uninitialized x (this can happen because of a broken memory model implementation, as explained here and here).

Therefore, if store ordering is important, or if you are unsure about the underlying memory model, do use a memory barrier (or better, a volatile write) after the assignment operation to sync the value with the other CPUs.

Load Re-ordering

As explained by Joe Duffy, under rare circumstances (at least theoretically), pending writes may not be seen by other processors immediately; there can be a delay before a write is made available to them (perhaps under heavy stress?). So loads might effectively be re-ordered because of this side effect (cache coherency is not 100% instantaneous and can involve a race condition). If you really want to be sure, use memory barriers or VolatileWrite/VolatileRead to flush the data to the other caches before you proceed.

Singleton Pattern

The famous double-checked locking technique:

C#
if (instance == null)
{
    lock (syncRoot) // lock on a private sync object, not on the "object" keyword
    {
        if (instance == null)
        {
            instance = new Singleton();
        }
    }
}
return instance;

As explained, the above code may not work as expected on IA64 machines because there are no implicit release semantics on store/write operations on IA64. Another CPU can see "instance" as non-null while it still contains a junk value, because the stores are not yet flushed to that CPU's cache. (In the .NET 2.0 memory model this is not an issue, because write operations have release semantics; but that is not required for CLI compliance, so if you are using Mono/Rotor etc., take care and do use explicit release semantics.)

We can fix the above by declaring "instance" as volatile, but then there may be a performance penalty: every time we access the "instance" variable there is a volatile read, which is redundant and expensive. All we need volatile semantics for is the store operation, so that it flushes the value to all the caches and prevents re-ordering. The volatile variant is sketched below.
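
For completeness, here is what the volatile variant looks like (a sketch reusing the names from the examples above):

C#
public sealed class Singleton
{
    private static volatile Singleton instance; // volatile: every write releases, every read acquires
    private static readonly object syncRoot = new object();

    private Singleton() {}

    public static Singleton Instance
    {
        get
        {
            if (instance == null)
            {
                lock (syncRoot)
                {
                    if (instance == null)
                        instance = new Singleton();
                }
            }
            return instance;
        }
    }
}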

A way to avoid the volatile declaration entirely is to add a memory barrier right after the store:

C#
if (instance == null)
{
    lock (syncRoot)
    {
        if (instance == null)
        {
            instance = new Singleton();
            Thread.MemoryBarrier(); // flush the store before other threads can read "instance"
        }
    }
}
return instance;

Again, a full-fence memory barrier has both read and write semantics. This can be optimized further by using lazy instantiation via the Interlocked API (see Joe's blog for more detail). The new LazyInit class in .NET 4 uses a similar implementation; the sketch below wraps its Value getter in a minimal class for context:

C#
public class LazyInit<T> where T : class
{
    private readonly Func<T> m_init; // factory delegate
    private T m_value;

    public LazyInit(Func<T> init) { m_init = init; }

    public T Value {
        get {
            if (m_value == null) {
                T newValue = m_init();
                // Publish atomically; if another thread won the race,
                // dispose the instance we speculatively created.
                if (Interlocked.CompareExchange(ref m_value, newValue, null) != null &&
                        newValue is IDisposable) {
                    ((IDisposable)newValue).Dispose();
                }
            }
            return m_value;
        }
    }
}
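
Hypothetical usage of the sketch above (ExpensiveResource is an assumed type, not from the article):

C#
var lazy = new LazyInit<ExpensiveResource>(() => new ExpensiveResource());
ExpensiveResource resource = lazy.Value; // first access runs the factory; racing threads dispose their losing copies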

The Interlocked APIs are atomic and implemented using memory barriers, which gives the best performance (locks are slower than Interlocked or memory barrier calls). Interlocked.CompareExchange is a lot faster because it compiles down to a single atomic CPU instruction, while locks are expensive (a contended lock may require a kernel-mode transition). Spinlocks, too, are implemented using the Interlocked API.
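
As a sketch of that last point, here is a minimal spinlock built on the Interlocked API (illustrative only; this is not the BCL's SpinLock implementation):

C#
using System.Threading;

public struct SimpleSpinLock
{
    private int taken; // 0 = free, 1 = held

    public void Enter()
    {
        // Atomically swap in 1; keep spinning while the previous value was already 1.
        while (Interlocked.Exchange(ref taken, 1) == 1)
            Thread.SpinWait(1);
    }

    public void Exit()
    {
        // The Interlocked call is a full fence, publishing the release to other CPUs.
        Interlocked.Exchange(ref taken, 0);
    }
}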

If you are still confused by the memory model, read this.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)