|
The article mentions the research in a 1996 paper by Maged M. Michael and Michael L. Scott on non-blocking and blocking concurrent queue algorithms, and states that ...
In fact, their queue implementation is now part of the Java concurrency library.
Where are you getting this fact from? It frankly does not appear defensible. The principal author/architect of the Java concurrency library is Doug Lea at Oswego (see http://g.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html[^]), not Maged Michael or Michael Scott.
It's true that Michael Scott won the 2006 "Edsger W. Dijkstra Prize in Distributed Computing" for work he did in a 1991 paper describing a mutual exclusion lock based on a queue. The 1991 paper is different from the 1996 paper that you cited. It's also true that his mutual exclusion lock forms the basis for the monitor locks used in Java VMs. See http://www.podc.org/dijkstra/2006.html[^].
But a mutual exclusion lock is very clearly a locking technique, not a lock-free technique. And the paper you cited is not the one that forms the basis for monitor locks in Java.
So, I don't understand the statement in the article, to the effect that the 1996 paper is somehow the basis for the entirety of the Java concurrency library.
Please explain further.
PS: Professor Scott is a remarkable person with many achievements, which you can see from his University of Rochester website: http://www.cs.rochester.edu/~scott/[^]. With my question above, I intend no disrespect for him or any of his accomplishments. I simply seek justification for a fairly sweeping comment.
|
|
|
|
|
I cannot find my original research which led me to believe that they themselves wrote the queue implementation in the concurrency library; I've been busy since then. However, this is what the concurrency library page said: "This implementation employs an efficient "wait-free" algorithm based on one described in Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Maged M. Michael and Michael L. Scott." Re-reading what I said, I do not understand how you would conclude that I was saying the paper was the basis for the entirety of the Java concurrency library. Your quote of what I said explains pretty clearly what I believed: "...their queue implementation is now part of the Java concurrency library."
Idaho Edokpayi
|
|
|
|
|
In the ArrayQ::Resize() function (C++ file), what is the purpose of the CRITICAL_SECTION?
It's being created on the stack, inside the local scope of the Resize() function. Thus, every thread that calls the function gets its own, separate critical section. As such, it cannot serve any thread-synchronization or data-protection purpose.
Can you shed any light on this?
Mike
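For readers following along, the flaw Mike describes is easy to reproduce in a minimal sketch. The example below uses standard C++ (std::mutex standing in for the Windows CRITICAL_SECTION), but the mistake is the same: a lock object constructed inside the function body is private to each call, so it protects nothing.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

long counter = 0;

// BROKEN: each call constructs its own mutex on the stack, so no two
// threads ever contend on the same lock and the increment is unprotected.
void increment_broken() {
    std::mutex local;                         // fresh lock per call
    std::lock_guard<std::mutex> guard(local); // locks only this call's mutex
    ++counter;
}

// CORRECT: one mutex, shared by every caller, guards the shared data.
std::mutex counter_mutex;
void increment_fixed() {
    std::lock_guard<std::mutex> guard(counter_mutex);
    ++counter;
}
```

The same mistake can be written in any language; in C#, for example, `lock (new object())` inside a method has the identical non-effect.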
|
|
|
|
|
I've since revised the code and removed the ArrayQ class since I believed the queue based on the paper was a stronger and more defensible implementation. I actually don't recommend the ArrayQ class. I am also planning a fix for the performance problem in the C# queue.
Idaho Edokpayi
|
|
|
|
|
You are avoiding the question. The question is not about whether ArrayQ is stronger than some other implementation; it is about a "conceptual mistake" in the implementation of ArrayQ. Why not simply accept that it is an error?
|
|
|
|
|
Looking back (way back), the ArrayQ wasn't that great an implementation. I noticed the error and decided that the other algorithm was more likely to produce the results I was seeking, so it would have been a waste of my time and the reader's time to fix code that shouldn't have been used at all. I acknowledge that there are still weak points. I intend to fix them, but I haven't had the time lately.
Idaho Edokpayi
|
|
|
|
|
It works; however, performance suffers because of the consistency check.
The problem is that the consistency check often fails under high read/write contention, because multiple Interlocked calls cannot atomically update a single piece of state (specifically, the second Interlocked call, which tries to keep the state consistent through the count variable, is causing the problem).
To show the problem, I ran a simple test on a four-core machine using two threads to populate the queue with integers (2 x 1M) and two threads consuming from the queue. I created a second TestQueue class backed by a LinkedList<T> using monitor locking.
MSQueue averages 2.8 seconds
TestQueue averages 0.54 seconds
To run MSQueue, define USEMSQUEUE. To run TestQueue, do not define USEMSQUEUE.
///////////////////////////////////////////////////////////////////////
#define USEMSQUEUE
using System;
using System.Collections.Generic;
using System.Threading;
namespace lfq
{
public class MSQueue<T>
{
public class node_t
{
public T value;
public pointer_t next;
/// <summary>
/// default constructor
/// </summary>
public node_t() { }
public override
string
ToString()
{
return string.Format("node_t(value={0}):{1}", value.ToString(), next.ToString());
}
}
public struct pointer_t
{
public long count;
public node_t ptr;
/// <summary>
/// copy constructor
/// </summary>
/// <param name="p"></param>
public pointer_t(pointer_t p)
{
ptr = p.ptr;
count = p.count;
}
/// <summary>
/// constructor that allows caller to specify ptr and count
/// </summary>
/// <param name="node"></param>
/// <param name="c"></param>
public pointer_t(node_t node, long c)
{
ptr = node;
count = c;
}
public override
string
ToString()
{
return string.Format("pointer_t(count={0}, ptr={1})", count, ptr == null ? "null" : ptr.ToString());
}
}
public pointer_t Head;
public pointer_t Tail;
public MSQueue()
{
node_t node = new node_t();
Head.ptr = Tail.ptr = node;
}
/// <summary>
/// CAS
/// stands for Compare And Swap
/// Interlocked Compare and Exchange operation
/// </summary>
/// <param name="destination"></param>
/// <param name="compared"></param>
/// <param name="exchange"></param>
/// <returns></returns>
private bool CAS(ref pointer_t destination, pointer_t compared, pointer_t exchange)
{
// NOTE: ptr and count are updated by two separate Interlocked operations,
// so the pair is not swapped atomically
if (compared.ptr == Interlocked.CompareExchange(ref destination.ptr, exchange.ptr, compared.ptr))
{
Interlocked.Exchange(ref destination.count, exchange.count);
return true;
}
return false;
}
public bool deque(ref T t)
{
pointer_t head;
// Keep trying until deque is done
bool bDequeNotDone = true;
while (bDequeNotDone)
{
// read head
head = Head;
// read tail
pointer_t tail = Tail;
// read next
pointer_t next = head.ptr.next;
// Are head, tail, and next consistent?
if (head.count == Head.count && head.ptr == Head.ptr)
{
// is tail falling behind
if (head.ptr == tail.ptr)
{
// is the queue empty?
if (null == next.ptr)
{
// queue is empty, cannot dequeue
return false;
}
// Tail is falling behind. try to advance it
CAS(ref Tail, tail, new pointer_t(next.ptr, tail.count + 1));
} // endif
else // No need to deal with tail
{
// read value before CAS otherwise another deque might try to free the next node
t = next.ptr.value;
// try to swing the head to the next node
if (CAS(ref Head, head, new pointer_t(next.ptr, head.count + 1)))
{
bDequeNotDone = false;
}
}
} // endif
} // endloop
// dispose of head.ptr
return true;
}
public void enqueue(T t)
{
// Allocate a new node from the free list
node_t node = new node_t();
// copy enqueued value into node
node.value = t;
// keep trying until Enqueue is done
bool bEnqueueNotDone = true;
while (bEnqueueNotDone)
{
// read Tail.ptr and Tail.count together
pointer_t tail = Tail;
// read next ptr and next count together
pointer_t next = tail.ptr.next;
// are tail and next consistent
if (tail.count == Tail.count && tail.ptr == Tail.ptr)
{
// was tail pointing to the last node?
if (null == next.ptr)
{
if (CAS(ref tail.ptr.next, next, new pointer_t(node, next.count + 1)))
{
bEnqueueNotDone = false;
} // endif
} // endif
else // tail was not pointing to last node
{
// try to swing Tail to the next node
CAS(ref Tail, tail, new pointer_t(next.ptr, tail.count + 1));
}
} // endif
} // endloop
}
}
class TestQueue<T>
{
public
void
enqueue(
T item
)
{
lock (_list)
{
_list.AddFirst(item);
}
}
public
bool
deque(
ref T item
)
{
LinkedListNode<T> last = null;
lock (_list)
{
last = _list.Last;
if (last != null)
{
item = last.Value;
_list.Remove(last);
}
}
return last != null;
}
LinkedList<T> _list = new LinkedList<T>();
}
class Program
{
const int CHECKSIZE = 64;
const int CONSUMERS = 2;
const int PRODUCERS = 2;
const int PRODUCESIZE = 1000000;
#if USEMSQUEUE
static MSQueue<int> queue = new MSQueue<int>();
#else
static TestQueue<int> queue = new TestQueue<int>();
#endif
static volatile bool consumersDone = false;
static void Main(string[] args)
{
Thread[] consumers = new Thread[CONSUMERS];
Thread[] producers = new Thread[PRODUCERS];
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
Start(producers, producer);
Start(consumers, consumer);
Join(producers);
consumersDone = true;
Join(consumers);
sw.Stop();
Console.WriteLine("{0:0.0000} seconds", sw.Elapsed.TotalSeconds);
}
static void Join(
Thread[] threads
)
{
foreach (Thread t in threads)
{
t.Join();
}
}
static void Start(
Thread[] threads,
ThreadStart method
)
{
for (int i = 0; i < threads.Length; i++)
{
threads[i] = new Thread(method);
threads[i].Start();
}
}
static void producer()
{
for (int i = 0; i < PRODUCESIZE; i++)
{
queue.enqueue(i);
}
}
static void consumer()
{
int n = 0;
for (; ; )
{
if (queue.deque(ref n))
{
}
else if (consumersDone)
{
break;
}
}
}
}
}
|
|
|
|
|
I'll look into it.
Idaho Edokpayi
|
|
|
|
|
"Lock free" algorithms are riddled with patents. Everyone considering to use them should be aware of this fact.
|
|
|
|
|
Do you know where to find info on such patents?
Idaho Edokpayi
|
|
|
|
|
|
The Wizard of Doze wrote: "Lock free" algorithms are riddled with patents.
Has someone claimed a patent on the basic pattern?
- Latch the old pointer to the current object
- Create a new version of the object using the information at the latched pointer
- Compare-and-swap the current object pointer to the new one if it is still equal to the old one
- If the CAS operation failed, repeat
Many lock-free algorithms are formed by taking long-standing existing algorithms and wrapping them in that fashion, so any patent on such an algorithm would be void due to obviousness.
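As an illustration (not code from the article), the four-step pattern above maps directly onto std::atomic's compare-exchange loop:

```cpp
#include <atomic>
#include <cassert>

// Classic CAS retry loop: read the old value, compute a new one, and
// swap only if nothing changed in the meantime; otherwise retry with
// the freshly observed value.
long fetch_and_double(std::atomic<long>& value) {
    long oldv = value.load();
    long newv;
    do {
        newv = oldv * 2;
        // compare_exchange_weak stores the current value into oldv on
        // failure, so the next iteration recomputes newv from it.
    } while (!value.compare_exchange_weak(oldv, newv));
    return oldv;  // the value we successfully replaced
}
```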
|
|
|
|
|
You can't patent an algorithm. It's an idea, which isn't patentable.
|
|
|
|
|
Just like other lock-free data structures found on the web, the memory allocation/de-allocation in enqueue/dequeue will perform locking during the operation. Therefore the MSQueue is not lock-free unless some lock-free memory allocation scheme is applied. An "almost" lock-free free-list memory allocator (the first few allocations will touch the global heap anyway) should not be hard to implement based on the code provided in ArrayQ and MSQueue.
Happy coding.
|
|
|
|
|
I'll revise the article to note that to achieve complete "non-blocking" status the code must use non-blocking memory allocators. However, I did some research into custom memory allocation, and what I found is that while it is easy to outperform early memory allocators, many newer memory allocators, including the default memory allocator for Windows Vista, easily outperform all but the absolute cleverest implementations. See this guy's blog; he was able to remedy it. So, I suggest interested readers substitute his code if they want a truly wait-free algorithm. My problem is that although the algorithm is wait-free and outperforms the default allocator, does it still beat a library such as HeapLayers or Hoard? There are high-performance memory allocation libraries out there that may not be wait-free but are faster still. In the end, we all want speed and ease of implementation. I suggest using HeapLayers or Hoard before his library and accepting one or two locks. Your code will be faster and more stable, and no one will care that the architecture is not purely "wait-free".
Idaho Edokpayi
|
|
|
|
|
I think you are missing a point: Michael and Scott's algorithm, as expressed in pseudocode, is wait-free; however, its many implementations in "real" languages are not wait-free.
The problem here is that the new_node() (with an underscore '_') is often incorrectly replaced with new node() (with a space ' ').
Here is what Michael and Scott write in their paper (part 2, second paragraph): "We use Treiber's simple and efficient non-blocking stack algorithm [21] to implement a non-blocking free list." However, the pseudocode does not mention free lists except in the comment on line E1: "# Allocate a new node from the free list". I think this is the key: the algorithm needs a pre-allocated free list, implemented as Treiber's stack, in order to be wait-free.
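For illustration, here is a minimal sketch of a Treiber-style stack of the sort the paper suggests for the free list. Treat this as a toy under stated assumptions: without the counted pointers the paper pairs with every reference, concurrently recycling popped nodes is exposed to the ABA problem, so it shows the shape of the idea rather than a production-ready free list.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// A Treiber stack: push and pop each use a single CAS on the head
// pointer. CAUTION: illustrative only; no ABA protection (the paper
// uses counted pointers for exactly this reason).
struct FreeNode {
    FreeNode* next = nullptr;
};

class TreiberStack {
    std::atomic<FreeNode*> head{nullptr};
public:
    void push(FreeNode* n) {
        FreeNode* old = head.load();
        do {
            n->next = old;                      // link in front of current head
        } while (!head.compare_exchange_weak(old, n));
    }
    FreeNode* pop() {
        FreeNode* old = head.load();
        // Retry until we swing head to old->next, or the stack is empty.
        while (old && !head.compare_exchange_weak(old, old->next)) {
        }
        return old;  // nullptr when the free list is exhausted
    }
};
```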
|
|
|
|
|
Just like other lock-free data structure found on the web, the memory allocation/de-allocation in enqueue/dequeue will perform locking during the operation.
Does the .NET allocator used when creating a new object use locking rather than spinlocking? If so, does it contain measures to prevent priority inversion and other such nastiness?
|
|
|
|
|
The download only has the C++ version.
|
|
|
|
|
|
You're welcome!
Idaho Edokpayi
|
|
|
|
|
Since you have provided no "use cases", I have a question.
Is the value of this queue that multiple threads may access it without fear of data corruption?
Thanks for your work.
|
|
|
|
|
I am sorry; I'll revise the article to make the use of lock-free queues more obvious. I hinted at it in several places but never explicitly stated that, yes, this queue is valuable because it may be accessed from multiple threads safely without the caller providing any synchronization.
Idaho Edokpayi
|
|
|
|
|
Is it read AND write multi-thread safe, or just read? I ask, because I recall that some of the STL implementations have containers that are multi-thread safe for read access but not writes (I'm thinking specifically of STLPort).
Also, have you considered enhancing the implementation such that you can use it with STL iterators/containers and the various algorithms in STL? That would be pretty cool IMHO
¡El diablo está en mis pantalones! ¡Mire, mire!
Real Mentats use only 100% pure, unfooled around with Sapho Juice(tm)!
SELECT * FROM User WHERE Clue > 0
0 rows returned
Save an Orange - Use the VCF!
VCF Blog
|
|
|
|
|
It should be safe for both reads and writes from multiple threads. I use the word "should" because it would take more time to test than I have. My code is a very faithful implementation of the algorithm from the Michael/Scott paper, so I would put my certainty at 95%. Give me a code example of what you want to do and perhaps I could expand on the code.
Idaho Edokpayi
|
|
|
|
|
Think about what you are saying. If all you are doing is reading, then you don't need any locks! But at the beginning everything needs to be allocated, so provided you set up all your data and structures before you fire up your additional threads, everything is read-safe. And take down all your "read" threads before destroying the data.
But if you are doing a mix of reading and writing, then you need locking, even if there is a heavy bias toward one or the other.
If you are doing 99% reading and 1% writing, you still need locking; something like a ReadWriteLock with a preference for the writer would fit.
Giles
|
|
|
|