|
The code is fragile and not thread-safe.
|
|
|
|
|
|
Our program using MSQueue usually crashes at "t = next.ptr->value" — do you know why?
Does MSQueue support multiple readers and multiple writers?
Thanks very much!
|
|
|
|
|
This is extremely fragile code that will blow up if it is ever used with actual threads.
The author seems to have misunderstood critical sections (and basically every other concept in multithreading), as if they guaranteed that no other thread could run while one thread was inside a critical section. That's wrong; critical sections are mutexes. Creating one in a local variable and entering it does exactly nothing.
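To see why a local lock is useless, here is a minimal portable sketch (using std::mutex in place of the Win32 CRITICAL_SECTION; the principle is identical, and the names are mine, not the article's): only a lock object shared by all threads serializes anything.

```cpp
#include <mutex>
#include <thread>

// Shared state plus a SHARED lock object -- sharing is what makes a lock work.
static std::mutex g_lock;
static long g_counter = 0;

// Correct: every thread locks the same mutex, so increments never race.
void correct_increment(int n)
{
    for (int i = 0; i < n; ++i)
    {
        std::lock_guard<std::mutex> guard(g_lock);
        ++g_counter;
    }
}

// The article's mistake, in miniature: a mutex local to the call.
// No other thread can ever contend it, so it protects nothing.
void broken_increment(int n)
{
    for (int i = 0; i < n; ++i)
    {
        std::mutex local;
        std::lock_guard<std::mutex> guard(local);
        ++g_counter; // still a data race
    }
}

// Helper: runs two threads through f and returns the final counter value.
long run_two_threads(void (*f)(int), int n)
{
    g_counter = 0;
    std::thread a(f, n), b(f, n);
    a.join();
    b.join();
    return g_counter;
}
```

With the shared mutex the final count is always exact; the local-mutex version can lose updates under contention.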
The enqueue() method implements CAS incorrectly and can resize the array, at which point the claimed size is incremented first (while the array is not yet resized), then data is memcpy'd into a new array, then the current memory is deleted (!), and only then is the new array assigned.
Thus a call to enqueue() does one of these things:
- Correctly enqueues the item
- Overwrites other memory due to an out-of-bounds array access
- Loses the item because another thread overwrites it
- Causes an access violation when accessing a deleted pointer
- Creates a gap in the queue where an undefined value will remain
The MSQueue is broken as well: it assumes that two sequential Interlocked...() calls are equivalent to one such call, it reads fields without fences or atomics, and it will leak memory when deleted.
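For contrast, a minimal sketch (my own names, not the article's) of treating the pointer/counter pair as one atomic word: a 32-bit node index plus a 32-bit ABA counter packed into a single 64-bit std::atomic, so one load or one CAS always observes both halves consistently — which two back-to-back Interlocked calls cannot guarantee.

```cpp
#include <atomic>
#include <cstdint>

// A 32-bit index into a node pool plus a 32-bit ABA counter, packed
// into one 64-bit word so both halves change atomically together.
struct TaggedIndex
{
    static std::uint64_t pack(std::uint32_t index, std::uint32_t count)
    {
        return (static_cast<std::uint64_t>(count) << 32) | index;
    }
    static std::uint32_t index(std::uint64_t v) { return static_cast<std::uint32_t>(v); }
    static std::uint32_t count(std::uint64_t v) { return static_cast<std::uint32_t>(v >> 32); }
};

// A single CAS on the packed word replaces pointer and counter at once;
// a stale expected value (old counter) makes the CAS fail, as intended.
inline bool cas_head(std::atomic<std::uint64_t>& head,
                     std::uint64_t expected, std::uint64_t desired)
{
    return head.compare_exchange_strong(expected, desired);
}
```

The queue's Head and Tail would each be one such atomic word; every successful CAS bumps the counter half so a recycled index is never mistaken for the old one.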
|
|
|
|
|
Buggy code; it doesn't even implement the algorithm correctly.
|
|
|
|
|
I have fixed it.
|
|
|
|
|
|
Not working, not bug-free; a sad attempt to implement a poorly understood paper. The article claims to present a "Lock Free Queue implementation" when in fact it doesn't.
|
|
|
|
|
Why "not working, not bug-free"?
I tested the C++ code and didn't find any bugs. But I'm tuning it for performance...
|
|
|
|
|
|
This is a very old algorithm. IT DOES NOT WORK for multiple readers, and it is not likely to work for multiple writers. It reliably works only for one reader and one writer. Just taking the article's code and compiling it is not a big deal. The author did not even bother to make it work for WIN64. Sloppy work.
|
|
|
|
|
bool dequeue(T& t)
{
    ...
    else
    {
        t = next.ptr->value;
        if(CAS(Head, head, pointer_t(next.ptr, head.count+1)))
        {
            bDequeNotDone = false;
        }
    }
    ...
}
Every time, my code crashes at the line t = next.ptr->value; shown above.
Here is my code:
#include "stdafx.h"
#include <iostream>
#include "LockFreeQ.h"

using namespace std;

MSQueue<int> Q;
int nReadThreads = 10;
int nWriteThreads = 10;

DWORD WINAPI ReadThreadFunc(LPVOID p)
{
    while (true)
    {
        int i;
        Q.dequeue(i);
    }
}

DWORD WINAPI WriteThreadFunc(LPVOID p)
{
    int i = 0;
    while (true)
    {
        Q.enqueue(i++);
        Sleep(1);
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    for (int i = 0; i < nWriteThreads; i++) {
        CreateThread(NULL, 0, WriteThreadFunc, NULL, 0, NULL);
    }
    for (int i = 0; i < nReadThreads; i++) {
        CreateThread(NULL, 0, ReadThreadFunc, NULL, 0, NULL);
    }
    int n;
    std::cin >> n;
    return 0;
}
|
|
|
|
|
Hi there,
The Assert fails in the following test:
class LockFreeQueueTest
{
    static MSQueue<int> queue = new MSQueue<int>();

    static private void producer( object _count )
    {
        int count = (int) _count;
        for( int i = 0; i < count; i++ )
        {
            queue.enqueue( i );
        }
    }

    static private void consumer( object _count )
    {
        int count = (int) _count;
        for( int i = 0; i < count; i++ )
        {
            int result = 0;
            if( !queue.deque( ref result ) )
                continue;
            Trace.Assert( result == i );
        }
    }

    public static void test( int count )
    {
        Thread producer = new Thread( new ParameterizedThreadStart( LockFreeQueueTest.producer ) );
        Thread consumer = new Thread( new ParameterizedThreadStart( LockFreeQueueTest.consumer ) );
        producer.Start( count );
        consumer.Start( count );
        producer.Join();
        consumer.Join();
    }
}
|
|
|
|
|
Hi,
Following is a simple stress test that crashes after a few seconds. Am I missing something here?
#include "stdafx.h"
#include "LockFreeQ.h"
#include <iostream>

using namespace std;

MSQueue<int> Q;
int nReadThreads = 1;
int nWriteThreads = 1;

DWORD WINAPI ReadThreadFunc(LPVOID p)
{
    while (true)
    {
        int i;
        Q.dequeue(i);
    }
}

DWORD WINAPI WriteThreadFunc(LPVOID p)
{
    int i = 0;
    while (true)
    {
        Q.enqueue(i++);
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    for (int i = 0; i < nWriteThreads; i++) {
        CreateThread(NULL, 0, WriteThreadFunc, NULL, 0, NULL);
    }
    for (int i = 0; i < nReadThreads; i++) {
        CreateThread(NULL, 0, ReadThreadFunc, NULL, 0, NULL);
    }
    int n;
    std::cin >> n;
    return 0;
}
Hagai.
|
|
|
|
|
Hi!
IMHO this stress test hammers the memory manager: the crash is inside the 'new' operator.
WriteThreadFunc needs the following change:
DWORD WINAPI WriteThreadFunc(LPVOID p)
{
    int i = 0;
    while (true)
    {
        Q.enqueue(i++);
        Sleep(1);
    }
}
|
|
|
|
|
What must I change to make this code compatible with .NET 1.1, i.e. without generics?
Thank you,
Nidal
|
|
|
|
|
Regarding the Michael And Scott lock free queue:
Separating the CAS into separate operations is definitely a bug. In Scott's sample implementation, he shows a CAS function written in assembly that uses a single-word compare/swap. In his proof-of-concept code he mocks memory allocation by using a short as his pointer and another short as his counter. His CAS algorithm is implemented with an sc (store-conditional) MIPS instruction written as inlined assembly. This is necessary because the allocator can (and will) recycle memory addresses. If you could assume that the allocator, for the life of the entire queue, would never recycle an address, then the pointer tag would not be necessary. Realistically, that is not practical. So the pointers are tagged to ensure that if one node was dequeued (and possibly re-enqueued with the same memory address), the algorithm will skip that node. With this implementation, the ABA problem can still occur, because it's possible to atomically write the pointer to memory while its tag lags behind by a few instructions. Not to mention that copying the objects from scope to scope can also be racy.
What you need to do is use a double-width compare/swap, which is available on Vista and Xbox360 as InterlockedCompareExchange64. On Xbox360 it's implemented using locked reads/writes, and on 32-bit Windows it's implemented as cmpxchg8b (yes, the infamous F00F-bug instruction). The downside to using InterlockedCompareExchange64 is that you must cast the pointer/tag pair to a LONG64, which is technically undefined behavior. This can cause a failure if the memory isn't aligned on an 8-byte boundary, which is the natural alignment requirement for a LONG64. I've used a struct pair and a union with a LONG64, and that seems to do the trick. If you don't use a union, you'll need __declspec( align(8) ) to ensure that the compiler aligns the pair of values properly. Also, if InterlockedCompareExchange64 isn't available, you've got to roll your own in assembly. If you do that, it's rather simple, but be sure to follow the compiler's ABI.
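A portable sketch of the pair-plus-union layout described above (the names are mine, and the commenter's own caveat applies: reading the other union member is technically type punning, though compilers for this platform support it):

```cpp
#include <cstdint>

// Pointer/tag pair forced onto an 8-byte boundary so a 64-bit CAS
// (InterlockedCompareExchange64 / cmpxchg8b) can replace both at once.
// Assumes a 32-bit target where a pointer fits in 32 bits.
union alignas(8) tagged_ptr
{
    struct
    {
        std::uint32_t ptr; // really a node address, stored as 32 bits
        std::uint32_t tag; // ABA counter
    } pair;
    std::int64_t whole;    // the view handed to the 64-bit CAS
};

static_assert(sizeof(tagged_ptr) == 8, "pair must fit in one CAS-able word");
static_assert(alignof(tagged_ptr) == 8, "needs the natural LONG64 alignment");
```

Because alignas(8) is part of the type, every tagged_ptr — on the stack, in a node, or in the queue header — lands on the boundary the 64-bit exchange requires, without per-variable __declspec annotations.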
It is also important to note that alignment matters on every architecture. On most (such as the PPC-based Xbox360), the processor sets an exception flag when unaligned memory is accessed and the access causes a bus error — or, in my case, the console just locks up, the game quits, and everybody stops having fun. On Intel it's a little different: the processor sets the exception flag, but the fault is handled by replacing the single read/write instruction in the pipeline with a series of instructions that correct the erroneous bus transaction. Once that happens, the cmpxchg8b instruction is no longer atomic and we're still left with a race condition.
modified on Monday, April 7, 2008 9:10 PM
|
|
|
|
|
Thanks for the comment! At some point, I'll take the time to correct my article. I appreciate your reading and correcting me.
Idaho Edokpayi
|
|
|
|
|
Hello,
I am still not sure that the following code is really safe:
pointer_t(const pointer_t* p): ptr(NULL), count(0)
{
    if(NULL == p)
        return;
    InterlockedExchange(&count, const_cast<LONG>(p->count));
    InterlockedExchangePointer(ptr, const_cast<node_t*>(p->ptr));
}
Imagine your source pointer being completely modified between the two Interlocked lines. The count you end up with is not consistent with the current pointer.
then when in your code you have:
...
// Read Tail.ptr and Tail.count together
pointer_t tail(Tail);
...
You may be interrupted between the two InterlockedExchange calls of your copy constructor. As Tail is a reference, you may end up with tail (the copy) having the count of Tail at moment T0 and the pointer of Tail at moment T1 — different moments, possibly different pointer_t values...
If the Tail values have different counts, it will be detected. But with no luck, the Tail values at the two moments may have the same count.
I think after that the queue might be messy.
I have come across an InterlockedExchange64 function that might help solve this issue: concatenate both pointer and counter into a 64-bit value and access it atomically!
Hopefully InterlockedExchange64 is lock-free (I am not sure).
Finally, I may have not gone deep enough to find out my assumptions are false, and that this implementation is correct.
Thanks for the work and for any answer.
Sincerely,
Didier HARRANG
|
|
|
|
|
You could be right. But what about a 64-bit implementation?
I think I'd prefer taking a copy early in the constructor and then only making the exchange if the pointer is unchanged. Thanks for the comment!
Idaho Edokpayi
|
|
|
|
|
The problem is that what is pointed to can change while the address does not. Assume Q already has a few nodes, then:
pop invoked (thread A) // gets node pointer X, but suspends before the CAS
pop invoked (B)
pop returns (B) // node pointer X deleted
push entered (B) // node pointer X' allocated (the allocator uses the exact same address as X -- why not?)
push returns (B) // X' is now in the queue
B sleeps
A wakes up, sees that X' == X, returns X'
and FIFO is not satisfied.
The tags (counters) and the pointers have to be compared and swapped at the same time. Typically a double-word CAS is used, but there are also schemes that pack the tags into the low-order bits of the pointer values. A 128-bit CAS is not part of the MSVC intrinsics, but that doesn't mean a) that it isn't supported by hardware — you could write assembly to get at it — or b) that incorrect code must be written.
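A sketch of the low-order-bit packing scheme just mentioned (my names; it assumes nodes are allocated with at least 8-byte alignment, which frees the bottom three bits of every node address):

```cpp
#include <cstdint>

// With 8-byte-aligned nodes, the low three bits of an address are always
// zero, so a small version tag can ride in those bits and be CAS'd
// together with the pointer in a single word. (A 3-bit tag wraps quickly,
// so this only narrows the ABA window rather than closing it.)
inline std::uintptr_t pack(void* node, unsigned tag)
{
    return reinterpret_cast<std::uintptr_t>(node) | (tag & 0x7u);
}

inline void* unpack_ptr(std::uintptr_t v)
{
    return reinterpret_cast<void*>(v & ~static_cast<std::uintptr_t>(0x7));
}

inline unsigned unpack_tag(std::uintptr_t v)
{
    return static_cast<unsigned>(v & 0x7u);
}
```

The queue's head/tail words then store packed values, and an ordinary word-sized CAS compares pointer and tag together.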
A small note: if your MSQueue is deleted before all items have been popped, you will have a memory leak. The dtor should have something like T v; while(dequeue(v)); in it to drain off all the items, if there are any.
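To make that dtor suggestion concrete, here is a toy single-threaded queue (the lock-free internals don't matter for this point, so they are replaced by a plain linked list) whose destructor drains the remaining nodes so every heap allocation is freed exactly once:

```cpp
// Toy queue illustrating the draining destructor; not the article's MSQueue.
template <typename T>
class DrainingQueue
{
    struct node { T value; node* next; };
    node* head_ = nullptr;
    node* tail_ = nullptr;

public:
    void enqueue(const T& v)
    {
        node* n = new node{v, nullptr};
        if (tail_) tail_->next = n; else head_ = n;
        tail_ = n;
    }

    bool dequeue(T& out)
    {
        if (!head_) return false;
        node* n = head_;
        out = n->value;
        head_ = n->next;
        if (!head_) tail_ = nullptr;
        delete n;            // each node is freed exactly once
        return true;
    }

    // The suggested fix: pop (and thereby free) everything still queued,
    // so destroying a non-empty queue does not leak.
    ~DrainingQueue()
    {
        T v;
        while (dequeue(v)) {}
    }
};
```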
The reason for selecting different allocators isn't strictly performance. Allocators can align the nodes on cache-line boundaries, substantially reducing false sharing (and thus contention); they can make room for those tag values; and since the nodes are all the same size, building a lock-free free-list allocator is very easy. Even a system allocator with fast locks is subject to convoying, priority inversion, and other nasty things.
Lance
|
|
|
|
|
The article mentions the research in a 1996 paper by Maged M. Michael and Michael L. Scott on non-blocking and blocking concurrent queue algorithms, and states that ...
In fact, their queue implementation is now part of the Java concurrency library.
Where are you getting this fact from? It frankly does not appear defensible. The principal author/architect of the Java concurrency library is Doug Lea at Oswego (see http://g.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html[^]), not Maged Michael or Michael Scott.
It's true that Michael Scott won the 2006 "Edsger W. Dijkstra Prize in Distributed Computing" for work he did in a 1991 paper, which describes a mutual exclusion lock based on a queue. The 1991 paper is different from the 1996 paper that you cited. It's also true that his mutual exclusion lock forms the basis for monitor locks used in Java VMs. See http://www.podc.org/dijkstra/2006.html[^].
But a mutual exclusion lock is very clearly a locking technique, not a lock-free technique. And the paper you cited is not the one that forms the basis for monitor locks in Java.
So, I don't understand the statement in the article, to the effect that the 1996 paper is somehow the basis for the entirety of the Java concurrency library.
Please explain further.
PS: Professor Scott is a remarkable person with many achievements, which you can see from his University of Rochester website: http://www.cs.rochester.edu/~scott/[^]. With my question above, I intend no disrespect for him or any of his accomplishments. I simply seek justification for a fairly sweeping comment.
|
|
|
|
|
I cannot find my original research, which led me to believe that they themselves wrote the queue implementation in the concurrency library; I've been busy since then. However, this is what the concurrency library page said: "This implementation employs an efficient "wait-free" algorithm based on one described in Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Maged M. Michael and Michael L. Scott." Re-reading what I said, I do not understand how you concluded that I was claiming the paper was the basis for the entirety of the Java concurrency library. My quote explains pretty clearly what I believed: "...their queue implementation is now part of the Java concurrency library."
Idaho Edokpayi
|
|
|
|
|
In the ArrayQ::Resize() function (C++ file), what is the purpose of the CRITICAL_SECTION?
It's being created on the stack, inside the local scope of the Resize() function, so every thread that calls the function gets its own separate critical section. As such, it cannot serve any thread-synchronization or data-protection purpose.
Can you shed any light on this?
Mike
|
|
|
|
|
I've since revised the code and removed the ArrayQ class since I believed the queue based on the paper was a stronger and more defensible implementation. I actually don't recommend the ArrayQ class. I am also planning a fix for the performance problem in the C# queue.
Idaho Edokpayi
|
|
|
|
|