Even without using the volatile keyword I am getting the same results. I tried in release mode, but it doesn't make any difference. Below is my code; please clarify.
class Program
{
    private bool loopFlag = true;

    public static void ExecuteLoop(object o1)
    {
        while (((Program)o1).loopFlag)
            Console.WriteLine("Executing Loop");
    }

    static void Main(string[] args)
    {
        Program pgm = new Program();
        Thread t1 = new Thread(ExecuteLoop);
        t1.Start(pgm);
        Thread.Sleep(1000);
        pgm.loopFlag = false;
        Console.WriteLine("Value set to false");
    }
}
Thanks, Shiv, for writing this article and opening the way for the informative discussions in the comments.
"But one fine day my love with c# thread’s had a bump"
Today was that day for me as well. I have handled the scenario with the volatile keyword.
Please let me know if you have found any better solution since you wrote this article.
Without volatile it is working fine, or else I am missing something. I copied and pasted your code exactly as it is, without volatile, but it does not behave as the output you have shown: the "Loop Stopping..." message is also printed even though I have not put volatile in front of the _loop variable.
Nicely explained! I am new to threading; recently I came across a statement containing the volatile keyword in my project. Now I understand its intended usage.
Your description of main vs. thread memory is plain wrong: it has to do with the processor's memory caches (multicore). In the end volatile "works" for you, but for the wrong reason; your synchronization/threading code is done wrong. If I had loved something for years, I'd know it better... but I voted 2 for the funny writing style.
This article is misleading; I can't imagine why people would rate it a 5.
Yes, it is well written, but it is totally wrong. You clearly have no idea about multithreading, synchronization, caching, etc.
Sorry, but I can't let a mistake like this pass.
You are assuming the code is not working for the wrong reason.
The 'nondeterministic' behavior is due to caching being applied to the field.
The compiler may optimize the use of that field for a single thread; the value then may be updated with a certain 'delay'.
volatile means the field will not be cached on a per-thread basis, so all threads on all CPUs will see changes to the field promptly, because compiler optimizations are disabled for that field.
There is no bug here. There is nothing to fix. The fields were not "out of sync"; the values were cached.
Using a word like "synched" in this context is not correct, and is indeed misleading. So is assuming the difference comes down to debug mode versus release mode: you could run the program hundreds of times with different optimizations enabled and get different results.
I have rated most of your articles with a 5, but this one is not OK.
The fact that other people have given you a 5 on this is, at the least, concerning.
Leonardo Paneque wrote:
You are assuming the code is not working for the wrong reason.
The 'nondeterministic' behavior is due to caching being applied to the field.
The compiler may optimize the use of that field for a single thread; the value then may be updated with a certain 'delay'.
volatile means the field will not be cached on a per-thread basis, so all threads on all CPUs will see changes to the field promptly, because compiler optimizations are disabled for that field.
I get your point and you are right. If we use the volatile keyword we are fiddling with the free, optimized decisions the processor makes about memory synchronization. Now the processor has to do the extra work of flushing the thread-local storage as well as main memory. But the question is: do we pay a delay for this sync to happen, and can that delay be optimized?
Leonardo Paneque wrote:
There is no bug here. There is nothing to fix. The fields were not "out of sync"; the values were cached.
There is no bug from the processor's perspective, but my threads want current data, and stale data can lead to nondeterministic behavior in my application. At the end I have demonstrated, in a video, a bug caused by this very caching: the program keeps running forever because it never sees the fresh value. So from the processor-optimization perspective things are perfect, but from the application perspective they can be problematic.
Leonardo Paneque wrote:
Using a word like "synched" in this context is not correct, and is indeed misleading. So is assuming the difference comes down to debug mode versus release mode: you could run the program hundreds of times with different optimizations enabled and get different results.
Since we have two memory storages, I used the word sync; I am not sure the word is misleading. In debug mode your program runs a bit slower, giving the processor a chance to flush memory, so I said to run in release mode so that the issue becomes visible. I agree that with different processors (as each has a different memory model) you can get different results. At least on all of my PCs I was able to reproduce the issue.
Leonardo Paneque wrote:
I have rated most of your articles with a 5, but this one is not OK.
The fact that other people have given you a 5 on this is, at the least, concerning.
I do not write for votes; I crave discussions like this one, which help me grow. This discussion is worth more than 100 votes.
All said and done, you have highlighted very important points; I will be updating the article with a note about the performance issues with volatile.
Cross question: is there some better way to ensure that all my threads see fresh memory without losing too much optimization?
On x86 and x64 there are no performance issues; most people agree the impact is zero. But that is due to the CPU architecture. On ARM processors it might be a problem, not because of volatile itself, but because of the way the cache is flushed (everything is flushed).
Is there some better way to ensure that all my threads see fresh memory without losing too much optimization?
Yes, there is a way to address that question... but more to the point of upgrading your knowledge of threading, I want to clarify this for you and anybody who comes after.
As for your relationship with threading: first you don't know volatile; when the right problem arises you discover it, and you pass through a time when you like volatile because it solves the immediate problem. Then you start to be frustrated by its limitations. You may start to like lock again... and finally you start avoiding both lock and volatile as if they were a curse.
I'll try to make these phases go fast and with little pain.
1) Volatile solves a problem.
It allows you to tell the compiler that a certain variable must be treated specially because it is being used by several threads at the same time. In fact, volatile is excellent for telling the compiler to make those threads read the value of the variable from shared memory each time (and not from a cached copy).
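A minimal sketch of the kind of flag this covers (the class and field names here are just for illustration):
C#
class Worker
{
    // volatile: every read of _keepRunning goes to shared memory,
    // so the worker thread notices the change made by Stop()
    private volatile bool _keepRunning = true;

    public void Run()
    {
        while (_keepRunning)
        {
            // do one unit of work per iteration
        }
    }

    public void Stop()
    {
        _keepRunning = false;   // visible to Run() on its next read
    }
}
Without volatile, the JIT is free to hoist the read of _keepRunning out of the loop, which is exactly the "runs forever" symptom discussed above.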
2) Volatile is misleading.
Let's say you have split an operation across two threads, and you want to report the total progress in real time.
Of course you will have a shared variable for that. Say it is an int that will go from 0 to 1000, where 1000 means the work is done, and each thread has half the work (that is, each will increment the variable 500 times).
What can we expect? That after both threads finish, the final value will be 1000. Well, no, because these threads are caching the variable. Your first instinct may be to throw volatile into the cage and hope for the best, and it will work most of the time... but not quite: sometimes you get 999, other times 997, and so on. What happens is that each thread runs something like this:
C#
_progress++;
How does the compiler handle this? Like so, of course:
C#
var tmp = _progress + 1;
_progress = tmp;
Do you see the problem? No? Ok, let me try again:
C#
var tmp = _progress + 1;
// These two lines are not a single instruction
_progress = tmp;
"What's wrong with that?" - you may say - "It should work anyway". Well, the problem is that you have two threads, and those threads can be preempted anytime, perhaps...
C#
var tmp = _progress + 1;
//Just right here!!!!!
_progress = tmp;
And now each thread has its own version of the new value to write to the field, and both are gonna write! Oh no!
The initial value is 0.
Thread A comes and reads 0, and says: I'll update it to 1.
Thread A gets preempted.
Thread B comes and reads 0, and says: I'll update it to 1.
Thread B writes the field, setting it to 1.
Thread A wakes up.
Thread A writes the field, setting it to 1.
Both threads will swear they have just updated the value of the field; from their viewpoint the field had the value 0 and now it has 1. Yet the final value is 1, not 2.
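The lost update above can be reproduced with a sketch like this (the names and counts are illustrative; the result is nondeterministic, which is exactly the point):
C#
using System;
using System.Threading;

class LostUpdateDemo
{
    // volatile keeps reads fresh, but ++ is still not atomic
    private static volatile int _progress = 0;

    static void Work()
    {
        for (int i = 0; i < 500; i++)
            _progress++;   // read-modify-write: threads can interleave here
    }

    static void Main()
    {
        var a = new Thread(Work);
        var b = new Thread(Work);
        a.Start(); b.Start();
        a.Join(); b.Join();
        // May print less than 1000 when increments were lost
        Console.WriteLine(_progress);
    }
}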
3) Love and hate of lock
Oh, cruel threads, what are we going to do? Of course one solution is to use Monitor:
C#
lock (something)
{
    _progress++;
}
Yep, that works... but if we are going to need a lock, why are we using volatile in the first place?
Monitor is the de facto synchronization mechanism in C#. If you want to make sure only one thread enters a critical section, use Monitor. It just works. And that's the problem.
Sometimes you don't want just one thread; for example you may want multiple threads reading and only one writing (at a time). That's what ReaderWriterLock(Slim) is for, except for some corner cases, which thankfully have been addressed more recently with Lazy<T> and ThreadLocal<T>.
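A sketch of that multiple-readers/one-writer pattern (the dictionary and its key type are just for illustration):
C#
using System.Collections.Generic;
using System.Threading;

class SharedTable
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, int> _data = new Dictionary<string, int>();

    public bool TryGet(string key, out int value)
    {
        _lock.EnterReadLock();           // many readers may hold this at once
        try { return _data.TryGetValue(key, out value); }
        finally { _lock.ExitReadLock(); }
    }

    public void Set(string key, int value)
    {
        _lock.EnterWriteLock();          // the writer waits for readers to drain
        try { _data[key] = value; }
        finally { _lock.ExitWriteLock(); }
    }
}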
But what if I want multiple writers?
Yes, I want multiple writers; there is no energy or time to have threads sitting around! What I want is a lock-free system!
For example, let's go back to our previous example. Say the work these threads are doing is copying an array: they are cooperating to copy it. Of course, if they are going to cooperate on the copy, they need to be able to write simultaneously.
C#
// Two threads use this code to cooperate to copy the array
var index = 0;
do
{
    index = _progress++;
    arrayTarget[index] = arraySource[index];
} while (index < 1000);
Of course, this time the problem is clear as day. Because of that [insert insult here] increment, they may end up copying parts of the array twice!
So, how about a lock?
C#
// Two threads use this code to cooperate to copy the array
int index;
do
{
    lock (something)
    {
        index = _progress++;
    }
    arrayTarget[index] = arraySource[index];
} while (index < 1000);
Well, that is no good either, because now on every iteration the threads have to compete for the lock. ReaderWriterLock is not going to help either, because they are all writers. If those were the only options, it would be better to have a single thread do the job.
So, how do you solve this?
4) Avoiding volatile
Here is the truly unsung hero: Interlocked
What you need is to increment _progress in a single (atomic) operation (in such a way that the thread will not be preempted in the middle of it), and Interlocked allows you to do just that!
C#
// Two threads use this code to cooperate to copy the array.
// Interlocked.Increment returns the incremented value, so subtract 1
// to get the slot this thread has just claimed.
int index;
while ((index = Interlocked.Increment(ref _progress) - 1) < 1000)
{
    arrayTarget[index] = arraySource[index];
}
Yay! A solution. There is just one problem... you need to stop using volatile to be able to use Interlocked. Why? Well, Interlocked needs a ref to the field, and volatile doesn't play well with that (the compiler warns that the volatile semantics are lost when the field is passed by ref).
This brings up another question: if I need both Interlocked and volatile, what should I do?
There are two answers:
A) Use Thread.VolatileRead(ref v) and Thread.VolatileWrite(ref v, value), which have the same semantics that volatile had in the first place (read the most up-to-date value, bypassing the cached copy).
B) Use Thread.MemoryBarrier(), which is what volatile actually relies on (it prevents the compiler and CPU from reordering operations and caching values across the barrier). In fact, Thread.VolatileWrite and Thread.VolatileRead use Thread.MemoryBarrier internally.
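A sketch of answer A): drop the volatile modifier and go through Thread.VolatileRead / Thread.VolatileWrite instead, so the same field can also be handed to Interlocked (the class and method names are illustrative):
C#
using System.Threading;

class Counter
{
    // No volatile modifier, so the field can be passed by ref freely
    private int _progress;

    public int Next()
    {
        // Atomic increment; also acts as a full memory barrier
        return Interlocked.Increment(ref _progress);
    }

    public int Current()
    {
        // Read the freshest value without incrementing
        return Thread.VolatileRead(ref _progress);
    }

    public void Reset()
    {
        Thread.VolatileWrite(ref _progress, 0);
    }
}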
Where the industry stands today, you are going to see less and less lock and volatile, and more and more Interlocked. This move is driven by the desire for better CPU utilization and better energy consumption. There is an ongoing push to develop more and better lock-free and also wait-free* data structures.
*: There are many self-proclaimed wait-free data structures that turn out to be only lock-free after scrutiny (marketing?); only a few actually are what they claim. Creating a truly wait-free data structure is a very hard task (for which I'll give you some tips below). Lock-free, on the other hand, is a perfectly reachable objective.
I guess eventually most cases will be covered. For now, we need to spread the knowledge, because we don't know who is going to complete the holy task of creating a truly mutable wait-free dictionary that has no limit on the number of threads or items (and release it as free and open source software). [I don't even know if such a data structure is possible, but I'll be looking for it (does it already exist?). Really, that person deserves a medal or something.]
5) Extra
These are tips for creating wait-free and lock-free data structures.
a) The key to creating lock-free and possibly wait-free data structures is to allow threads to cooperate (with few exceptions).
Sometimes you will need some kind of protocol that lets threads decide how to help each other, so that if a thread cannot do something it doesn't have to wait; instead it goes to help another thread*.
*: Hence the term wait-free.
Eventually another thread may come and help the first thread with the task it could not do before, or that thread comes back and tries again after helping another.
b) To create such a protocol, it is important to be able to identify threads uniquely. Sadly, ManagedThreadId can be recycled by the runtime. Here thread-local storage and Interlocked come in handy to assign unique temporary IDs, work tickets, or other mechanisms that facilitate the protocol.
c) You have had thread-local storage since .NET 2.0 with LocalDataStoreSlot. Also, anything you store on the stack (stackalloc) is thread-local, because the stack itself is thread-local.
d) Interlocked.Increment wraps values: after reaching int.MaxValue it goes back to int.MinValue. This lets you create wait-free circular stacks and queues, under the constraint that the capacity divides the integer's total range (i.e., it is a power of two), so you don't have to wrap the indexes yourself; just mask them onto an internal array.
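A sketch of that index mapping, assuming a power-of-two capacity (the names here are illustrative):
C#
using System.Threading;

class TicketIndexer
{
    private const int Capacity = 1024;   // must be a power of two
    private int _ticket = -1;            // first Increment yields ticket 0

    public int NextSlot()
    {
        // The counter may wrap from int.MaxValue to int.MinValue, but
        // masking still maps every ticket onto 0..Capacity-1 without a
        // gap, because Capacity divides the integer's total range (2^32)
        int ticket = Interlocked.Increment(ref _ticket);
        return ticket & (Capacity - 1);
    }
}
With a capacity that is not a power of two, the slot sequence would jump at the wrap point, which is why the constraint matters.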
e) Using Interlocked.CompareExchange, Interlocked.Increment and Thread.VolatileWrite you can have critical sections that will only be entered a limited number of times (for example only once, which is good for lazy initialization). Or you can use them to allow only one thread in at a time without making the other threads wait (similar to Monitor.TryEnter but much more lightweight).
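A sketch of the run-only-once idea from e), using CompareExchange as a lightweight gate (the class and state values are illustrative):
C#
using System.Threading;

class OneTimeInit
{
    private int _state;   // 0 = not started, 1 = claimed/done

    public bool TryInitialize()
    {
        // Only the first thread to swap 0 -> 1 runs the body;
        // everyone else returns immediately without waiting
        if (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
            return false;

        // ... perform the one-time initialization here ...
        return true;
    }
}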
f) Did you create a data structure that is misbehaving, and you don't know what may be going wrong? Use the wait-free circular structure from tip d) to log debug information without introducing locks.
g) Homework: learn about incremental resizing. Hint: yes, you will have various threads cooperate to copy an array.
Did you manage to implement your data structure without adding artificial delays?
No spin locks?
No busy whiles?
No nasty gotos?
Enumeration doesn't repeat items?
No items lost when resizing?
No items lost on collision?
No mysterious hangs?
No [I don't know what]?
Then congratulations! You just created a wait-free data structure [under the assumption that it works... test, test, test].
Otherwise, that is pretty much where I am... I don't know what the next tip may be.
Final note: I wish there were a button to convert this message into an article. This is why I cannot maintain a blog: if I sit down in advance to write for a blog I don't know what to write, but then I sit down to answer a question in some random thread, and there you go, a post that could be on a blog.
You are absolutely right. Using the volatile keyword in the sample code is a substitute for thinking. The code may perform as intended, but then again it might not.
The CLR does give you a nice, clean interface to threading, but the fundamental issues of concurrency are not resolved by throwing in "volatile".
- oggenok