|
George_George wrote: we can see that if we do not add volatile to the thread-shared variables, the checking thread may deadlock -- I think that is what the MSDN sample teaches us -- no volatile means no reliable, secure code.
I read that sample as saying that not using volatile means the code will not execute correctly since the compiler may mangle the code so that the variable does not operate as a mutex. Whether the code deadlocks or allows simultaneous access to the CriticalData variable depends on how the compiler optimizes the code.
George_George wrote: But in the past, in many multi-threaded programs, I do not see volatile used (they use other approaches, like a mutex). So, I am confused whether their code is,
1. unsafe, and need to add volatile;
2. or safe, and the mutex is an alternative way to tell the compiler not to optimize (this is what I mean by alternative).
Most code doesn't use a combination of a volatile variable plus a Sleep to implement a mutex. Note that just using volatile does not turn the variable into a mutex - you must also have the Sleep. Why go to that trouble when the OS has a mutex built in that does all that overhead work for you? The usual and "approved" approach is to use the functionality provided to you instead of rolling your own, unless you have a compelling reason to override it. It's much safer to use the various synchronization mechanisms that every OS provides than to write your own and possibly not get it right.
Use of a mutex, per se, does not tell the compiler not to optimize. The fact that you are making a function call limits the optimizer in what it can do to the code surrounding the function call.
Judy
|
|
|
|
|
Thanks Judy,
I agree and understand all of your points, except the one below. My confusion is, I cannot think of any other way the compiler could optimize the code, other than producing a deadlock.
In what other ways might the compiler optimize the code? Could you share some more ideas, please?
JudyL_FL wrote: I read that sample as saying that not using volatile means the code will not execute correctly since the compiler may mangle the code so that the variable does not operate as a mutex. Whether the code deadlocks or allows simultaneous access to the CriticalData variable depends on how the compiler optimizes the code.
regards,
George
|
|
|
|
|
It depends on what the compiler thinks the value of Sentinel is when it encounters it during its optimizing run. If it thinks it is true, it will optimize so that the code deadlocks. If it thinks it is false, it will optimize so that the code allows multiple access. Without knowing the details of how the optimizer works, it could be either one. It may see the value as true if it does all its global variable initializations at the start of its processing and doesn't change them. It may see the value as false if it notes the assignment to false (and only false) in the second thread during its first pass and then does the optimizations in its second pass. It all depends on how the compiler is implemented. I admit the "optimize away" case is reaching a bit, but you never know. Take a look at the assembly language generated by an optimizing compiler sometime; it does some mighty weird stuff. I wouldn't put anything past it.
Judy
|
|
|
|
|
Thanks for your advice, Judy!
regards,
George
|
|
|
|
|
Thanks Maxwell,
Let us just omit this keyword. We cannot find a more specific case where this keyword is strictly required.
regards,
George
|
|
|
|
|
George_George wrote: 2. If we share data between threads and we are already using a synchronization approach like a mutex, do we need to add the volatile keyword as well?
The volatile keyword is not so much for synchronization as it is for telling the compiler that it needs to evaluate the variable at every reference (because it could be modified by something other than the program's own statements, such as the operating system, the hardware, or a concurrently executing thread) rather than optimize those reads away.
"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
|
|
|
|
|
Thanks DavidCrow,
Could you please show a more specific example (ideally one you have experienced yourself)? I am interested because, in my experience, I have not found any case that behaves differently with or without volatile.
regards,
George
|
|
|
|
|
You are trying to understand everything at once - I am impressed!
George_George wrote: I have tried to remove the keyword volatile, and the result is the same
Here's an important lesson for programming:
Even if it (seems to) work, it may be broken.
"broken" in this case means: it may fail as soon as you move to another PC, another compiler, another version of the runtime library, or the problem may occur just once in 10.000 hours. Since compilers usually don't guarantee "same binaries for same source", it might even fail after recompiling.
Especially in C++, there are many constructs that may work right here right now, and the code your compiler for your platform generates is correct. Still, the source code is wrong (and a maintenance time bomb)
For the MSDN sample: if you omit the volatile keyword, the code may fail on multicore machines with separate caches that are not necessarily coherent (wikipedia[^]).
|
|
|
|
|
Thanks peterchen,
I can understand your points and I agree. Here are my previous comments; I think the optimization performed by the compiler is wrong -- it causes incorrect behavior in a multi-threaded environment. Do you have any comments?
--------------------
Now I strongly suspect the compiler will generate wrong code -- functionally wrong code. In the MSDN sample, the variable Sentinel acts as a shared variable between thread1 and thread2. The compiler should guarantee that both threads can read/write the correct value of Sentinel.
It seems that omitting volatile allows a wrong optimization that prevents thread1 from reading the most recent value set by thread2? I think this brings high risk to careless developers who do not know about volatile and forget to put it in front of the variable, which will result in the wrong optimization by the compiler.
--------------------
regards,
George
|
|
|
|
|
George_George wrote: and I think the optimization performed by the compiler is wrong -- causing incorrect behavior in a multi-threaded environment. Do you have any comments?
The optimization is not wrong. The C++ Standard says the compiler may do that*), so you have to live with this.
C++ as a language does not understand anything about threads. So for the compiler, changing Sentinel in a separate thread is like changing it secretly behind its back. volatile is the warning to the compiler "be careful! this value may change behind your back!"
Bugs in source code that appear only under (aggressive) optimization are actually quite frequent. There are a lot of subtle pitfalls in the C++ standard where code that looks good and works well in an unoptimized build breaks under the optimizer. (My recent post in the "Subtle Bugs" forum is a good example.)
You have to differentiate three things: what you think is correct, what your compiler should or may do according to the standard, and what your compiler actually does. In most cases, you are wrong - not the compiler.
I've seen maybe hundreds of posts from people claiming to "have found a compiler bug"; virtually always it was a bug in their own source code, due to ignorance or misunderstanding of the C++ Language Standard.
George_George wrote: I think this brings high risk to careless developers who do not know about volatile and forget to put it in front of the variable, which will result in the wrong optimization by the compiler
Absolutely. It would be good if C++ had better semantics for multithreaded applications - but it doesn't. C++ is not a simple language - not because of its syntax, but because of things like these. Multithreading is naturally not simple. C++ is not a language for careless programmers (and I think there is no language dedicated to them)
*) rather, it doesn't say the compiler may not do that...
[edit] I want to add that right now, no compiler implements the C++ standard perfectly - so there is always a discrepancy between what the standard says and what the compiler does. Still, with most "errors", it is not the compiler's fault. And likewise, just because your compiler accepts something doesn't mean it's legal.
Comeau[^] is known as most compliant, they also have a web applet where you can test short code snippets.
modified on Friday, December 28, 2007 5:28:27 PM
|
|
|
|
|
peterchen wrote: C++ is not a language for careless programmers (and I think there is no language dedicated to them)
There is!! ...... the "Plain English Compiler".
Maxwell Chen
|
|
|
|
|
I refrain from commenting on that
|
|
|
|
|
peterchen wrote: I refrain from commenting on that
Maxwell Chen
|
|
|
|
|
Thanks peterchen,
1.
I think you mean that in the MSDN sample, the change of the variable *Sentinel* by thread2 may not be noticed by thread1. So thread1 may deadlock? I can vaguely understand... but if my understanding is correct, could you please write down the code as *optimized* by the compiler (when the volatile keyword is omitted)?
2.
I think your key point is that the optimization is correct from the compiler's perspective, since the compiler knows nothing about multi-threading? Right?
regards,
George
|
|
|
|
|
2. Almost, yes
My key point is: the optimization is correct, and knowing this is a vital part of being a good programmer.
(and that's a very important thing to learn)
1. What follows is a technical explanation, that is possibly valid for your compiler and your platform. The same behavior could be caused by other effects on your platform. Other compilers may react differently and may or may not expose similar results.
The compiler sees the following loop:
while(Sentinel==0)
Sleep(0);
The optimizer loads the contents of the memory location into a CPU register, since it sees no place where Sentinel changes (and doesn't need the CPU register for anything else). In each loop iteration, it keeps checking the CPU register, thinking "no, Sentinel can't have changed, and I still have the value in a register, so I check that register". But the register value never changes.
A similar effect can occur with dual-core CPUs where each core has its own cache - the compiler has to emit special code to tell the CPU "...and make sure you synchronize all caches".
For an interesting read on how complex these things can get, try the "Double Checked Locking Pattern is Broken" declaration[^]
|
|
|
|
|
Thanks for your great reply peterchen,
I do not agree that the way the compiler optimizes the loop is correct. For example,
while(Sentinel==0)
Sleep(0);
I think the way the compiler optimizes the while loop (or similar loops) is based on the *fact* that the control condition of a loop will not be changed by another thread -- and this *fact* is incorrect.
The common engineering way to decide whether a thread needs to terminate is to put the thread in a loop that continually checks a bool variable called stopped. Actually, I do not see many people add the volatile keyword to the stopped bool variable.
Any comments?
have a good weekend,
George
|
|
|
|
|
George_George wrote: Actually, I do not see many people add volatile keyword to the stopped bool variable.
You've come very far in what seems a short time, George
Maybe I should distinguish two things here: From what you (and I) would like to have, the compiler is doing something wrong, agreed. But from the view of the language definition, the compiler is absolutely correct.
My point of this post: The compiler cannot help you as much as you now think it could.
There are several reasons:
Holding frequently used values in a register is very important for optimizations - otherwise the compiler could generate neither fast nor compact code.
If the compiler blindly assumed that "some thread may modify values", the following variables could not be held in registers:
* all variables not declared in the local function block
* all variables of which a reference / address is passed to some function
That would pretty much kill many optimizations that are essential to efficient object-oriented code.
Furthermore, all access to these variables would require to force synchronization between core caches, pretty much killing the advantage of multiple cores.
So you can either opt for the compiler making things easy for you, or for the compiler giving you full control. C++ almost always opts for "full control, but you need to know what you are doing". (That's why C++ is usually compared to a Ferrari.)
Keep in mind that even the above loop can be exited legally: Sleep(0) might throw an exception when it finishes, or it might terminate the program using exit(). Generally, the compiler doesn't know the source code of Sleep.
It isn't as bad as you think. You can split data access from separate threads into the following two cases:
access to complex data
that's most structs and classes. In this case, you need to use a lock (e.g. CRITICAL_SECTION, or Mutex) anyway, since the access to these is not atomic. Acquiring/Releasing the lock takes care of the ugly stuff that would be caused by multiple threads.
Note that the compiler - even if it were aware of threads - couldn't help you with ANYTHING in that case.
atomic reads & writes
that's usually byte-, word-, or integer-sized access, when well-aligned. The Sentinel example falls into that category.
In this case, the lock is not strictly necessary. BUT we now have to handle all the processor and memory issues involved with multithreading.
For this case, C++ offers volatile, which is a bit of a crutch. Additionally, the Win32 API offers Interlocked routines that allow safe access to a well-aligned 32-bit integer. When linked correctly, this emits minimal assembly instructions. Using Interlocked, the sample could be fixed like this:
replace Sentinel = n with InterlockedExchange(&Sentinel, n)
replace Sentinel++ with InterlockedIncrement(&Sentinel);
replace reading Sentinel with InterlockedExchangeAdd(&Sentinel, 0)
On 64 bit systems, you also have 64 bit Interlocked functions, as well as finer-controlled routines
Phew.
|
|
|
|
|
Great post, Phew!
I agree with all of your comments, except on two points:
1.
About Sleep(0): you mentioned that an exception may be thrown in the Sleep function. But I cannot find this in MSDN; could you kindly point it out, please?
http://msdn2.microsoft.com/en-us/library/ms686298(VS.85).aspx[^]
peterchen wrote: Sleep(0) might throw an exception when it is finished.
2. About the purpose of volatile
MSDN does not formally state that volatile ensures atomic operations -- it is only described as preventing compiler optimization. If I missed any information, please feel free to correct me.
peterchen wrote: For this case, C++ offers volatile, which is a bit of a crutch
regards,
George
|
|
|
|
|
1.) The compiler doesn't read MSDN.
All it knows is that the function is imported from Kernel32.dll with a given signature. The function could do anything. You could have designed it to, e.g., poll for a file and throw when the file is gone. (And that's exactly what MUST happen if the DLL is replaced with one that does this.)
(Point of the above post: even if the compiler had special built-in knowledge about Sleep(), it couldn't help you in all the cases where that knowledge no longer applies.)
2.) Maybe I wasn't clear with this.
Of course, volatile doesn't affect atomicity.
A memory read/write is atomic if it is byte-, word-, or DWORD-sized (or QWORD-sized on 64-bit) and aligned on an address that is a multiple of its size. Then the value is read or written in a single cycle, and no other thread or core can possibly interrupt it.
However, all caching problems still remain: either caching in a register (by the optimizer), or caching in the CPU. volatile helps with these (but would be mostly useless with non-atomic accesses anyway).
|
|
|
|
|
Thanks peterchen,
I learned a lot from you. Let us come to the conclusion directly: what are your best practices for when to use volatile and when not to?
regards,
George
|
|
|
|
|
Heh
Different opinions exist.
While I don't 100% follow their reasoning, the brightest minds on these issues recommend:
* never use volatile (for threading purposes)
* for atomic data, use Interlocked functions
(If I understand everything correctly, here is where you could use volatile, too)
* otherwise, lock
The idea is (I think) that the Interlocked functions more directly express what you intend to do, at a finer granularity.
However, volatile should work fine as well.
PLEASE keep in mind that in all this discussion I am on thin ice here already. There are a few things that I need to read up on again. It is all to the best of my knowledge and from memory, but it could still be full of errors.
|
|
|
|
|
|
I've got a main cpp file that is including a header file (MyAlgorithms.h) where I'm playing around with doing algorithms the STL way with templates and iterators.
If I put the actual code for my template algorithm in the header file everything compiles and works fine. If I define it in my header file, yet put the code in its own cpp file (MyAlgorithms.cpp) I get a compile error.
I can also put the code for the template function in the main cpp file and then it compiles/runs fine.
What gives?
I can even put the code for a non-template, normal old function in the MyAlgorithms.cpp file, declare it in the header, and it's fine.
undefined reference to `int ajo::roffeltemplate<int>(int)' CppPlayground.cpp
In main:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include

#include "Util.h"
#include "PracticalSocket.h"
#include "Protocol.h"
#include "MyAlgorithms.h"

...

int blah = ajo::roffeltemplate(1);
MyAlgorithms.h
#ifndef MYALGORITHMS_
#define MYALGORITHMS_

#include <string>
#include <vector>
#include <algorithm>

namespace ajo
{

#define ROFFEL 1

void printroffel();

template <typename T>
T roffeltemplate(T val);

template <typename Iter, typename T>
T sumsquares(Iter beg, Iter end, T init);

}

#endif /*MYALGORITHMS_*/
MyAlgorithms.cpp
#include "MyAlgorithms.h"

#include <iostream>
#include <string>

using namespace ajo;

template <typename T>
T roffeltemplate(T val)
{
    return val;
}

template <typename Iter, typename T>
T sumsquares(Iter beg, Iter end, T init)
{
    T result = init;
    for(;beg != end; ++beg)
    {
        result += (*beg) * (*beg);
    }
    return result;
}

void ajo::printroffel()
{
    printf("roffel");
}
I don't think it's a make issue, although I am experimenting with Eclipse's auto-make. I could try writing the make myself like I usually do. I can force a compiler error in the MyAlgorithms files so I know they're being compiled.
|
|
|
|
|
Great... the forum ate everything in the code with angle brackets.
Also, this is compiled on Linux with the GNU compiler.
|
|
|
|
|
Please reformat your post.
ArmchairAthlete wrote: Great... the forum ate everything in the code with angle brackets.
Use <pre> tags to surround code, i.e.:
<pre>
code here
</pre>
and, for templates, escape the < character (with &lt;).
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.
[my articles]
|
|
|
|
|