|
Thanks a lot!
It's almost what I need for mutexes!
I've spent 3 weeks searching the web for a suitable threading library. Not entirely in vain, but none of them fulfilled my requirements.
I tried:
BOOST --------> too complex, and dependent on the Boost base API
pThread Win32--> not very OO, and limited capabilities
ZThread -------> meets 95% of my requirements, but not stable: many, many memory leaks
etc.
I needed advanced features like:
-- Timed, scoped, interruptible, exception-safe mutexes
-- Timed, interruptible, templated queue
-- Controllable, interruptible, cancellable, advanced thread pool
-- etc.
So I decided to do it myself, and if you insist, I'll post an article with the complete library. I will give credit to all those who participated in it.
Got 5 from me!
|
|
|
|
|
One of the approaches I like to take is that the mutex guards some data object, but the thread itself does not live by the mutex. The thread itself is notified of "work" using an event. For example.
DWORD WINAPI SomeApplication_WorkerThread(LPVOID pParam)
{
    /* CreateThread passes the context as LPVOID, so cast it back here. */
    PSTRUCT pSomeDataStructure = (PSTRUCT)pParam;
    BOOL bThreadIsAlive = TRUE;

    while (bThreadIsAlive)
    {
        /* Sleep until another thread signals that there is work to do. */
        WaitForSingleObject(pSomeDataStructure->bThreadEvent, INFINITE);

        switch (pSomeDataStructure->EventMsg)
        {
        case THREAD_EVENT_DEAD:
            bThreadIsAlive = FALSE;
            break;
        case THREAD_OTHER_EVENT:
            SomeApplication_OtherEventHandler(pSomeDataStructure);
            break;
        /* .... (etc). */
        }
    }
    return 0;
}
The thread itself is governed by an event, and the data structure may be governed by a mutex. You may even use the mutex independently of the event. For example, if you queue events, there could be a SomeApplication_GetNextEvent() call, with the mutex guarding the event list. You could also guard the event structure by allowing only one event to be posted at a time, so the others have to wait until that event is done. In those cases, there could be a cancel event for when the thread is signaled to die.
You also may not always want to use a mutex. A mutex is a kernel object; its only real advantage is that you can give it a name and share it between multiple processes. For user-mode applications that don't need to share it between processes, a CRITICAL_SECTION variable does quite fine, and it's faster. As long as you design the application so that you don't hold a critical section longer than need be, you really wouldn't need a cancel event either.
Of course, everyone has their own method or view of things, just my 2 cents.
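The queued variant described above (a lock that guards only the event list, plus a wake-up signal for the worker) might look roughly like the following. This is a minimal portable sketch, not code from the post: it assumes standard C++ in place of Win32, with std::mutex standing in for the CRITICAL_SECTION and std::condition_variable for the event handle. EventQueue, EventMsg, post, get_next_event and worker_loop are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical event codes, mirroring THREAD_EVENT_DEAD / THREAD_OTHER_EVENT.
enum class EventMsg { Dead, Other };

class EventQueue {
public:
    // Producer side: the lock is held only long enough to append one item.
    void post(EventMsg msg) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            pending_.push_back(msg);
        }
        wake_.notify_one();  // plays the role of SetEvent
    }

    // Worker side: sleep until something is queued, then take it.
    EventMsg get_next_event() {
        std::unique_lock<std::mutex> guard(lock_);
        wake_.wait(guard, [this] { return !pending_.empty(); });
        assert(!pending_.empty());
        EventMsg msg = pending_.front();
        pending_.pop_front();
        return msg;
    }

private:
    std::mutex lock_;               // guards pending_ and nothing else
    std::condition_variable wake_;  // stands in for the thread event
    std::deque<EventMsg> pending_;
};

// Worker loop: events are handled outside the lock.
// Returns how many Other events were handled before Dead arrived.
int worker_loop(EventQueue& queue) {
    int handled = 0;
    for (;;) {
        switch (queue.get_next_event()) {
        case EventMsg::Dead:
            return handled;
        case EventMsg::Other:
            ++handled;  // stand-in for SomeApplication_OtherEventHandler
            break;
        }
    }
}
```

In a real program worker_loop would run on its own thread (for example via std::thread) while other threads call post. The point of the layout is that no handler ever runs while the lock is held.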
|
|
|
|
|
Toby Opferman wrote:
You also may not always want to use a Mutex, for example. A Mutex is a kernel object, its only real advantage is that you can give it a name and share it between multiple processes.
The very article you just read does something with a mutex that can't be done with a critical section. Namely, you can't use WaitForMultipleObjects with critical sections. Also, you can't specify a timeout value for a critical section. You are stuck with EnterCriticalSection, which waits forever, and TryEnterCriticalSection, which never waits. My own feeling is that waiting forever is dangerous, and should only be done when the sync object synchronizes only trivial operations.
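To make the distinction concrete, here is a small sketch of a lock attempt that gives up after a timeout instead of waiting forever. It assumes standard C++ rather than Win32: std::timed_mutex plays the part of a mutex handle waited on with a timeout, and try_increment and attempt_while_held are hypothetical helpers, not anything from the article.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Returns true if the counter was updated, false if the lock could not be
// taken within `timeout`.
bool try_increment(std::timed_mutex& lock, int& counter,
                   std::chrono::milliseconds timeout) {
    // This constructor calls try_lock_for(timeout) on the mutex.
    std::unique_lock<std::timed_mutex> guard(lock, timeout);
    if (!guard.owns_lock()) {
        return false;  // timed out: the caller can report an error
    }
    ++counter;
    return true;
}

// Demonstration helper: this thread holds the lock while a second thread
// makes one attempt with the given timeout. Returns that attempt's result.
bool attempt_while_held(std::chrono::milliseconds timeout) {
    std::timed_mutex lock;
    int counter = 0;
    bool acquired = true;
    lock.lock();
    std::thread other([&] { acquired = try_increment(lock, counter, timeout); });
    other.join();  // the join makes the write to `acquired` visible here
    lock.unlock();
    assert(counter == 0 || acquired);
    return acquired;
}
```

Whether a timed-out attempt should be retried, reported, or treated as fatal is exactly the design question the rest of this thread argues about; the sketch only shows that the caller gets the choice.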
Nathan Holt
|
|
|
|
|
Yes, you can wait on multiple objects, etc. and even have a timeout. This usually isn't necessary though, and with a good design it can be avoided. Usually, you can redesign the project so that you do not need timeouts, as you shouldn't be holding a critical section for long periods of time and doing a lot of work.
The problem is that people use this timeout feature of a mutex to get around bad designs. You should get the critical section, do quick non-blocking work, and leave. Only special circumstances would require that feature and even then I would be skeptical.
|
|
|
|
|
Toby Opferman wrote:
Yes, you can wait on multiple objects, etc. and even have a timeout. This usually isn't necessary though, and with a good design it can be avoided. Usually, you can redesign the project so that you do not need timeouts, as you shouldn't be holding a critical section for long periods of time and doing a lot of work.
The problem is that people use this timeout feature of a mutex to get around bad designs. You should get the critical section, do quick non-blocking work, and leave. Only special circumstances would require that feature and even then I would be skeptical.
May you spend a year looking at a "NFS server Titan not responding, still trying" message. I assure you that even an hour of that will leave you nervous about code that waits forever for ANYTHING. It is true that most of the deadlocks I have encountered have been with communicating with other computers, but I suspect that network disconnections and computer crashes are just a couple of many things that can go wrong.
I've had issues with a GDI driver that waited for a bitblt to complete that never completed. It is true that a better design would have prevented this behavior, but a timeout would have at least made it possible to report an error instead of freezing the OS.
I am curious about the bad designs that you've seen time outs used to get around. What sort of things do people do?
Nathan Holt
|
|
|
|
|
How about this, I've never seen code that waited on a mutex that couldn't be avoided with a better design as you even admit yourself. In that instance, because of a bad design, a timeout would be a bandaid.
If a process crashes, critical sections would still be fine. The process is no longer around, it's gone. So, there is no problem. Doesn't matter if there's a timeout or not, the process is gone!
If two processes are sharing a mutex and one of them traps, then as long as that process is not in a debugger (or once you close the debugger), all objects held by user mode will be canceled and closed. So, at that time, the mutex "should" be released if that process was holding it, as long as there is logic in the kernel that operates in that way. It could abandon the mutex, but that is still a special case, and if your application traps, are you really in a good state to begin with? And I was still only referring to critical sections used within one process. Like I said, there could be special not-every-day circumstances that may require it, but I'm not buying it all the time or even most of the time.
Like I said though, there are some special circumstances where it may not be avoidable. Then again, what do you want to do if the timeout occurs? Exit the process? Error Message? Is this work that needs to be done so critical that it needs to be done no matter what? What are you going to do then? "OK, I decided I don't need to do that then".
With usermode processes, I usually design them in a way that critical sections work perfectly. I've never seen a situation that couldn't be solved with a better solution.
If you read my first post, this is what I was talking about. Redesign your code so you don't wait on the mutex or need to time out the mutex, but rather an event. That's what an event is for. You can signal another thread to wake up to cancel or exit. You don't need to wake up a mutex. You grab the mutex, you do some small work and you leave. Within your own process there shouldn't be a situation where you could get into this mess. I can see it across processes, but not within a single application.
I'm not talking about code that waits forever on "ANYTHING", I'm talking about grabbing locks. They should be used sparingly and designed correctly. I'm sure "NFS server Titan not responding, still trying" isn't waiting forever on a critical section held by another thread in its own process, or that is an extremely bad design.
|
|
|
|
|
Here's an example of where timeouts just weren't enough or even the right choice.
Someone performed some operation that sent data to a client. In this situation you obviously need timeouts if you want to wait, so an event was used. The thread held onto the IRP and did a wait loop. This wait loop waited on an event with a timeout. Another thread would receive data and then set the event to signal to the first thread that it was OK to exit.
Seems fine, right? Well, what if the timeout had expired and the IRP had been completed? The network round trip could have taken a long time, and when the reply came in, perhaps it tried to use the IRP. This did happen eventually.
I solved it with a 3-way handshake using a mutex that didn't have a timeout, since it's ridiculous to time out on the mutex: the event yes, the mutex no.
The 3 states were NOTHING, WAITING, PROCESSING.
So, once the data had been sent to the client machine, the state went to "WAITING". Now, the state can only be changed while holding the mutex. That's all the mutex did: guard changing or checking the state, nothing more. Now, if the timeout occurs and the state is "WAITING", we can clear the IRP and set the state to "NOTHING". If the timeout occurred and the state is "PROCESSING", we continue to wait.
In the other thread, when we received data, if the state was "NOTHING", we exit. If the state is "WAITING", we switch to "PROCESSING" and leave the mutex. We then do the work, and finally signal the event and possibly switch the state to "NOTHING".
Now, what happens if something goes wrong during the processing of data or traps? Nothing should go wrong and all exit points should signal the event. Also, since this was in the kernel, a trap would mean a bluescreen so, a timeout is worthless anyway.
The main point is, use the type of object that is right for the job. A mutex is not always the right type of object. If you want to perform waits and timeouts, events are much better suited.
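The three-state handshake described in this post can be sketched as below. This is an illustrative user-mode reconstruction in standard C++, not the original kernel code: the IRP and the event are left out, std::mutex stands in for the kernel mutex, and RequestHandshake and its member names are hypothetical.

```cpp
#include <cassert>
#include <mutex>

enum class RequestState { Nothing, Waiting, Processing };

class RequestHandshake {
public:
    // Sender: the request has gone out and a reply is now expected.
    void mark_sent() {
        std::lock_guard<std::mutex> guard(lock_);
        assert(state_ == RequestState::Nothing);
        state_ = RequestState::Waiting;
    }

    // Sender, after its event wait timed out. Returns true if the request
    // is cancelled and its resources may be released; false if the receiver
    // has already claimed it, in which case the sender must keep waiting.
    bool on_timeout() {
        std::lock_guard<std::mutex> guard(lock_);
        if (state_ == RequestState::Processing) {
            return false;
        }
        state_ = RequestState::Nothing;
        return true;
    }

    // Receiver: a reply arrived. Returns true if the caller now owns the
    // request; false means nobody is waiting any more, so drop the reply.
    bool try_claim() {
        std::lock_guard<std::mutex> guard(lock_);
        if (state_ != RequestState::Waiting) {
            return false;
        }
        state_ = RequestState::Processing;
        return true;
    }

    // Receiver: processing is done. Real code would signal the event next.
    void finish() {
        std::lock_guard<std::mutex> guard(lock_);
        assert(state_ == RequestState::Processing);
        state_ = RequestState::Nothing;
    }

    RequestState state() {
        std::lock_guard<std::mutex> guard(lock_);
        return state_;
    }

private:
    std::mutex lock_;  // held only to read or change state_
    RequestState state_ = RequestState::Nothing;
};
```

Every method holds the lock for a single comparison and assignment, so there is never a reason to time out on the lock itself. The timeout stays on the event wait, where it belongs.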
|
|
|
|
|
Toby Opferman wrote:
How about this, I've never seen code that waited on a mutex that couldn't be avoided with a better design as you even admit yourself. In that instance, because of a bad design, a timeout would be a bandaid.
A bandaid can still be better than a permanently hanging program. Are you sure your program is deadlock free? If you are, you are probably guarding really trivial things. It's nice if you can implement your program so that you can do that, and maybe you usually can. A timeout for a mutex isn't a substitute for designing code to not deadlock, but I think it is still an important error checking feature. Usually a time out will indicate a serious error that needs to be dealt with by extreme methods. Sometimes (with certain GUI operations), it's more important to complete the attempt fast than it is to actually succeed. The OnUpdate handler for a control in a dialog bar, for instance, can just wait for the next time it's called.
Nathan Holt
|
|
|
|
|
Everything that you guard should be kept to a minimum. If you are guarding something extremely complex and time-consuming, you probably should rethink your design. The best designers can take a complex problem and implement an elegant yet simple solution, as opposed to the other way around.
|
|
|
|
|
Toby Opferman wrote:
Everything that you guard should be kept to a minimum. If you are guarding something extremely complex and time-consuming, you probably should rethink your design. The best designers can take a complex problem and implement an elegant yet simple solution, as opposed to the other way around.
Please! I heard you the first time. Did you hear me disagree?
If you just need to control access to a few numbers, you can get away with not having a timeout. Consider how it changes if you are controlling access to an image. Consider even further if you need controlled access to so many images that they have to be stored on the disk in some sort of temporary file while you operate on them. At this point, just accessing your controlled resource involves calls to the operating system, loops to copy data, and possibly calls to a third-party image processing library. Can you really just assume that all the code is bug-free even after weeks of testing? Can you really assume that access can be done reasonably fast, even when some background program suddenly needs to perform a big disk-intensive operation? It seems to me that even the background threads need to be waiting on both the sync object for the images and something that can tell the thread to terminate now and not later. A critical section won't work. It's even worse in the GUI thread when a user wants to see one of the images. The user really needs an opportunity to decide he doesn't want to see the image so much that he's willing to wait for it. What's your approach to a simple and elegant solution to this?
Nathan Holt
|
|
|
|
|
Well, it's hard to tell with the rambling on what the exact problem you want to solve is.
A "sync" object could be an event. Events should be used for the purpose of informing that an operation has completed, not a critical section or a mutex. So right there the design is flawed, you're using the wrong objects for the job. Secondly, what exactly are you protecting? Access to an image? Are you writing to this image? One thing, If everyone wants to "read" memory, it doesn't need to be protected. If someone wants to "write" memory, then it does. So, the solution would actually be a combination of objects.
The first is that you do NOT ever want to lock on a read unless someone is writing. You want to then have a counter that then only locks a read operation if someone is writing. That way, everyone can read, but only one person can write. And readers are only blocked if someone is writing. Now, all that should be guarded is the writing to the object and reading, which means this is a simple operation. You could then use events to wait on access. If you want to timeout, the event times out, not the critical sections.
For a simple example, you could acquire a lock and increment counters for the number of readers and writers. Many readers can get in to read, obviously, but only one writer at a time. If a writer comes in, it needs to wait for the current readers to finish; then some event could be signaled to tell it that it's time. Now, there's perhaps another event, or the counter is used, to tell new readers that come in that there's a writer. Then, when the writer is done, it could signal that all the readers are allowed to read again.
You could even do things like return "BUSY" automatically. The code knows there's someone writing, so instead of a timeout, it just returns "BUSY". That's another solution. You could then add priorities to readers and writers, to determine whether a writer that finishes wakes up the next writer or the next reader when multiple writers and readers are attempting to get in.
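As a rough illustration of the counter idea, here is the non-blocking "BUSY" variant: a short lock guards only the reader count and the writer flag, and callers that cannot get in are told so at once. This is a sketch in standard C++ under stated assumptions: ReadWriteGate, Access and the method names are hypothetical, and std::mutex stands in for a CRITICAL_SECTION.

```cpp
#include <cassert>
#include <mutex>

enum class Access { Granted, Busy };

class ReadWriteGate {
public:
    // Readers are refused only while a writer is active.
    Access try_begin_read() {
        std::lock_guard<std::mutex> guard(lock_);
        if (writer_active_) {
            return Access::Busy;
        }
        ++readers_;
        return Access::Granted;
    }

    void end_read() {
        std::lock_guard<std::mutex> guard(lock_);
        assert(readers_ > 0);
        --readers_;
    }

    // A writer needs the resource to itself: no readers, no other writer.
    Access try_begin_write() {
        std::lock_guard<std::mutex> guard(lock_);
        if (writer_active_ || readers_ > 0) {
            return Access::Busy;
        }
        writer_active_ = true;
        return Access::Granted;
    }

    void end_write() {
        std::lock_guard<std::mutex> guard(lock_);
        assert(writer_active_);
        writer_active_ = false;
    }

private:
    std::mutex lock_;  // guards the two fields below, never the data itself
    int readers_ = 0;
    bool writer_active_ = false;
};
```

The waiting variant from the post would add an event (in standard C++, a condition variable) that end_read and end_write signal, and the priority rules would decide who gets woken first. Neither is shown here.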
You could also do a queue. You enter and supply a callback. So, when it's your turn and the data is ready, it gives you a callback so you don't have to block, you actually get a callback. You can even append a time to the callback, an expired timer could also be fired. For example, you attempt to acquire a resource and it's busy, so you're added to a queue.
Then you supply a timeout or infinite. Now, perhaps a worker thread or a scheduler comes by and says hey your timeout has expired so it calls your callback and says "expired". Or, it calls you back and says we've locked the resource for you, here it is do your stuff.
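One possible shape for the queue-with-callbacks idea is sketched below. It is only an illustration in standard C++: ResourceScheduler, Outcome, request, release and expire are hypothetical names, and the "scheduler" is reduced to an expire call that some worker thread or timer would make periodically.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

enum class Outcome { Granted, Expired };
using Clock = std::chrono::steady_clock;
using Callback = std::function<void(Outcome)>;

class ResourceScheduler {
public:
    // Ask for the resource. If it is free the callback runs at once with
    // Granted; otherwise the request is queued together with its deadline.
    void request(Callback callback, Clock::time_point deadline) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            if (busy_) {
                waiting_.push_back(Waiter{std::move(callback), deadline});
                return;
            }
            busy_ = true;
        }
        callback(Outcome::Granted);  // always invoked outside the lock
    }

    // The current holder is done: hand the resource to the next waiter.
    void release() {
        Callback next;
        {
            std::lock_guard<std::mutex> guard(lock_);
            assert(busy_);
            if (waiting_.empty()) {
                busy_ = false;
                return;
            }
            next = std::move(waiting_.front().callback);
            waiting_.pop_front();
            // busy_ stays true: ownership passes straight to `next`.
        }
        next(Outcome::Granted);
    }

    // Scheduler tick: tell every waiter whose deadline has passed.
    void expire(Clock::time_point now) {
        std::vector<Callback> expired;
        {
            std::lock_guard<std::mutex> guard(lock_);
            for (auto it = waiting_.begin(); it != waiting_.end();) {
                if (it->deadline <= now) {
                    expired.push_back(std::move(it->callback));
                    it = waiting_.erase(it);
                } else {
                    ++it;
                }
            }
        }
        for (auto& callback : expired) {
            callback(Outcome::Expired);
        }
    }

private:
    struct Waiter {
        Callback callback;
        Clock::time_point deadline;
    };

    std::mutex lock_;  // guards busy_ and waiting_ only
    bool busy_ = false;
    std::deque<Waiter> waiting_;
};
```

Passing the current time into expire keeps the sketch deterministic; a real scheduler would read the clock itself. A caller that wants an infinite wait could pass Clock::time_point::max() as the deadline.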
All these, in a single process, would only require a critical section, since you're not doing anything inside the critical section aside from copying some memory or setting some number. You can even put things on queues so that only the queue is guarded, and thus you pop off a work item and perform the task outside any locks. There are many ways to elegantly design a system in which you would not need to wait on a mutex with a timeout. The only time a mutex would really be needed is across processes, and then it may need a timeout, since the mutex could be abandoned if that process terminates while holding it. But even then, that mutex is now lost, it's gone. All copies must close and reopen it, so it's a bug that now you can never re-acquire it again. So, it's probably best to fix the bug.
Secondly, if there are "bugs" and this is within one process space then fix the bugs. A bug report should be filed. A timeout is not an acceptable solution to a bug. Do you put your entire program in a try catch block?
|
|
|
|
|
Also, why would you need to protect a 3rd party library? can't multiple contexts be used and handles to where the 3rd party library doesn't need to be entirely protected? There are always ways to design things better.
For example, if the image is on disk and you want multiple people to copy it into memory, only one copy needs to be in memory at a time, with just a reference count on it. If multiple copies need to be in memory, why do you even need to protect reading it from disk? Also, could things be done to optimize this, for example, caching? If part of the image has been read, that part could be cached, and while one thread is reading the cached portion, another reader thread is reading more from disk. There are a lot of tricks that can be done to provide much better performance, because using a ton of critical sections to read an image off disk continuously and perform a ton of processing with a 3rd party library is extremely inefficient to begin with, and if you're doing that your application deserves to trap.
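The "one copy in memory plus a reference count" idea could be sketched like this, assuming standard C++ where std::shared_ptr supplies the reference count. ImageCache, Image and Loader are hypothetical names, and for brevity the loader runs while the lock is held, which a fuller design along the lines of this post would avoid.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using Image = std::vector<unsigned char>;  // hypothetical pixel buffer
using Loader = std::function<Image(const std::string&)>;

class ImageCache {
public:
    explicit ImageCache(Loader loader) : loader_(std::move(loader)) {}

    // Returns the shared in-memory copy, loading it only if nobody
    // currently holds one. Readers use the result without any lock.
    std::shared_ptr<const Image> get(const std::string& name) {
        std::lock_guard<std::mutex> guard(lock_);
        std::weak_ptr<const Image>& slot = cache_[name];
        if (std::shared_ptr<const Image> cached = slot.lock()) {
            return cached;  // someone still holds it: share that copy
        }
        // Simplification: the load happens under the lock.
        std::shared_ptr<const Image> loaded =
            std::make_shared<Image>(loader_(name));
        slot = loaded;
        return loaded;
    }

private:
    std::mutex lock_;  // guards cache_ only
    Loader loader_;
    std::map<std::string, std::weak_ptr<const Image>> cache_;
};
```

Because the cache keeps only weak references, the image is freed when the last user lets go of it and is reloaded on the next request. The returned image is const, so any number of threads can read it at once.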
Also, along with my idea of a "scheduler" and callbacks, more complex operations could be split into portions that are encapsulated in your library. Then, when they are called, you can fire an event to another reader to perform an action, thereby pausing another writer/reader at an "appropriately" determined time. This way, essentially, you have implemented "resource sharing" and scheduling, so that one thread does not hog a resource for the whole of a large operation.
Kind of like what a disk driver does when multiple threads want to access the disk. Depending on where the read/write head is, it could satisfy another reader at the same time, or in between satisfying a different reader or writer.
|
|
|
|
|
I am replying to both of your posts. Obviously, I can't tell the CP editor to quote from the first one. Here are my reactions to the first post.
1: In order to read an image from a temp file, one first has to seek to the position of the image, and then read it. One can't allow another thread to either seek or read in between these operations. One can't allow multiple threads to read from a temp file.
2: The alternatives you described in previous post all sound much more complicated than a "one thread at a time" rule, especially since "one thread at a time" needs to be enforced.
3: I do put my background threads in try/catch blocks. They greatly simplify cases like when the thread suddenly needs to exit. Also I use MFC, which automatically puts most of my code in try/catch blocks.
4: In the time between a bug being reported and fixed, do you really prefer to see a program fail by suddenly not responding over seeing a program fail by reporting it had an error and then exiting? I certainly don't.
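Point 1 above (the seek and the read must happen as one step) can be illustrated with a small sketch. It assumes standard C++ streams rather than Win32 file handles; SharedTempFile and read_at are hypothetical names, and any std::istream (a std::ifstream in practice, a std::istringstream in a quick experiment) can play the temp file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <mutex>
#include <sstream>
#include <string>

class SharedTempFile {
public:
    explicit SharedTempFile(std::istream& stream) : stream_(stream) {}

    // Seek and read as one indivisible step, so that no other thread can
    // move the file position between the two calls.
    std::string read_at(std::streamoff offset, std::size_t length) {
        std::lock_guard<std::mutex> guard(lock_);
        stream_.clear();  // drop any EOF/fail state left by an earlier read
        stream_.seekg(offset, std::ios::beg);
        std::string bytes(length, '\0');
        stream_.read(&bytes[0], static_cast<std::streamsize>(length));
        // Keep only what was actually read (short read at end of file).
        bytes.resize(static_cast<std::size_t>(stream_.gcount()));
        return bytes;
    }

private:
    std::mutex lock_;       // serialises every seek+read pair
    std::istream& stream_;  // shared position: the reason for the lock
};
```

This is the plain "one thread at a time" rule. The lock is held for the whole disk access, which is precisely the case where the two sides of this thread disagree about whether a timeout is needed.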
Toby Opferman wrote:
Also, why would you need to protect a 3rd party library? can't multiple contexts be used and handles to where the 3rd party library doesn't need to be entirely protected? There are always ways to design things better.
I suspect that there are third-party libraries that would have trouble with that. However, I was only referring to the issue that a third-party library is more code in which something could go wrong.
Toby Opferman wrote:
For example, if the image is on disk and you want multiple people to copy it into memory, only one copy needs to be in memory at a time, for example, and then just a reference count to it.
Reference counting is reasonable, but may not be worth the effort, especially if keeping images on the disk was a retrofit needed because customers did bigger things than expected.
Toby Opferman wrote:
There are a lot of tricks that can be done to provide much better performance
There certainly are, but before using these tricks, one must consider how likely they are to introduce errors into one's code, and how much one will be forced to obfuscate that code for the sake of performance. For instance, a for loop is probably less error-prone than writing state change functions to do the same thing, but your idea of a scheduler with callbacks would push a programmer toward writing things as state change functions. Joseph Newcomer wrote an article on these issues with optimization: http://www.flounder.com/optimization.htm.
Nathan Holt
|
|
|
|
|
1. I don't know if I should thank you for not reading what I wrote or wonder why you lectured me on a CSC 101 factoid. I never said files, I said memory. However, on the issue of files, I did mention that caching could be used so while one thread is still reading from a farther part in the file, other threads can read what was just read from the cache.
2. I don't see how they are complicated. In fact, an asynchronous scheduler is a good idea and would even be extendable and reusable for just about anything. But, like I said, what you want to do is complicated if you need to implement something like this. If you aren't doing something complicated to begin with, then you would just guard the memory with a critical section and move on. No need for a timeout then. You mentioned that I must have been doing really simple things in order to not require a timeout with a mutex. Now that we are doing more complex designs, they require more complex yet elegant solutions. I just turned your hard problem into something really easy. Also, this "scheduler" I talked about is something I could write in 5 minutes bug-free, so I don't even see the problem with it. It can be as complex or as simple as you like.
3. I don't even know if I should respond to this one. Not only have you just admitted to using MFC, but to putting your entire code in try/catch blocks.
4. I certainly would prefer that, since a program failing without me knowing about it, while I continue to use it, could be worse. The program could now be in an unknown state, and perhaps I continue using the program for an hour and when I go to save, it doesn't work. Or perhaps that crashes too, but I don't know because the exception is caught and I'm left in the dark.
"Joseph Newcomer"? Is he some authority on something? I've seen his stuff before and he writes a lot of invalid information. I don't think I would be taking advice from him, especially about optimization.
|
|
|
|
|
Toby Opferman wrote:
I never said files, I said memory.
I'm sorry I missed that. I was thrown because you were replying to a message that specified files.
Toby Opferman wrote:
2. I don't see how they are complicated. In fact, an asynchronous scheduler is a good idea and would even be extendable and reusable for just about anything. But, like I said, what you want to do is complicated if you need to implement something like this. If you aren't doing something complicated to begin with, then you would just guard the memory with a critical section and move on. No need for a timeout then. You mentioned that I must have been doing really simple things in order to not require a timeout with a mutex. Now that we are doing more complex designs, they require more complex yet elegant solutions. I just turned your hard problem into something really easy. Also, this "scheduler" I talked about is something I could write in 5 minutes bug-free, so I don't even see the problem with it. It can be as complex or as simple as you like.
I don't see much point in arguing further about this. I doubt that we are ever likely to agree on it.
Toby Opferman wrote:
3. I don't even know if I should respond to this one. Not only have you just admitted to using MFC, but to putting your entire code in try/catch blocks.
You are clearly shocked by both of these, but have given no explanation. In particular, I still don't know what you have against putting an entire program in a try/catch block.
Toby Opferman wrote:
4. I certainly would prefer that, since a program failing without me knowing about it, while I continue to use it, could be worse.
Certainly, a program attempting to continue running after an error that should be fatal is the worst thing a program can do. I don't think that's an excuse to not try to report errors. Programs can get into inconsistent states without deadlocks.
Toby Opferman wrote:
"Joseph Newcomer"? Is he some authority on something? I've seen his stuff before and he writes a lot of invalid information. I don't think I would be taking advice from him, especially about optimization.
MVP stands for "Most Valuable Professional," and generally indicates an experienced programmer. Joseph Newcomer has written a number of articles on programming issues which mostly parallel my own experience. I consider his opinions worth listening to. What sort of misinformation have you seen from him?
Nathan Holt
|
|
|
|
|
Ok... and COM stands for "Component Object Model".
I agree to disagree with you.
I don't know the guy personally, but one time someone came to me with something stupid and obviously false, and cited him as someone who is never wrong, saying he had read it in one of his articles. I was like, you don't just go around thinking that if it's on the internet it's fact; look at a real source, like MSDN. If it states it in MSDN and it's not true, then it's obviously a bug or you misread it.
So, the guy actually ended up of all things emailing the guy, this Joe guy. The Joe guy replied, ya, I'm probably wrong.
The issue was the stupidest issue in the world: the guy claimed that "FindWindow" would hang your application if you used it while another application on the system was hung, since it wouldn't be able to get the window text. I guess he didn't know that things like that possibly COULD be stored in the kernel and retrieved from there, rather than by messaging everyone on the system. I saw other articles where he listed APIs and said they were "Severe" or whatever, when clearly if you don't understand something you can screw yourself, but it was greatly exaggerated.
That being said, don't even believe me. I'm not MSDN; nothing I write has been reviewed or authorized by Microsoft. I know Microsoft used to have some bugs in their Windows 9x DDK, I used to hit them all the time, but they've gotten a lot better documentation-wise.
BTW, did you know that loading a DLL into a process, exiting a thread, exiting a process, unloading a DLL, etc. are all guarded by a single critical section in the user-mode portion of the code? While this brings in limitations known to application programmers, it just goes to show that even something that's "not so trivial" can be written using a critical section and apparently work. I mean, applications are running under Windows after all.
|
|
|
|
|
Toby Opferman wrote:
Ok... and COM stands for "Component Object Model".
Toby Opferman wrote:
I agree to disagree with you.
Thanks.
Toby Opferman wrote:
So, the guy actually ended up of all things emailing the guy, this Joe guy. The Joe guy replied, ya, I'm probably wrong.
I read the article that made the claim about FindWindow. I'm glad it's probably wrong. I quite agree with you about not blindly trusting anyone.
Toby Opferman wrote:
BTW, did you know that loading a DLL into a process, exiting a thread, exiting a process, unloading a DLL, etc. are all guarded by a single critical section in the user-mode portion of the code? While this brings in limitations known to application programmers, it just goes to show that even something that's "not so trivial" can be written using a critical section and apparently work. I mean, applications are running under Windows after all.
A good reason to keep DllMain simple. My feelings about critical sections may just be nervousness.
Nathan Holt
|
|
|
|
|
Erm... This approach can hardly be called "interruptible". I would rather use either WaitForMultipleObjectsEx with the bAlertable flag set to TRUE, or MsgWaitFor... if I use the thread message queue to deliver notifications (including WM_QUIT).
|
|
|
|
|
I think the title is exactly right. You're waiting on a mutex and the wait can be interrupted by signalling the stop handle.
Why not write an article explaining the finer points of WaitForMultipleObjectsEx() or the MsgWaitFor family?
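For readers skimming the thread, the behaviour being argued about (a lock wait that ends early when a stop signal is raised) can be sketched as follows. This is a portable analogue in standard C++, not the article's code: a condition variable stands in for waiting on the stop handle and the mutex handle together, and InterruptibleLock, WaitResult, acquire, release and request_stop are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

enum class WaitResult { Acquired, Interrupted, TimedOut };

class InterruptibleLock {
public:
    // Waits until the lock is free, a stop is requested, or the timeout
    // elapses, whichever comes first. A pending stop always wins.
    WaitResult acquire(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> guard(state_lock_);
        bool woke = changed_.wait_for(
            guard, timeout, [this] { return stop_ || !held_; });
        if (stop_) {
            return WaitResult::Interrupted;
        }
        if (!woke) {
            return WaitResult::TimedOut;
        }
        held_ = true;
        return WaitResult::Acquired;
    }

    void release() {
        {
            std::lock_guard<std::mutex> guard(state_lock_);
            assert(held_);
            held_ = false;
        }
        changed_.notify_one();
    }

    // Plays the role of signalling the stop handle: wakes every waiter.
    void request_stop() {
        {
            std::lock_guard<std::mutex> guard(state_lock_);
            stop_ = true;
        }
        changed_.notify_all();
    }

private:
    std::mutex state_lock_;  // guards held_ and stop_ only
    std::condition_variable changed_;
    bool held_ = false;
    bool stop_ = false;
};
```

Checking stop_ before anything else mirrors the common Win32 habit of putting the stop handle first in the handle array, so that a stop request takes priority when both objects are signalled.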
Rob Manderson
I'm working on a version for Visual Lisp++
|
|
|
|
|
Rob, I did not mean to belittle your contribution - the article is great and the idea is sound. What I wanted to point out with my comment was that locking support in Windows is quite good, with plenty of alternatives to choose from. I just commented on the fact that your article covers one particular approach, which might not be optimal in all situations, and I wanted to clarify the issue a little.
|
|
|
|
|
Sorry - you're right - my response was a tad snarky.
But seriously, why not write an article about other approaches? I took a look at the APIs you mentioned and concluded that, for my application, they didn't do what I needed (no message loop in the one case and no queued APC in the other case). Nonetheless, I needed to be able to arbitrarily interrupt a mutex wait, hence my approach.
Rob Manderson
I'm working on a version for Visual Lisp++
|
|
|
|
|
Aye, I agree - any approach is good as long as it delivers the result you use it for. I was thinking about writing an article on locking but was too lazy to start it. Should do it now, though :)
Anyway, thanks for the great article!
|
|
|
|
|