Introduction
Using critical section and mutex related articles have been done to death, here we try a unique approach to understand the test and set x86-32 bit instruction for implementing critical sections (Spin lock part only). Critical sections (CS) are used in all modern programming paradigms where synchronization is required, though most modern day OS provide CS functionality, its internal implementation is abstracted.
Background
You will have to know about x86 assembly and a few concepts on critical sections (http://en.wikipedia.org/wiki/Critical_section), please keep in mind that this code is tested for user mode only but can be adapted to work in kernel mode with little or no modification.
Before the test and set instruction (atomic op-code BTS), we had an algorithmic approach, the more famous (and easier to understand) was Peterson's algorithm (http://en.wikipedia.org/wiki/Peterson's_algorithm), but as with all modern based CPUs, pre-empting based on interrupt handling and instruction reordering (for weak memory models) can cause the set and test functionality to run in non-atomic and incorrect order (http://en.wikipedia.org/wiki/Memory_ordering).
Basics
Critical sections provide mutual exclusion in a multi threaded environment. Correctly implemented CS must support three criteria: mutual exclusion, progress, and bounded waiting.
- Mutual exclusion: 2 or more threads using the same CS object can never execute the same CS at the same time, this must be enforced.
- Progress: If the CS object is not owned by any thread, then any one thread can take ownership, (not as easy as it sounds)
- Bounded waiting: Must enter only when CS object is up for grabs
For the above 3 points to be met, we must have an atomic Test and Set (TS) instruction.
Using the Code
The code is kept as small and reader friendly as possible, no error checks have been put in place, the reader is advised to use this code as a stepping stone to building more elaborate mutex/critical section APIs.
At the core of the critical section is an atomic operation set and test using x86 "bts" instruction.
void testAndSet(volatile UINT *m) {
_asm {
mov eax,[m]
spin:
lock bts [eax],0; JC Notgotmutex
jmp ret1;
Notgotmutex:
jmp spin
ret1:
}
}
The "bts
" instruction is used with a "lock
" prefix to ensure correct operation in a multi core environment.
This instruction is responsible for copying a specified bit value to the carry flag and then set that bit value (atomic operation), we check the value of the operation via the carry flag, if the mutex was previously acquired, we spin and retest. Details of this instruction are available @ http://www.fermimn.gov.it/linux/quarta/x86/bts.htm.
Memory fence is not required in this case since set and test takes care of everything.
The provided code attempts to synchronize 100 threads, you can use this code snippet to build more complex critical sections with reference counting (to ensure locks and unlocks are in pairs), also you can ensure that the thread does not get locked on a mutex object it currently owns.
To release the CS object (an unsigned int), we must reset the value to zero.
mutex_recursion.zip supports recursion by using the object to keep track of the thread ID (but not count).
I have also attached code to create a complete functioning Mutex using Futex and spin locks
Points of Interest
I hope this clears out everything related to set and test operation.
History
- 14th January, 2014: Initial post