Introduction
Multithreading can improve application performance, especially when I/O reads and writes are involved. Unfortunately, a shared resource (a shared variable) can have a different version in each CPU cache, which makes the application's behavior unpredictable. Java provides the synchronized keyword to keep shared resources consistent across CPU caches. Unfortunately again, the synchronized keyword slows down the application.
I use JMH for micro-benchmarking with AverageTime mode, which means the benchmark result is the average run time of each test case; lower output is better. You can find more information about micro-benchmarking at this link.
Why Does Synchronized Slow Down the Application?
When a thread acquires the lock and starts to execute the instructions in a synchronized block, all other threads that need the lock are blocked and become idle. The execution context (CPU cache, instruction set, stack pointer, ...) of those threads is stored, and the execution context of other runnable threads is restored so they can resume computing. This is called a context switch, and it requires significant effort from the system. The task scheduler also has to run to pick which thread will be loaded next.
volatile keyword
The volatile keyword does just a few things: it tells the CPU to read the value of the resource from main memory, not from the CPU's cache, and a write to a volatile field happens-before every subsequent read of that field. volatile can never have a higher overhead than synchronized; it only has the same overhead as synchronized when the synchronized block contains a single operation.
The volatile keyword works well if there is only one writing thread. With two or more writing threads, a race condition occurs: each writing thread reads the latest version of the variable, modifies the value on its own CPU, then writes it back to main memory. As a consequence, the data in memory is just the output of one thread; the other threads' modifications are overwritten.
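The lost-update problem is easy to reproduce in plain Java (a minimal sketch, not part of the original benchmarks): several threads increment a volatile counter with counter++, which is a read-modify-write of three separate steps, so increments from different threads silently overwrite each other.

```java
import java.util.concurrent.CountDownLatch;

public class VolatileLostUpdate {
    // volatile guarantees visibility, but NOT atomicity of counter++
    static volatile int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        int threads = 4;
        int increments = 100_000;
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < increments; i++) {
                    counter++; // read, add, write: another thread can interleave
                }
                done.countDown();
            }).start();
        }
        done.await();
        // With several writers, updates are usually lost, so the total
        // typically ends up below threads * increments.
        System.out.println("counter = " + counter
                + " (expected " + threads * increments + ")");
    }
}
```

Running this a few times almost always prints a total below 400000, which is exactly the overwriting described above.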
Package java.util.concurrent
Doug Lea did great work when he created and improved this package. It provides many tools for managing threads and also contains several thread-safe data structures. Those data structures use synchronized and volatile under the hood, but in a sophisticated way, so you can get much better performance than by writing your own code.
ConcurrentHashMap
"obeys the same functional specification as Hashtable" and gives you the advantage of thread safety.
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

public class TestHashMap {
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Threads(10)
    public Integer concurrentHashMap(BenchMarkState state) {
        Integer temp = null;
        for (int i = 0; i < 100000; i++) {
            temp = Integer.valueOf(i);
            state.chm.put(temp, temp);
        }
        return temp;
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Threads(10)
    public Integer hashMap(BenchMarkState state) {
        Integer temp = null;
        for (int i = 0; i < 100000; i++) {
            temp = Integer.valueOf(i);
            synchronized (state.hm) {
                state.hm.put(temp, temp);
            }
        }
        return temp;
    }

    @State(Scope.Benchmark)
    public static class BenchMarkState {
        @TearDown(Level.Invocation)
        public void doTearDown() {
            hm.clear();
            chm.clear();
        }

        public HashMap<Integer, Integer> hm = new HashMap<>(100000);
        public ConcurrentHashMap<Integer, Integer> chm = new ConcurrentHashMap<>(100000);
    }
}
Benchmark Mode Cnt Score Error Units
TestHashMap.concurrentHashMap avgt 200 6.424 ± 0.052 ms/op
TestHashMap.hashMap avgt 200 41.387 ± 0.880 ms/op
AtomicInteger
AtomicInteger and similar classes use volatile and Unsafe.compareAndSwapInt under the hood. AtomicInteger can be described as busy-waiting: a thread repeatedly checks a condition before it can proceed. The thread does no useful work, but the task scheduler cannot detect this and considers the thread busy, so it cannot hand the CPU to another thread that is ready for execution. Busy-waiting works well if the condition can be satisfied within a few CPU clock cycles. We can also see in the benchmark results that the synchronized version is less stable (note its larger error).
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import org.openjdk.jmh.annotations.*;

public class TestCAS {
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Threads(10)
    public int atomic(BenchMarkState state) {
        while (state.atomic.get() < 100000) {
            while (true) {
                int temp = state.atomic.get();
                if (temp >= 100000 || state.atomic.compareAndSet(temp, temp + 1))
                    break;
            }
        }
        return state.atomic.get();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Threads(10)
    public int integer(BenchMarkState state) {
        // Synchronize on a dedicated lock object: state.integer is reassigned
        // by auto-boxing on every increment, so locking on the Integer itself
        // would give no mutual exclusion.
        while (state.integer < 100000) {
            synchronized (state.lock) {
                if (state.integer < 100000)
                    state.integer += 1;
            }
        }
        return state.integer;
    }

    @State(Scope.Benchmark)
    public static class BenchMarkState {
        @TearDown(Level.Invocation)
        public void doTearDown() {
            atomic.set(0);
            integer = 0;
        }

        public final Object lock = new Object();
        public AtomicInteger atomic = new AtomicInteger(0);
        public Integer integer = 0;
    }
}
Benchmark Mode Cnt Score Error Units
TestCAS.atomic avgt 200 12.484 ± 0.130 ms/op
TestCAS.integer avgt 200 15.896 ± 2.012 ms/op
Lock
Lock has more flexible features than synchronized: you can use tryLock() with a timeout to bound the wait, or use the fairness option to make sure the longest-waiting thread gets the lock. But the synchronized keyword guarantees both execution order and data freshness, and source code using synchronized is also simpler. Lock can become a nightmare if a junior developer forgets to call unlock() or doesn't put unlock() in a finally block.
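The safe pattern is to release the lock in a finally block. Here is a minimal sketch (a hypothetical LockIdiom class, not part of the benchmark code) that also shows tryLock() with a timeout and the fairness option:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockIdiom {
    // true enables fairness: the longest-waiting thread acquires the lock first
    private final ReentrantLock lock = new ReentrantLock(true);
    private int value = 0;

    public void increment() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock(); // always released, even if the critical section throws
        }
    }

    public boolean tryIncrement() throws InterruptedException {
        // Give up after 100 ms instead of blocking forever
        if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {
            try {
                value++;
                return true;
            } finally {
                lock.unlock();
            }
        }
        return false; // could not acquire the lock in time
    }

    public int value() {
        return value;
    }
}
```

With this shape, an exception inside the critical section can never leave the lock held.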
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

import org.openjdk.jmh.annotations.*;

public class TestLock {
    // Benchmarks for 8, 32, and 64 threads are analogous and omitted here.
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Threads(2)
    public Integer lock_2(BenchMarkState state) {
        while (true) {
            state.lock.lock();
            if (state.intLock >= 100000) {
                state.lock.unlock();
                break;
            }
            state.intLock += 1;
            state.lock.unlock();
        }
        return state.intLock;
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Threads(2)
    public Integer synchonized_2(BenchMarkState state) {
        while (true) {
            synchronized (state.LOCK) {
                if (state.intSync < 100000)
                    state.intSync += 1;
                else
                    break;
            }
        }
        return state.intSync;
    }

    @State(Scope.Benchmark)
    public static class BenchMarkState {
        @TearDown(Level.Invocation)
        public void doTearDown() {
            intSync = 0;
            intLock = 0;
        }

        public ReentrantLock lock = new ReentrantLock();
        public Object LOCK = new Object();
        public Integer intSync = 0;
        public Integer intLock = 0;
    }
}
Benchmark Mode Cnt Score Error Units
TestLock.lock_2 avgt 200 11.069 ± 0.300 ms/op
TestLock.lock_32 avgt 200 65.025 ± 1.114 ms/op
TestLock.lock_64 avgt 200 136.796 ± 3.589 ms/op
TestLock.lock_8 avgt 200 15.996 ± 0.317 ms/op
TestLock.synchonized_2 avgt 200 3.986 ± 0.059 ms/op
TestLock.synchonized_32 avgt 200 69.922 ± 1.645 ms/op
TestLock.synchonized_64 avgt 200 156.641 ± 3.948 ms/op
TestLock.synchonized_8 avgt 200 14.754 ± 0.248 ms/op
I ran the benchmark with different numbers of threads, from 2 to 64. We can see that synchronized works much better than ReentrantLock with 2 threads, but with 64 threads, ReentrantLock has better performance.
Immutable Object
The idea is simple: if an object never changes its values, it's thread-safe. But there is a catch: you have to create a new object each time you want to change a value, which adds overhead for the garbage collector. Some libraries, such as https://immutables.github.io, make immutable objects easier to work with.
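A minimal sketch of an immutable class (a hypothetical Point, not from the original source): all fields are final, and "modification" returns a new instance instead of mutating the existing one, so an object shared between threads can never be seen in an inconsistent state.

```java
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // "Mutation" returns a fresh instance; existing references never change
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```

Because translate allocates a new Point on every call, the thread safety is paid for with extra garbage-collector work, as noted above.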
Conclusion
Sharing resources between threads is easy with the synchronized keyword, but it can cause widespread waiting and slow down your application. Other simple techniques can also achieve thread safety while being faster than synchronized.
You can check out the full source code at https://github.com/vudangngoc/java-benchmark.