Introduction
Readers-writer lock, shared-exclusive lock, multiple-readers/single-writer lock, and multi-reader lock are all different names for the same synchronization primitive. The idea is simple and elegant: we have shared memory that we want to keep consistent. The straightforward way to do that is to guard access to the memory with a mutex. But we can do better: if you only read the shared memory, consistency cannot be broken. So let's provide concurrent (inclusive) access for readers, and exclusive access for writers. That is exactly what all of the locks listed above do. For more details, please see http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock.
Standard Multi-Reader Lock implementation
Standard multi-reader lock implementations use basic synchronization primitives to protect their internal shared data. You may see an example at http://www.glennslayden.com/code/win32/reader-writer-lock. So even on reader lock/unlock operations, some synchronization primitive is acquired. That happens for a short period, just to update the readers counter, but it still happens.
Now, let's assume that we run on a powerful 32-core system, we have 32 reader threads doing concurrent reading, and the read operation itself is pretty short, meaning reader unlock is called shortly after reader lock. Will we really use all the resources of our powerful 32-core system for concurrent execution? The answer is no. Reader threads will actually be serialized on acquiring the reader mutex, just for one operation: updating the readers counter. In the example implementation, that happens in void EnterReader(void) here:
EnterCriticalSection(&m_csReaderCount);
if (++m_cReaders == 1)
    ResetEvent(m_hevReadersCleared);
LeaveCriticalSection(&m_csReaderCount);
and in void LeaveReader(void) here:
EnterCriticalSection(&m_csReaderCount);
if (--m_cReaders == 0)
    SetEvent(m_hevReadersCleared);
LeaveCriticalSection(&m_csReaderCount);
And that is exactly the problem that we will solve.
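The same serialization point is easy to see in a condensed, portable C++11 sketch of the standard approach (this is a hypothetical equivalent of the Win32 example above, not code from it): every reader must take the same mutex on both lock and unlock, so readers serialize on it.

```cpp
#include <mutex>
#include <condition_variable>

// A condensed, portable sketch of the *standard* multi-reader lock.
// Note that EVERY reader takes m_mutex on both lock and unlock:
// this is exactly the serialization point discussed above.
class StandardMultReadLock
{
public:
    void ReaderLock()
    {
        std::unique_lock<std::mutex> lk(m_mutex);   // <-- readers serialize here
        m_cv.wait(lk, [this]{ return !m_writer; });
        ++m_readers;
    }
    void ReaderUnLock()
    {
        std::lock_guard<std::mutex> lk(m_mutex);    // <-- and here
        if (--m_readers == 0)
            m_cv.notify_all();
    }
    void WriterLock()
    {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_cv.wait(lk, [this]{ return !m_writer && m_readers == 0; });
        m_writer = true;
    }
    void WriterUnLock()
    {
        std::lock_guard<std::mutex> lk(m_mutex);
        m_writer = false;
        m_cv.notify_all();
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_readers = 0;
    bool m_writer = false;
};
```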
Fast Multi-Reader Lock implementation
As we saw, the problem is the synchronization primitive acquired on reader lock/unlock operations. We will implement the reader lock/unlock operations without such synchronization when no writer is pending.
Let's move to the source code and review all lock/unlock operations.
#ifndef _FAST_MULT_READ_LOCK_
#define _FAST_MULT_READ_LOCK_

#include <cassert>
#include <cstdint>   // uint32_t
#include <atomic>
#include <mutex>
#include <condition_variable>

struct FastMultReadlockConcurTest;

class FastMultReadLock
{
public:
    void ReaderLock()
    {
        do
        {
            if (pending_writer.load())
            {
                std::unique_lock<std::mutex> lk(pending_writer_update_mutex);
                pending_readers_cv.wait(lk, [this]{ return pending_writer.load() == false; });
            }
            ++read_count;
            if (pending_writer.load() == false)
            {
                break;
            }
            --read_count;
        } while (true);
    }

    void ReaderUnLock()
    {
        --read_count;
    }

    void WriterLock()
    {
        writer_entry_mutex.lock();
        pending_writer.store(true);
        while (read_count.load() != 0);   // poll until the last reader leaves
    }

    void WriterUnLock()
    {
        {
            std::lock_guard<std::mutex> guard(pending_writer_update_mutex);
            pending_writer.store(false);
            pending_readers_cv.notify_all();
        }
        writer_entry_mutex.unlock();
    }

public:
    FastMultReadLock() : read_count{ 0 }, pending_writer{ false } {}
    ~FastMultReadLock() {}
    FastMultReadLock(FastMultReadLock&) = delete;
    FastMultReadLock(FastMultReadLock&&) = delete;
    FastMultReadLock& operator= (FastMultReadLock&) = delete;
    FastMultReadLock& operator= (FastMultReadLock&&) = delete;

private:
    std::atomic<uint32_t> read_count;
    std::atomic<bool> pending_writer;
    std::mutex writer_entry_mutex;
    std::mutex pending_writer_update_mutex;
    std::condition_variable pending_readers_cv;
    static FastMultReadlockConcurTest concur_test;
};

struct FastMultReadlockConcurTest
{
    FastMultReadlockConcurTest()
    {
        assert(std::atomic<uint32_t>().is_lock_free() && std::atomic<bool>().is_lock_free());
    }
};

// Note: for the constructor check to actually run, the static member must be
// defined in exactly one translation unit (or declared inline with C++17):
// FastMultReadlockConcurTest FastMultReadLock::concur_test;

#endif // _FAST_MULT_READ_LOCK_
We will review all four operations (reader/writer, lock/unlock). Let's start with ReaderLock(), the most complicated part of this implementation.
ReaderLock()
void ReaderLock()
{
    do
    {
        if (pending_writer.load())
        {
            std::unique_lock<std::mutex> lk(pending_writer_update_mutex);
            pending_readers_cv.wait(lk, [this]{ return pending_writer.load() == false; });
        }
        ++read_count;
        if (pending_writer.load() == false)
        {
            break;
        }
        --read_count;
    } while (true);
}
First of all, let's check whether ReaderLock() is really Lock Free when no writers are pending. The first if statement is false (no pending writers); we increment the readers counter and check again for pending writers; now the second if statement is true, so we break and exit the method. So, if there are no pending writers, ReaderLock() may be executed on multiple cores without serialization.
Now let's review the correctness of ReaderLock() when there are pending writers, which may arrive at any time during ReaderLock() execution. If the first if statement is true, there is (or was, at the moment of checking) a pending writer, and we wait on the condition variable for the signal that the writer has finished. Note that if, in the meantime, the writer releases its lock, the condition variable wait will not block, and we continue as if there were no pending writers. But if we do block on the condition variable, then once the writer exits we continue under the same "optimistic" assumption: no pending writers.
OK, now we have reached ++read_count. No matter how we got here (initially there were no pending writers, or we passed through the condition variable), we increment read_count and check for pending writers again. If in the meantime a pending writer arrived, we just roll back (--read_count) and start over. If there are no pending writers, we are lucky and have successfully taken the reader lock: we break and exit the method.
But why will that block writers from taking the lock? Because read_count and pending_writer are both atomic variables with memory_order_seq_cst ordering. WriterLock() sets pending_writer and then starts polling read_count. On the other hand, ReaderLock() increments read_count and then checks pending_writer again. If in the meantime any writer arrived, ReaderLock() gives up (decrements read_count) and starts over. But if not, no writer may take the lock, because read_count is not zero.
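The guarantee this handshake relies on can be demonstrated with a small litmus test (a sketch; the function name HandshakeNeverMisses is hypothetical): each thread stores to its own atomic and then loads the other's. Under the default memory_order_seq_cst ordering, it is impossible for both threads to miss each other's store.

```cpp
#include <atomic>
#include <thread>

// Store-then-load handshake, as used by ReaderLock()/WriterLock():
// the reader increments read_count and then loads pending_writer;
// the writer stores pending_writer and then loads read_count.
// With memory_order_seq_cst (the default), at least one side must
// observe the other's store, so they can never both "miss".
bool HandshakeNeverMisses(int iterations)
{
    for (int i = 0; i < iterations; ++i)
    {
        std::atomic<int>  read_count{0};
        std::atomic<bool> pending_writer{false};
        bool r_saw_writer = false, w_saw_reader = false;

        std::thread reader([&]{
            ++read_count;                             // store first...
            r_saw_writer = pending_writer.load();     // ...then load
        });
        std::thread writer([&]{
            pending_writer.store(true);               // store first...
            w_saw_reader = (read_count.load() != 0);  // ...then load
        });
        reader.join();
        writer.join();

        if (!r_saw_writer && !w_saw_reader)
            return false;  // impossible under seq_cst
    }
    return true;
}
```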
ReaderUnLock()
void ReaderUnLock()
{
    --read_count;
}
There is nothing to explain here; we just decrement read_count. When the last reader exits, read_count drops to zero, and any polling writer will take the writer lock.
WriterLock()
void WriterLock()
{
    writer_entry_mutex.lock();
    pending_writer.store(true);
    while (read_count.load() != 0);
}
First of all, we want to serialize all writers on entry to the method itself: writer_entry_mutex.lock(). Now we may be sure that only a single writer will access the multi-reader lock internals. We just need to set the pending_writer flag and poll until the last reader gets out. I don't like polling, but it is the best solution here. We built a lock optimized for concurrent reader access, and we don't guard access to read_count. Once we start polling read_count for zero, read_count can only decrease: the only place where we increment read_count is in ReaderLock(), but all newly arriving readers will wait on the condition variable, because we have set the pending_writer flag. Note that this polling uses only one core of our multicore system, because all other potential writers are blocked on writer_entry_mutex. Once the last reader gets out, the pending writer comes in.
As an alternative to polling read_count, we could use another condition variable that checks for the read_count-is-zero condition and is signaled by the last reader on its way out. But that solution has two drawbacks. First, we would have to check for the zero condition in ReaderUnLock() and signal the condition variable for the last reader, which degrades reader performance. Second, we would have to wait for read_count with a timeout (the wait_for API), precisely because we cannot guard access to read_count; that is the whole point of this solution: a lock-free reader lock. So we would wait with a timeout, and on timeout expiration, wait_for again. Our solution is optimized for reader performance, so polling read_count is better.
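For completeness, here is a sketch of that rejected alternative (all names here, such as readers_gone_cv and SlowerReaderUnLock, are hypothetical; this code is not part of FastMultReadLock). It shows both drawbacks: the timed-wait loop on the writer side, and the extra check-and-signal every reader would pay on unlock.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <condition_variable>
#include <chrono>

std::atomic<uint32_t>   read_count{0};
std::mutex              readers_gone_mutex;
std::condition_variable readers_gone_cv;

// Writer side of the rejected alternative: because read_count itself is
// not guarded by readers_gone_mutex, the writer cannot rely on the
// notification alone and must use wait_for with a timeout, re-checking
// the condition in a loop.
void WriterWaitForReaders()
{
    std::unique_lock<std::mutex> lk(readers_gone_mutex);
    while (read_count.load() != 0)
    {
        readers_gone_cv.wait_for(lk, std::chrono::milliseconds(1));
    }
}

// Reader side: every unlock now pays for the zero check and the signal,
// degrading the reader performance this lock is optimized for.
void SlowerReaderUnLock()
{
    if (--read_count == 0)
        readers_gone_cv.notify_all();
}
```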
WriterUnLock()
void WriterUnLock()
{
    {
        std::lock_guard<std::mutex> guard(pending_writer_update_mutex);
        pending_writer.store(false);
        pending_readers_cv.notify_all();
    }
    writer_entry_mutex.unlock();
}
We need to reset the pending_writer flag and notify all pending readers. Here, we take pending_writer_update_mutex to synchronize access to the pending_writer flag between WriterUnLock() and the condition-variable wait in ReaderLock(). This ensures there is no lost wakeup on the condition variable: a reader cannot see pending_writer set and then begin waiting after notify_all has already fired. Once we reset the flag, we notify all pending reader threads. Finally, we unlock writer_entry_mutex and let other writers in.
Note that the extra scope {} and std::lock_guard are there to keep pending_writer_update_mutex held as briefly as possible.
FastMultReadlockConcurTest
We use C++11 atomic variables for the pending_writer flag and the read_count counter; specifically, the std::atomic<bool> and std::atomic<uint32_t> types. Are read and modify operations on these types Lock Free? If not, our solution will not work: every access to an atomic variable would take an internal implementation lock, and we would have the same reader serialization we tried to avoid. This is why we need FastMultReadlockConcurTest.
struct FastMultReadlockConcurTest
{
    FastMultReadlockConcurTest()
    {
        assert(std::atomic<uint32_t>().is_lock_free() && std::atomic<bool>().is_lock_free());
    }
};
If either type (std::atomic<bool> or std::atomic<uint32_t>) is not Lock Free, we will fail on the assert.
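The same check can be exposed as a plain function for testing (a sketch; the function name is hypothetical). On mainstream x86/x64 and ARM toolchains both types are Lock Free, but the standard does not guarantee it for every platform; with C++17 the check can even be moved to compile time via is_always_lock_free.

```cpp
#include <atomic>
#include <cstdint>

// Returns true iff both atomic types used by FastMultReadLock are
// lock-free on this platform, i.e. implemented with native atomic
// instructions rather than an internal lock.
bool AtomicsAreLockFree()
{
    return std::atomic<uint32_t>().is_lock_free()
        && std::atomic<bool>().is_lock_free();
}
```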
Concurrency and performance analysis
Let's analyze the concurrency and performance of all API methods.
ReaderLock() - if no writers are pending, ReaderLock() is Lock Free, Wait Free, and may be parallelized on multiple cores. If there is a pending writer, readers will be blocked, but that is exactly what is expected from a multi-reader lock. And what happens on writer exit? All pending readers become Lock Free and Wait Free, and are parallelized on multiple cores. Moreover, that happens immediately and concurrently (the condition variable's notify_all API).
ReaderUnLock() - absolutely Lock Free, Wait Free, and may be parallelized on multiple cores.
WriterLock() and WriterUnLock() - definitely not Lock Free and not Wait Free. Not Lock Free by nature (exclusive access), and not Wait Free by implementation: that is the penalty for a Lock Free ReaderLock(). But in our case (optimized for concurrent and frequent reading and rare writing), that is acceptable. As a reminder, waiting (polling) is always executed by one writer only, so on a multi-core system not many resources are wasted.
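To see the lock in action, here is a small self-contained demo (the class is repeated in condensed form so the snippet compiles on its own; the RunDemo function and thread counts are arbitrary choices for this example): four reader threads repeatedly observe a shared value while one writer increments it 1000 times.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <condition_variable>
#include <thread>
#include <vector>

// Condensed copy of FastMultReadLock from above, so this demo compiles alone.
class FastMultReadLock
{
public:
    void ReaderLock()
    {
        do
        {
            if (pending_writer.load())
            {
                std::unique_lock<std::mutex> lk(pending_writer_update_mutex);
                pending_readers_cv.wait(lk, [this]{ return !pending_writer.load(); });
            }
            ++read_count;
            if (!pending_writer.load())
                break;
            --read_count;                    // a writer arrived: roll back, retry
        } while (true);
    }
    void ReaderUnLock() { --read_count; }
    void WriterLock()
    {
        writer_entry_mutex.lock();
        pending_writer.store(true);
        while (read_count.load() != 0);      // poll until the last reader leaves
    }
    void WriterUnLock()
    {
        {
            std::lock_guard<std::mutex> guard(pending_writer_update_mutex);
            pending_writer.store(false);
            pending_readers_cv.notify_all();
        }
        writer_entry_mutex.unlock();
    }
private:
    std::atomic<uint32_t> read_count{0};
    std::atomic<bool> pending_writer{false};
    std::mutex writer_entry_mutex;
    std::mutex pending_writer_update_mutex;
    std::condition_variable pending_readers_cv;
};

// Demo: 4 readers spin reading shared_value while one writer increments
// it 1000 times under the writer lock. Returns the final value.
int RunDemo()
{
    FastMultReadLock lock;
    int shared_value = 0;                    // guarded by `lock`
    std::atomic<bool> done{false};

    std::vector<std::thread> readers;
    for (int i = 0; i < 4; ++i)
        readers.emplace_back([&]{
            while (!done.load())
            {
                lock.ReaderLock();
                volatile int observed = shared_value;  // concurrent read
                (void)observed;
                lock.ReaderUnLock();
            }
        });

    std::thread writer([&]{
        for (int i = 0; i < 1000; ++i)
        {
            lock.WriterLock();
            ++shared_value;                  // exclusive write
            lock.WriterUnLock();
        }
        done.store(true);
    });

    writer.join();
    for (auto& t : readers) t.join();
    return shared_value;
}
```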
Conclusion
The Fast Multi-Reader Lock performs well on multi-core systems and provides genuinely Lock Free, Wait Free concurrency for multiple readers.
However, if the typical use case is not heavy concurrent reading, if write operations are frequent, or if the system does not have multiple cores, the standard multi-reader implementation may give better performance.