Exclusive lock in distributed solutions with MongoDB

urigg123

8 Dec 2016, CPOL, 8 min read

Describes how to design a lock-like mechanism in a distributed solution that grants exclusive access, or leader status, to one of the server components.

Download source - 513.5 KB

Introduction

Over the course of developing various large-scale distributed and cloud-based systems, my team ran into a number of situations where we needed a flag that grants EXCLUSIVE ACCESS to a shared resource or, alternatively, ensures that ONLY ONE process (also called a LEADER; it can also be a worker, server or service) running as part of the distributed solution can execute a certain workflow or act in a unique role.

Such a flag requires a kind of LOCK functionality that ensures the UNIQUENESS of the flag holder across the system, but also ensures RELEASE and REALLOCATION of the locked resource if its current holder crashes.

The solution presented below uses the MongoDB NoSQL database as the provider of the atomic lock functionality in this code sample.

Background

Let's consider the following scenario: you host your business logic processing services as a Worker Role in the Azure cloud. The Worker Role can scale to any number of instances, and all of them get and process their jobs from some messaging queue. But, and here comes the reason for this article, one of the instances needs to assume a unique role, say being responsible for activating the scheduled tasks that need to run in your system. There should be only one such service in the system at any point in time, because the same scheduled task should not be executed more than once at the same time.
One option would be to have a separate Scheduler component in the system that is responsible only for running scheduled tasks. Going down that path, however, would require additional IT resources to deploy, monitor, scale and provide redundancy for the additional component.

Our team concluded that we should create a distributed mechanism that gives one of the service instances mentioned earlier the exclusive right to serve as the Scheduler in the system. Such a mechanism needs to provide LOCK-like behaviour that ensures no second instance concurrently serves as a scheduler. One obvious requirement for such a solution is redundancy for the Scheduler component: if something happens (such as a crash or severe resource congestion) to the instance currently wearing the "hat" of the Scheduler, one of its healthy peers takes over the role and becomes the Scheduler.

At this stage we needed to choose a shared resource or lock service that would become the source of truth for deciding whether the lock is held and by whom. As we understood it, part of the solution for the distributed lock would require the services to poll for lock availability, to check whether they can acquire the lock and proceed with the scheduler-related tasks. Because a MongoDB database was already part of the solution, and given its good performance benchmarks and its support for atomicity at the document level, we decided to use it as the locking service. The other option we considered was an MS SQL Server stored procedure with an update command providing the same kind of atomic locking functionality. Since we did not need any functionality specific to relational databases, and since we expected lower performance from a relational database than from a NoSQL product for this kind of workload, we decided to stick with MongoDB. Other NoSQL databases such as Redis could potentially be used for this role as well, provided that they offer strong consistency for write operations.

One additional reason for choosing MongoDB, which eventually proved useless for us, was its support for TTL indexes: MongoDB removes a document once the value in the date field used as the TTL index expires. This could be used as a way to release a lock when the current Scheduler crashes, allowing other clients to get access to the lock. The problem is that MongoDB does not promise to remove the document the second it expires; expired documents are removed by a background task that runs periodically (every 60 seconds by default), so a document can outlive its expiration time. Any delay in the release of the lock was not acceptable under our system design, so we had to implement our own lock release mechanism.
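The release logic itself is not shown in this article, so the following is only a minimal sketch of one way such a mechanism could decide that a lock is stale: the lock record carries the time it was last acquired or prolonged, and any client may treat it as expired once that time is older than the lock duration. The LockExpiry class and its IsExpired method are hypothetical names, not part of the download, and the check assumes that the clocks of the participating machines are reasonably synchronized.

```csharp
using System;

// Hypothetical helper, not part of the article's download.
public static class LockExpiry
{
    // A lock record is treated as stale once its holder has not acquired or
    // prolonged it for longer than the configured lock duration.
    public static bool IsExpired(DateTime lockAcquireTimeUtc, int lockDurationMs, DateTime nowUtc)
    {
        return nowUtc - lockAcquireTimeUtc > TimeSpan.FromMilliseconds(lockDurationMs);
    }
}
```

With a check like this, a crashed holder simply stops prolonging the lock, and the first peer that notices the stale record can remove it and compete for the lock again.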

Please note that competing for an exclusive lock on a shared resource is not the only way to reach a system-wide consensus ensuring that only one of a group of peers gets to execute a certain action or hold an exclusive role. There are other well-known, more elaborate ways to reach consensus in a system. One example is the election-like method that a MongoDB Replica Set itself employs to select its primary (master) node.
Since we had no requirement for specific service selection logic or preference criteria, we were satisfied with a simple solution that picks the holder of the Scheduler role at random.

Detailed look at the code

The code implementation of the solution is based on 2 main interfaces:

The first, IExclusiveGlobalLock, provides low-level access to the basic exclusive lock functionality:

public interface IExclusiveGlobalLock
{
    /// Try to get exclusive access to the locked resource
    bool TryGetLock(string clientIdentifier);

    /// Extend the lock the client is currently holding
    void ProlongLock(string clientIdentifier);

    /// Release the lock held by the client
    void ReleaseLock(string clientIdentifier);

    /// Returns the last time the lock was acquired or extended
    DateTime? LastAquiredLockTime { get; }

    /// Returns the lock duration in milliseconds
    int LockDurationTimeInMilliSeconds { get; }
}

At the core of the solution lies the ExclusiveGlobalMongoBasedLock class, which is the MongoDB-specific implementation of the IExclusiveGlobalLock interface. It uses the MongoDB C# driver to insert an ExclusiveLockStorageModel instance into MongoDB, where it is stored as a BSON document that represents the exclusive lock.

// Try to insert the lock record into MongoDB; if no error is returned, we got the lock
_collection.Insert(new ExclusiveLockStorageModel()
{
    LockId = _lockId,
    LockAcquireTime = _lastAquiredLockTime.Value,
    LockingProcessId = clientIdentifier
});

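If no error is returned, the code assumes that the lock was successfully acquired and switches to maintaining its hold on the lock. Presumably the insert fails with a duplicate key error when a document with the same lock id already exists, which is what lets a single insert act as an atomic lock acquisition.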
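To make the "insert wins the lock" idea concrete without requiring a running database, here is a minimal in-memory sketch. It assumes that the lock id acts as a unique key, so that a second insert with the same id fails; a ConcurrentDictionary stands in for the MongoDB collection, and TryAdd plays the role of the insert. The InMemoryExclusiveLock class and its members are hypothetical and are not taken from the download.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// In-memory stand-in for the lock collection; all names here are hypothetical.
public sealed class InMemoryExclusiveLock
{
    private sealed class LockRecord
    {
        public string Owner;
        public DateTime AcquiredUtc;
    }

    // Plays the role of a collection with a unique key on the lock id.
    private readonly ConcurrentDictionary<string, LockRecord> _store =
        new ConcurrentDictionary<string, LockRecord>();
    private readonly string _lockId;

    public InMemoryExclusiveLock(string lockId)
    {
        _lockId = lockId;
    }

    // Mirrors "insert and see whether it fails": TryAdd is atomic, so it
    // succeeds for exactly one caller while a record with this id exists.
    public bool TryGetLock(string clientIdentifier)
    {
        var record = new LockRecord { Owner = clientIdentifier, AcquiredUtc = DateTime.UtcNow };
        return _store.TryAdd(_lockId, record);
    }

    // Only the current owner may release the lock.
    public bool ReleaseLock(string clientIdentifier)
    {
        LockRecord current;
        if (!_store.TryGetValue(_lockId, out current) || current.Owner != clientIdentifier)
        {
            return false;
        }
        // Remove the entry only if it still holds the record we just inspected.
        var entries = (ICollection<KeyValuePair<string, LockRecord>>)_store;
        return entries.Remove(new KeyValuePair<string, LockRecord>(_lockId, current));
    }
}
```

If two clients call TryGetLock("A") and TryGetLock("B") on the same instance, only the first call returns true; the second succeeds only after the first client has released the lock.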

Since most of the time developers will want an asynchronous, task-based or thread-based solution for the lock polling and lock extension mechanism, I also created the IExclusiveGlobalLockEngine interface. It defines the signature of an application-facing abstraction layer that hides the process of lock acquisition.
The interface is implemented by the ExclusiveGlobalMongoBasedLockEngine class, which is created directly by the console application that serves as the client in the sample code.

   public interface IExclusiveGlobalLockEngine
   {
       /// Start periodic attempts to acquire lock
       void StartCheckingLock(string clientIdentifier, Action onLockAcquired, Action<string> onLockLost);

       /// Stop the process started in StartCheckingLock method and release lock if is currently held by the client
       void StopCheckingOrReleaseLock(string clientIdentifier);
   }

When you use ExclusiveGlobalMongoBasedLockEngine to get the locking functionality, your client code becomes quite minimal. It consists mainly of a number of configuration parameters (explained in the comments of the attached code), a unique identifier for the client instance, and two callbacks: one that reacts to the lock being acquired by the client and another that is called when the lock is lost.

// constructor
IExclusiveGlobalLockEngine lockEngine = new ExclusiveGlobalMongoBasedLockEngine(
    mongoDbConnectionString, "TestLockDb", "Lock",
    lockDurationInMills, lockCheckFrequency);

// start the asynchronous process of trying to get the lock
lockEngine.StartCheckingLock(uniqueId, () =>
    {
        Console.WriteLine("Lock Acquired");
        ....
    }, (reason) =>
    {
        ...
    });
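The internals of the engine are not reproduced in this article, so here is a simplified sketch of how a polling loop of this kind might drive the two callbacks. It is an illustration under assumptions rather than the engine's actual code: PollingLockLoop is a hypothetical class, the two delegates stand in for the TryGetLock and ProlongLock calls, and the prolong delegate is assumed to return false when the lock could not be extended (the real ProlongLock returns void).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified polling loop; the real engine in the download may differ.
public sealed class PollingLockLoop
{
    private readonly Func<bool> _tryGetLock;   // stands in for TryGetLock
    private readonly Func<bool> _prolongLock;  // assumed to return false if the lock was lost
    private readonly TimeSpan _checkInterval;

    public PollingLockLoop(Func<bool> tryGetLock, Func<bool> prolongLock, TimeSpan checkInterval)
    {
        _tryGetLock = tryGetLock;
        _prolongLock = prolongLock;
        _checkInterval = checkInterval;
    }

    public async Task RunAsync(Action onLockAcquired, Action<string> onLockLost, CancellationToken token)
    {
        bool holding = false;
        while (!token.IsCancellationRequested)
        {
            if (!holding)
            {
                // Not the holder yet: keep competing for the lock.
                holding = _tryGetLock();
                if (holding) onLockAcquired();
            }
            else if (!_prolongLock())
            {
                // Holder, but the extension failed: report the loss and compete again.
                holding = false;
                onLockLost("lock could not be prolonged");
            }

            try
            {
                await Task.Delay(_checkInterval, token);
            }
            catch (OperationCanceledException)
            {
                break; // stop polling once cancellation is requested
            }
        }
    }
}
```

The check interval needs to be noticeably shorter than the lock duration; otherwise the holder could fail to prolong the lock in time and lose it without having crashed.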

Using the code

To run the code, open the DistributedLocking solution in Microsoft Visual Studio 2012 or later. Then right-click the solution and select "Enable NuGet Package Restore". That way mongocsharpdriver, the only NuGet package used in the solution, will be downloaded and added to the project.

The sample application included with this article requires a MongoDB instance running on the local machine on the default port, 27017.
If you already have MongoDB installed at a different location, open the Program.cs file in the DistributedLockingTestClient project and modify the MongoDB connection string.

The DistributedLockingTestClient project, which contains the code of the client making the lock request, includes a RunMe.bat file. I recommend running that file from the bin folder of the project once it has compiled successfully. The batch file starts two instances of the client console application, each of which is automatically given a unique id.
After you run the batch file you should see that one of the clients acquires the exclusive lock while the other keeps trying to get it.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)