Introduction
Windows Communication Foundation (WCF) is a topic that no longer warrants an introduction. The technology provides distributed computing with protocol independence, and recent releases have extended it to support RESTful APIs.
In this article, we will discuss how to provide parallel and distributed computing capabilities using WCF. We will also discuss what kind of problems can be solved using the suggested techniques.
Objective
The objective of this article is to discuss architecture, design and sample implementation of distributed and parallel computing.
Scope
This article provides logical architecture details, design details and a code discussion of a sample implementation of distributed and parallel computing using WCF.
Audience
The audience for this article ranges from solution architects to technical designers and developers. It is assumed that readers have a working knowledge of WCF, parallel and distributed programming, asynchronous method invocation, creation of Windows services, WCF deployment, etc.
Flow of the Discussions
In this article, we will start by defining the problems which qualify for distributed and parallel computing. We then describe a sample problem that we will use for the remainder of the discussion. This is followed by a description of the approach used. We start with the logical architecture of the building blocks and map it back to the approach. We then detail the design of the application in the light of our problem statement. A sample implementation is provided as part of the discussion.
What Kind of Problems Will Qualify for Distributed Parallel Processing?
In this section, we will try to identify which problems qualify for distributed and parallel processing. As you may have noticed, we are looking at two sets of problems: distributed and parallel.
Any computational problem whose work may be processed outside the application domain may qualify for distributed computing. In many cases it is desirable to distribute the processing; this may be driven by security, scalability or the availability of the resources required for the computation.
A problem in which we can find repeated patterns of identical sub-problems may qualify for parallel processing. This is the same property we look for when writing a multithreaded application.
Based upon these two problem sets, we can safely assume that any problem which satisfies the requirements of both domains may qualify for distributed parallel processing. It is usually prudent to take on the overhead of distributed and parallel processing only when the resource utilization of the problem at hand is very high.
There are many examples of scientific problems which may use distributed computing, such as weather forecasting. In business domains, tasks like EDI and batch processing (say, salary calculations) are contenders for distributed and parallel computing.
A Sample Problem For Our Discussion
In this article, to keep our discussion concrete and tangible, we will describe a sample problem. This is an overly simplified problem statement to ensure that we keep our focus on the technical problems rather than the business logic at hand.
Let us assume that we receive a file from a source which contains many lines. Each line contains a string message and a number x. Our job is to make an entry in the database for the file name once, and an entry for the message x times. Once the entire file is parsed, we should mark the entries as final. If any exception or error takes place, we should mark our entries as error.
A sample of such a file, named Sample.txt:
Message1,5
Messag2,2
….
We will create one entry for Sample.txt, five entries for Message1, two for Messag2 and so on….
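As a hypothetical sketch (the variable names here are our own, not taken from the article's source code), one such line could be split as follows:

```csharp
// Hypothetical sketch: parsing one line of Sample.txt.
string line = "Message1,5";
string[] parts = line.Split(',');
string message = parts[0];              // "Message1"
int repeatCount = int.Parse(parts[1]);  // 5
// The worker would then insert 'message' into the database repeatCount times.
```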
How to Solve this Problem using Distributed and Parallel Computing
We will now discuss how to logically go about solving a problem using the principles of distributed and parallel computing, and introduce some of the terms on which we will build in the remainder of the article.
Once we have identified that a problem may be solved using distributed and parallel techniques, we identify two basic components. The first is the Master, whose job is to find a quantum of work and distribute it to one of the workers; the Master is also responsible for overall transaction management.
The job of the worker is to process each job as an independent task without worrying about the other parallel tasks.
In our sample problem, the master will parse the file and once it finds a line it will send it to one of the workers.
How To Do It with WCF
We will now discuss how to solve the problem using WCF. We will build two separate WCF services, one called the Master and the other called the Worker. The Master will be invoked by passing it a file name. The Master will read the file one line at a time and send each read string to the Worker in an async manner. The Worker will write the message to the database.
To ensure the task takes some time, the Worker will write the message in a loop with some delay.
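A minimal sketch of such a worker-side loop (the class, method and parameter names are assumptions based on the description above, not the article's actual source):

```csharp
using System.Threading;

public class JobPerformer
{
    // Writes the message 'count' times, sleeping briefly on each
    // iteration to simulate a long-running quantum of work.
    public void SaveEntries(string message, int count)
    {
        for (int i = 0; i < count; i++)
        {
            Thread.Sleep(500); // artificial delay so the task takes time
            // InsertRow(message); // actual database insert omitted in this sketch
        }
    }
}
```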
Architecture
We now focus upon the architecture which will be used to realize the suggested approach. The architecture has been illustrated in the diagram given below:
A client machine initiates a request to process a file. The mechanism by which the client learns that a file is available for processing is immaterial here. The WCF service must have access rights to the requested file.
The application has two WCF services, the Master and the Worker. The job of the Master is to expose an endpoint through which requests are initiated.
The Master WCF service has a Quantum Identifier component, which is responsible for identifying independent quanta of work. In our application, this is the module which reads the file line by line and sends each line for processing.
The Job Distributor is responsible for sending the task to a worker in an async manner. The Job Distributor may or may not expect a result from the worker, which leads to a choice between an async call and a OneWay method.
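As a sketch, this choice surfaces in the Worker's service contract; a OneWay operation must return void and the caller receives no reply (the interface and operation names here are assumptions):

```csharp
using System.ServiceModel;

[ServiceContract]
public interface IWorker
{
    // IsOneWay = true: the Master fires the request and does not wait
    // for (and cannot receive) a reply or a fault from the Worker.
    [OperationContract(IsOneWay = true)]
    void SaveEntries(string message, int count);

    // Alternative request/reply shape, used when the Job Distributor
    // needs the worker's result back (typically invoked asynchronously):
    // [OperationContract]
    // bool SaveEntriesWithResult(string message, int count);
}
```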
The Worker has a WCF endpoint which the Master uses to request processing of a quantum of work. The Job Performer is the component which executes the task.
In an ideal scenario, many identical workers are deployed behind a load balancing cluster.
Design
We will now dive deeper into the control flow for the distributed processing. We will discuss the high-level flow of control using a sequence diagram.
A client calls the Init method on the Master, implemented as a WCF service, passing in the file name. The interface calls the quantum identifier via the ParseFile method. ParseFile reads the file line by line and, for each line read, calls RequestExecution on the JobDistributor.
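The flow above might be sketched as follows (the contract and component names mirror the article's description; the callback wiring is our own assumption):

```csharp
using System;
using System.IO;
using System.ServiceModel;

[ServiceContract]
public interface IMaster
{
    [OperationContract]
    void Init(string fileName); // entry point called by the client
}

public class QuantumIdentifier
{
    private readonly Action<string> requestExecution;

    // 'requestExecution' is the JobDistributor's RequestExecution method,
    // injected here so this sketch stays self-contained.
    public QuantumIdentifier(Action<string> requestExecution)
    {
        this.requestExecution = requestExecution;
    }

    // Reads the file line by line; each line is one quantum of work.
    public void ParseFile(string fileName)
    {
        foreach (string line in File.ReadLines(fileName))
            requestExecution(line);
    }
}
```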
RequestExecution takes a string line as a parameter, calls the Worker's WCF SaveEntries method in async mode, and increases the count of active threads. This allows a non-blocking call in a separate execution space, enabling distributed processing. The Worker's interface calls SaveEntries on the JobPerformer.
Once the execution is completed, the results are sent back to the JobDistributor, where the overall execution status is determined from the processing results. Once the results are received, the count of active threads is decreased.
It should be noted that the Master is a WCF service, so after the complete file has been parsed its instance becomes eligible for garbage collection. Hence, once the file has been completely parsed, we start a loop to keep the instance of the Master alive. When the active-thread count reaches zero, the loop is stopped so that the Master may be garbage collected.
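A sketch of the active-thread counter and the keep-alive loop (the commented-out Begin/End proxy calls are assumptions about a generated WCF async client, not the article's actual code):

```csharp
using System;
using System.Threading;

public class JobDistributor
{
    private int activeThreads; // number of outstanding worker calls

    public void RequestExecution(string line)
    {
        Interlocked.Increment(ref activeThreads);
        // Hypothetical async proxy call to the Worker service:
        // workerClient.BeginSaveEntries(line, OnCompleted, null);
    }

    private void OnCompleted(IAsyncResult result)
    {
        // workerClient.EndSaveEntries(result); // harvest the worker's result
        Interlocked.Decrement(ref activeThreads);
    }

    // Called after the whole file has been parsed; keeps the Master's
    // instance alive until every async worker call has completed.
    public void WaitForCompletion()
    {
        while (Volatile.Read(ref activeThreads) > 0)
            Thread.Sleep(100);
    }
}
```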
The overall execution status is maintained by the JobDistributor, which is updated with the async result of each request.
Sample Implementation
We are finally in the last section of this article. The available source code is an overly simplified sample implementation which concentrates on getting a distributed and parallel processing application running. It lacks many best practices, such as error/exception handling and encapsulation of logic. The idea of the sample application is to ensure that the ideas discussed are successfully demonstrated. The sample code has been heavily commented, with details around why a line of code exists rather than what the line of code is doing.
Final Words
I would appreciate any feedback or comments to improve this article. Please send me your comments at Gaurav.Verma.MCA@gmail.com