Introduction
A processing "pipeline" is one of the few examples of a real, physical concept with a direct, useful counterpart in software. Another popular example is thinking of software in terms of "objects." One advantage of such concepts is that, because we understand their physical counterparts, we can grasp and use them easily. And although there is certainly more to object-oriented programming than understanding the concept of a physical object, a pipeline really is just about that simple.
A pipeline, as defined here, is simply a collection of pipe segments connected together in different ways, according to pre-specified rules, in order to accomplish a task. In its simplest form, a pipeline only requires a head and a tail, and they can be one and the same (although this typically defeats the purpose).
A software pipeline can be used in many ways. Some examples I've seen are: performing long scientific algorithms, where each step is well-delineated; performing abstracted device reads/writes, as in buffered I/O (network, hard drive, etc.); and graphics processing.
Background
As a big fan of the interface-as-a-contract school of thought, when I sat down to design the pipeline, I wanted to establish a simple means by which to connect two objects, while also affording interface-level freedom throughout. Connecting one object to another in this context - via the object1.connectTo(&object2) method, or with the overloaded object1 += object2 operator - is synonymous with telling one object (object1) to send its output to another (object2).
In the event that you are unfamiliar with the interface-as-a-contract paradigm, I'll give you my brief synopsis. It says that one of the things you consider when you start a software development project is interface definition. Define interfaces in areas that you believe will be changed/expanded - in other words (not mine), "encapsulate the variation." If you incorporate good interfaces into your software design, you will reap the benefits many times over. This is particularly useful in a pipelined design, where it's typically prudent to define an interface at each connection-point. Then, users can implement a particular interface (agree to a contract) they are interested in working on and connect it right into the pipeline seamlessly.
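To make "encapsulate the variation" concrete, here is a minimal C++ sketch of the idea. The names (ICompressor, NullCompressor) are mine and purely illustrative, not part of this framework: the part likely to vary is hidden behind an interface, and any class that implements it can be swapped in.

```cpp
#include <string>

// Hypothetical example -- ICompressor/NullCompressor are illustrative
// names, not part of this framework. The interface is the "contract";
// implementing it is agreeing to that contract.
class ICompressor
{
public:
    virtual ~ICompressor() {}
    // Any compression scheme can live behind this one method.
    virtual std::string compress(const std::string &data) = 0;
};

// One party agreeing to the contract; callers only ever see ICompressor.
class NullCompressor : public ICompressor
{
public:
    virtual std::string compress(const std::string &data) { return data; }
};
```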
Using the Code
This framework provides a single class that defines a segment in the pipe and can determine, at compile time, whether or not it can successfully connect to another pipe segment. The user can specify a class, base class, or interface (abstract class) that defines the types of classes that his/her pipe segment(s) can connect to. A pipe segment may be simultaneously connected to other pipe segments, as well as have other pipe segments simultaneously connected to it.
Since the demo application provided is about the most boring application ever written, and I am not interested in praise for this article on the grounds of its ability to cure insomnia, I'll show how to use the framework to construct a marginally more interesting, purely hypothetical, HTTP server request processing pipeline. Note that there will not be any implementation of a server here (this code will not compile), just the following grossly oversimplified four-step pipeline:
- Receive a request and start the pipeline
- Authenticate the request
- Authorize the request
- Respond to the request
Our HTTP server processing pipeline will contain a pipe segment for each of these steps:
- HTTPRequestHandler - responds to an incoming HTTPRequest object by simply sending it off down the pipeline.
- HTTPRequestAuthenticator - accepts an HTTPRequest object, authenticates it, and passes it down the pipeline (if appropriate).
- HTTPRequestAuthorizer - accepts an HTTPRequest, authorizes it, and passes it down the pipeline (if appropriate).
- HTTPRequestResponder - accepts an HTTPRequest and responds to it in the appropriate manner.
The terrific thing about this example is that, in order to accomplish this enormous feat of coding prowess, we will only need to define one interface, which each of the above classes will implement:
- An IHTTPRequestHandler interface, defining a single method called HandleHTTPRequest(HTTPRequest *theRequest)
I'm not going to discuss the innards of the HTTPRequest class - I'm just going to assume it already exists - nor will I wax eloquent on the proper methodologies for doing HTTP request authentication or anything similar. Please do not write to me to let me know this is not the right way to code an HTTP server. I'm sure there's a lot more to it.
What we're working toward here is for the following code to be executed before the first request is handled by the HTTP server. This is where we build the pipeline. I realize this would probably cause some scoping problems if implemented cut-n-paste from here - again, this is only an example to give you a feel for things. I'll let you figure out how to get around scoping/variable lifetime issues:
...
HTTPRequestHandler theRequestHandler;
HTTPRequestAuthenticator theAuthenticator;
HTTPRequestAuthorizer theAuthorizer;
HTTPRequestResponder theResponder;
theRequestHandler += theAuthenticator;
theAuthenticator += theAuthorizer;
theAuthorizer += theResponder;

// ...or, equivalently, using connectTo (don't do both -- you'd
// connect each pair twice):
// theRequestHandler.connectTo(theAuthenticator.connectTo(
//     theAuthorizer.connectTo(theResponder)));
...
Hopefully, that's pretty self-explanatory.
Now, how do we get there? Well first we'll write the IHTTPRequestHandler
interface and define it something like this:
#include "HTTPRequest.h"
#include "PipeSegmentBaseAdapter.h"

class IHTTPRequestHandler : public PipeLineProcessing::PipeSegmentBaseAdapter
{
public:
    virtual void HandleHTTPRequest(HTTPRequest *request) = 0;
};
With that interface defined, we can start writing the pipe segment objects. They might look like this:
#include "HTTPRequest.h"
#include "PipeSegment.h"

class HTTPRequestHandler :
    public IHTTPRequestHandler,
    public PipeLineProcessing::PipeSegment<IHTTPRequestHandler>
{
public:
    virtual void HandleHTTPRequest(HTTPRequest *request) {
        // Pass the request along to every connected output segment.
        for (size_t i = 0; i < this->theOutput.size(); i++) {
            IHTTPRequestHandler *anOutputHandler =
                (IHTTPRequestHandler *)this->theOutput.at(i);
            anOutputHandler->HandleHTTPRequest(request);
        }
    }
};
class HTTPRequestAuthenticator :
    public IHTTPRequestHandler,
    public PipeLineProcessing::PipeSegment<IHTTPRequestHandler>
{
private:
    bool requestIsAuthentic(HTTPRequest *request) { return true; }

public:
    virtual void HandleHTTPRequest(HTTPRequest *request) {
        // Only authentic requests continue down the pipeline.
        if (requestIsAuthentic(request)) {
            for (size_t i = 0; i < this->theOutput.size(); i++) {
                IHTTPRequestHandler *anOutputHandler =
                    (IHTTPRequestHandler *)this->theOutput.at(i);
                anOutputHandler->HandleHTTPRequest(request);
            }
        }
    }
};
Above we've defined the first two pipeline segments. Hopefully they are readable enough that you can see they both have the same hierarchy tree. In pipeline terms, they both output to, and serve as input for, IHTTPRequestHandler-type objects. The convention here is that the first type listed in the inheritance definition specifies the interface you are implementing, if any. Second, you use PipeSegment<IHTTPRequestHandler> to specify which type of objects you will output to. If you haven't been sleeping well the last few nights and/or you have deep questions at this point, look at the code in the demo app supplied. It should clear things up -- or put you to sleep, whichever your ailment.
Please also notice the for loop in each class. This will be discussed in more detail later.
So, the other two classes will look similar -- maybe like this:
class HTTPRequestAuthorizer :
    public IHTTPRequestHandler,
    public PipeLineProcessing::PipeSegment<IHTTPRequestHandler>
{
private:
    bool requestIsAuthorized(HTTPRequest *request) { return true; }

public:
    virtual void HandleHTTPRequest(HTTPRequest *request) {
        // Only authorized requests continue down the pipeline.
        if (requestIsAuthorized(request)) {
            for (size_t i = 0; i < this->theOutput.size(); i++) {
                IHTTPRequestHandler *anOutputHandler =
                    (IHTTPRequestHandler *)this->theOutput.at(i);
                anOutputHandler->HandleHTTPRequest(request);
            }
        }
    }
};
class HTTPRequestResponder :
    public IHTTPRequestHandler,
    public PipeLineProcessing::IPipeTail
{
private:
    void respondToRequest(HTTPRequest *request) { }

public:
    // The tail of the pipeline: responds, and passes nothing further on.
    virtual void HandleHTTPRequest(HTTPRequest *request) {
        respondToRequest(request);
    }
};
And that's it! Now you can execute the code we first started with, and send the HTTPRequest to the HTTPRequestHandler. The HTTPRequest object will be authenticated, authorized, and responded to via our sweet little pipeline -- all automatically!
The only remaining question is, "what's with the for loop?" Well, this was the one evil part of the whole thing. I could not devise a means to automatically determine and invoke the right method on the output objects without complicating things enormously, adding extra library dependencies, and/or using functors/delegates, with which I am not yet too friendly. So I kept things simple and put the onus on the user to spell out when he/she wants to send the output down the rest of the pipeline.
The for loop iterates over the theOutput STL vector inherited from the PipeLineProcessing::PipeSegmentBase base class. This vector stores pointers to all of the pipe segment objects this pipe segment connects to (that is, sends its output to). As it iterates over these output handlers, it downcasts them to pointers of the type specified in the class's definition (as this type of pipe segment's template parameter). By calling the desired method on that type, the pipeline is continued.
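To make the mechanics concrete, here is a stripped-down sketch of what such a segment base might look like. This is not the framework's actual implementation (see the download for that) - SegmentBase, Segment, and DemoSegment are stand-in names - but it shows the essential idea: connections are stored generically in a vector, while the connection methods only accept the template-specified type.

```cpp
#include <cstddef>
#include <vector>

// Simplified sketch -- NOT the framework's actual code. SegmentBase
// stands in for PipeSegmentBase; Segment stands in for PipeSegment.
class SegmentBase
{
public:
    virtual ~SegmentBase() {}
};

template <class OutputType>
class Segment : public SegmentBase
{
public:
    // Only an OutputType can be connected; anything else is a compile
    // error, which is what makes the later downcast safe.
    void connectTo(OutputType *next) { theOutput.push_back(next); }
    void operator+=(OutputType &next) { connectTo(&next); }

    std::size_t outputCount() const { return theOutput.size(); }

protected:
    std::vector<SegmentBase *> theOutput; // stored generically, downcast later
};

// A tiny demo segment that outputs to its own kind.
class DemoSegment : public Segment<DemoSegment> { };
```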
The astute programmer should've woken up upon seeing the word "downcast," since this is considered an unsafe, dangerous cast. Unlike an upcast, which casts a derived object to a base class type and is therefore always safe, a downcast casts a base class to a derived class. This is dangerous because, in general, one never knows whether the object in hand at run time will actually be of the derived type, whereas a derived type can always be treated as its base type. To use a physical example: all soccer balls are balls (upcast), but not all balls are soccer balls (downcast).
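The ball example translates directly into C++. Ball and SoccerBall are, of course, illustrative names of mine; dynamic_cast is the language's run-time-checked way to attempt such a downcast (this framework avoids the run-time check by constraining connections at compile time instead):

```cpp
// Illustrative types, not part of the framework.
class Ball
{
public:
    virtual ~Ball() {} // polymorphic base, required for dynamic_cast
};

class SoccerBall : public Ball { };
class Basketball : public Ball { };

// Upcast: implicit and always safe -- every SoccerBall is a Ball.
// Downcast: dynamic_cast checks at run time and yields a null
// pointer when the Ball is not actually a SoccerBall.
bool isSoccerBall(Ball *ball)
{
    return dynamic_cast<SoccerBall *>(ball) != 0;
}
```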
So, how can we safely downcast a ball to a soccer ball? Only if we are certain that it is one. In this framework, that certainty is provided by the methods that add objects to the theOutput vector. While they are generic internally, the only methods exposed to clients to make connections - the += operator and the connectTo method - accept only the appropriate types of objects. In other words, we let the compiler do our checking: unless the object being added onto the end of the pipe is, or can safely be addressed as (that is, upcast to), the type specified in the output-defining portion of the class's definition, it will be flagged at compile time. This is one of the cooler tricks in this little project, and it is mostly thanks to the Kevlin Henney trick I found. I show examples of this in my boring demo.
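For the curious, that compile-time check boils down to something like the following constraints idiom. This is my paraphrase of the idea (the names DerivesFrom, Animal, and Dog are illustrative), not the framework's exact code: instantiating the template forces the compiler to attempt a Derived*-to-Base* conversion, which only succeeds if the inheritance relationship actually holds.

```cpp
// Paraphrase of the compile-time constraint idea -- not the framework's
// exact code. Instantiating DerivesFrom<D, B> compiles only if D* can
// be upcast to B*, i.e. only if D inherits from B.
template <class Derived, class Base>
struct DerivesFrom
{
    static void constraints(Derived *p)
    {
        Base *b = p; // fails to compile unless Derived inherits Base
        (void)b;
    }
    DerivesFrom()
    {
        void (*f)(Derived *) = constraints; // force instantiation
        (void)f;
    }
};

class Animal { };
class Dog : public Animal { };

// DerivesFrom<Dog, Animal> ok;   // compiles: Dog is an Animal
// DerivesFrom<int, Animal> bad;  // would be rejected at compile time
```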
This is my first submission to CodeProject, so please give it a try and leave comments.
Interesting Links
- VTK - The Visualization Toolkit, by Kitware. An open-source 3D graphics/visualization framework that is built on the concept of a graphics pipeline. I'm not sure if they use a design similar to the one I present here, as I have not looked at the code. But I have used the toolkit, and I hope they used some mechanism like this one to do it!
- A Generic Data Process Pipeline Library. A totally different take on a processing pipeline in C++. Largely macro based and useful if you have the exact same pipeline multiple times in your application.
- Cool Template Trick. This is a great little trick provided by Kevlin Henney. Use it anytime you want to require at compile time that a template parameter be of a particular base-type.
A Few Notes on the Demo
- I am hoping for the prestigious Code Project's Most Boring Demo Application award with this submission. Please do not download the demo app thinking that it is going to do something wild and crazy, or you will be disappointed. It really only serves as a starting point for someone looking to build their pipeline with this framework.
- Along with the demo, I have provided Visual Studio .NET 2003 solution and project files to ease building it in that environment.
- Additionally, I have included .cdtbuild, .cdtproject and .project files for building in the Eclipse CDT -- my IDE of choice.
- Since the framework is entirely header based (the .cpps contain only documentation), all you have to do is put them somewhere and make them accessible to your tool-chain. In this case, the project files I have included are set to look under the same root directory that contains the demo project's directory for a directory called "Pipeline". For example, if the "PipelineDemo" directory is under "projects" then the demo project will build correctly if it finds the "Pipeline" directory under "projects" as well. Otherwise, you will have to point it to the "Pipeline" folder, wherever that might be on your system.
License
G & A Technical Software (GATS) has released this code to the public domain. This means that you are allowed to take this code and use it with almost no legal obligations whatsoever. No copyleft nonsense, no requirement that you keep the header intact, nothing. The only stipulation, in fact, is that by using the software you release GATS from any and all liability -- neither of us will be held responsible for any difficulties that might arise from you or your company's use of the code. For more details, read the header on any file in the source.
Future
Like all open source software, this framework could be extended (hopefully easily) to provide even more power to the developer using it. Some of my ideas are:
- Multi-Directional/Multi-Channel Pipeline - oftentimes in a pipeline it is intuitive to have a two-way connection, one going conceptually "downstream" the other going "upstream." Currently, two disassociated pipelines must be maintained in order to accomplish this. It would be sweet if both of them could be defined in a single pipeline.
- Functors/Delegates - a way to avoid the aforementioned for loop would be sweet.
- Built-in Control Signals - a generic mechanism for basic control and analysis of the pipeline would be beneficial and useful to many potential users.
- Multi-Threaded Support - in an assembly line processing pipeline, each step in the process is working on its portion of the process concurrently. This would be a major accomplishment/improvement here if each pipe segment was launched in its own thread.
- Better Support for Error Handling and Exceptions - currently, there is no facility whatsoever for handling errors in the processing pipeline. Having one of these would definitely be advantageous.
- A Main Pipeline Type - giving the user of the framework the ability to define an entire pipeline in one statement, similar to the way they did it in the Borland article mentioned above, would be a pretty sweet addition - and it might make some of these other improvements/additions/features a little easier to accomplish.
Obviously, if you make any improvements that you would like to share, send them to me, and if I think they're good, I'll add them here for sure.
History
- Original post: September 23rd, 2006