Introduction
Software developers are waking up to the fact that concurrency and parallel processing are becoming more important as the industry matures. I recently studied the Actor Programming Model in some detail and am intrigued by its simplicity and robustness.
The Actor Programming Model is a concurrent programming technique that encapsulates several concurrency features behind a simple interface. Building code on this model allows some advanced concurrent constructs to be implemented with fairly simple code. A developer with little experience in parallel coding should be able to write concurrent code by following only a few rules that help avoid the common problems of race conditions and non-deterministic behavior.
This article explores a few uses of actors in C++ on Win32. The code examples were developed and tested with Visual Studio 2010, but I would expect them to work with older and newer versions of the compiler and in a 64-bit environment. The C++ 2011 language standard added threading facilities to the standard library, such as std::thread and std::mutex (implemented starting with Visual Studio 2012), which would allow an alternate implementation of some of the threading details. The code is Win32, but the underlying principles are universal and apply to many languages and platforms.
Background
In general terms, the Actor concept was developed and refined starting around 1973(1), on a platform of multiple independent processors in a network. Implemented on a multiprocessor machine, it provides several basic concurrency features, including encapsulation of parallel synchronization and serialized message processing. These in turn allow higher-level concurrent features such as fork/join, async/await, pipeline processing and others. The actor code encapsulates the threading and synchronization management so that a class derived from it can use threading techniques without having to implement the low-level plumbing details.
What Is It?
In simplest terms, an actor is an object which has the following characteristics:
- It is an autonomous, interacting component of a parallel system comprising an execution control context (i.e., a process, thread or fiber), an externally accessible address, mutable local state, and APIs to manipulate and observe the state.(2) Each actor's state is unique and is not shared with other objects.
- The processing states are created, running and stopped, with substates determined by the programmer. In every processing state, external code can look at the actor's internal details and retrieve state information about the actor, as allowed or prohibited by the actor. The actor's lifetime proceeds from created, to running, to stopped. Once stopped, it does not restart.(3)
- It has APIs to start processing and to manage a synchronized message queue, from which it receives requests for action from the enclosing program (including itself or other actors). When the actor is created, the queue can accept messages, but they are not processed. When running, the actor processes the messages sequentially and atomically, one message at a time. Pending messages are held in the message queue. When stopped, messages are ignored. Messages sent to the actor from multiple execution contexts are not guaranteed to arrive in temporal order, although multiple messages from the same source will arrive in chronological order.(4)
- The actor is created externally and started with an external API call. It is stopped by sending a 'stop' request through the message queue, which the actor responds to by cleaning up and terminating itself. When running, an actor can process a finite number of messages, send messages to itself or other actors, change local state and create/control/terminate a finite number of other actor objects. Apart from creation and starting, the local state mutates only when processing a message.
- An actor is a passive and lazy object. It will not respond or execute unless a message is sent to it via the message queue.
- Parallelism can be observed with multiple actors processing messages concurrently.
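The lifecycle rule in the list above (created, then running, then stopped, with no restart) can be sketched as a small state holder. This is only an illustration: ActorState and Lifecycle are hypothetical names that do not appear in the article's code, and the sketch assumes C++11 std::atomic rather than Win32 primitives so that other threads can observe the state safely.

```cpp
#include <atomic>

// Hypothetical names for illustration; not part of the article's sample code.
enum class ActorState { Created, Running, Stopped };

class Lifecycle {
public:
    Lifecycle() : m_state(ActorState::Created) {}

    // Any thread may observe the current state.
    ActorState Get() const { return m_state.load(); }

    // Created -> Running. Fails if the actor was already started or stopped.
    bool MarkRunning() {
        ActorState expected = ActorState::Created;
        return m_state.compare_exchange_strong(expected, ActorState::Running);
    }

    // Running -> Stopped. A stopped actor never restarts.
    bool MarkStopped() {
        ActorState expected = ActorState::Running;
        return m_state.compare_exchange_strong(expected, ActorState::Stopped);
    }

private:
    std::atomic<ActorState> m_state;
};
```

Using compare-and-exchange means an out-of-order transition, such as starting a stopped actor, is simply refused instead of silently corrupting the state.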
The examples here treat actors as a C++ "framework" base class containing basic functionality, plus one or more derived classes containing some required plumbing and the desired behavior provided by the programmer. For the actor representation described above, see the diagram in Figure 1.
Figure 1. Representation of the base class in the actor programming model
Once created, the actor base class provides two public methods, Start() and Send(), which start the actor and send messages to the message queue, plus a protected method Process() to implement payload behavior and Exit() to terminate the actor. The Process() method is pure virtual and must be implemented in the derived class. The base class encapsulates message handling and the creation, deletion and management of a thread. The derived class (provided by you) must implement the main intended actor behavior and trigger termination handling in its Process() method. This is a minimal description; other details, such as retrieving actor state or internal data, can be implemented at the discretion of the programmer. Note that the actor is not (gracefully) stopped directly by an external API call. Instead, it shuts itself down asynchronously and exits when it processes a prearranged shutdown message from the message queue. Ensuring this behavior is the responsibility of the derived class and the calling code.
An extract of the actor base class is:
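class HBActor
{
public:
    HBActor();
    virtual ~HBActor();
public:
    virtual void Send(BaseMessage* message);   // queue a message for the actor
    virtual void Start();                      // start the actor's thread
    // Extract only: Exit() and Join(), used later in this article, are not shown.
protected:
    virtual void Process(BaseMessage*) = 0;    // implemented by the derived class
};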
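The extract above shows only the interface. As a rough, portable illustration of the plumbing such a base class hides, here is a minimal sketch that assumes C++11 (std::thread, std::mutex, std::condition_variable) in place of the Win32 calls. SketchActor, SketchMessage and RecordingActor are hypothetical names, messages are passed as std::unique_ptr instead of raw pointers, and none of this is the actual HBActor implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical message type: a single integer payload.
struct SketchMessage {
    explicit SketchMessage(int v) : value(v) {}
    virtual ~SketchMessage() {}
    int value;
};

class SketchActor {
public:
    SketchActor() : m_exit(false) {}
    // Safety net only: callers should send the stop message and Join() first.
    virtual ~SketchActor() { Join(); }

    // Start the worker thread; messages queued earlier are processed now.
    void Start() {
        assert(!m_thread.joinable());  // an actor is started at most once
        m_thread = std::thread(&SketchActor::Run, this);
    }

    // Thread-safe enqueue; the actor takes ownership of the message.
    void Send(std::unique_ptr<SketchMessage> msg) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(msg));
        }
        m_wake.notify_one();
    }

    // Block until the actor has processed its stop message.
    void Join() { if (m_thread.joinable()) m_thread.join(); }

protected:
    virtual void Process(SketchMessage& msg) = 0;
    // Called from Process() to end the message loop after this message.
    void Exit() { m_exit = true; }

private:
    void Run() {
        while (!m_exit) {
            std::unique_ptr<SketchMessage> msg;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return !m_queue.empty(); });
                msg = std::move(m_queue.front());
                m_queue.pop();
            }
            Process(*msg);  // one message at a time, outside the lock
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<std::unique_ptr<SketchMessage>> m_queue;
    std::thread m_thread;
    bool m_exit;  // written and read only on the worker thread
};

// Example derived actor: records values in arrival order, stops on a negative value.
class RecordingActor : public SketchActor {
public:
    std::vector<int> seen;  // read by other threads only after Join()
protected:
    void Process(SketchMessage& msg) override {
        if (msg.value < 0) Exit();
        else seen.push_back(msg.value);
    }
};
```

With this sketch, a caller would Send() some values, Send() a negative value as the stop request, and then Join(). Messages queued before Start() are held and processed once the thread starts, and messages left in the queue after the stop request are discarded.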
By implementing some fairly simple classes, we can build on this to create a fairly complex framework with little effort. Actors can interact in a network, forming well-known architectures such as a fork/join structure, a pipeline or a shared work queue. Let's look at a fork/join example.
Fork/Join
A fork/join solver(5) is briefly summarized as follows:
Result solve(Problem problem)
{
    if (problem is small)
        directly solve problem
    else {
        split problem into independent parts
        fork new subtasks to solve each part
        join all subtasks
        compose result from subresults
    }
}
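As one concrete reading of this outline, the sketch below sums a vector recursively. It assumes C++11 and uses std::async instead of actors, purely to keep the fork and join steps visible. ParallelSum, ParallelSumRange and the cutoff kSmallEnough are made-up names and values, not part of the cited paper or the sample code.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Arbitrary cutoff for "problem is small"; tune it by measurement.
const std::size_t kSmallEnough = 10000;

long long ParallelSumRange(const int* first, const int* last) {
    const std::size_t count = static_cast<std::size_t>(last - first);
    if (count <= kSmallEnough) {
        // Problem is small: solve it directly.
        return std::accumulate(first, last, 0LL);
    }
    // Split the problem into two independent halves.
    const int* mid = first + count / 2;
    // Fork: the left half runs on another thread.
    std::future<long long> left =
        std::async(std::launch::async, ParallelSumRange, first, mid);
    // Meanwhile the right half is solved on the current thread.
    const long long right = ParallelSumRange(mid, last);
    // Join, then compose the result from the subresults.
    return left.get() + right;
}

long long ParallelSum(const std::vector<int>& data) {
    return ParallelSumRange(data.data(), data.data() + data.size());
}
```

The cutoff matters: below it, the cost of forking outweighs the work being forked, which is the same trade-off that applies to actors.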
Suppose that we have some code with two orthogonal pieces that could execute in parallel. Actors make it fairly simple to run them in a simplistic fork/join arrangement. The sequential starting point looks like:
void foo()
{
    LongProcess1();
    LongProcess2();
}
void MyCode()
{
    foo();
}
If we implement at least one of these as an actor object, foo() can be rewritten as a trivial fork/join that runs the two pieces in parallel for some speedup. To do this, the two functions must of course be truly orthogonal to each other, with no shared data, to avoid race conditions. LongProcess2() must also be void (or its return value ignored), since it operates autonomously and we can't get a return value back from it. This would look like:
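enum MsgType_t { DOTASK, STOPACTOR };

// Message carrying a single integer command (one of MsgType_t).
class Msg : public BaseMessage
{
public:
    Msg(int iValue) : m_iValue(iValue) {}
    virtual ~Msg() {}
    int GetValue() const { return m_iValue; }
private:
    int m_iValue;
};

class MyActor : public HBActor
{
public:
    MyActor() {}
    virtual ~MyActor() {}
protected:
    // Called by the base class for each queued message, one at a time.
    virtual void Process(BaseMessage* pBMsg)
    {
        Msg* pMsg = static_cast<Msg*>(pBMsg);
        if (STOPACTOR == pMsg->GetValue())
            Exit();            // prearranged shutdown message
        else
            LongProcess2();    // the forked piece of work
        delete pBMsg;          // the actor deletes each message it receives
    }
};

void foo()
{
    MyActor actor;
    actor.Start();                     // fork: start the actor
    actor.Send(new Msg(DOTASK));       // ask it to run LongProcess2()
    actor.Send(new Msg(STOPACTOR));    // queue the shutdown request behind it
    LongProcess1();                    // runs on the calling thread meanwhile
    actor.Join();                      // join: wait for the actor to finish
}

void MyCode()
{
    foo();
}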
This is a simplistic configuration of the fork/join model. More involved code examples are easy to find with a web search.
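The actor in foo() cannot hand a result back to its caller. One possible way around that, sketched here under the assumption of C++11, is to put a std::promise in the request message and let the caller wait on the matching std::future. SquaringActor and SquareRequest are hypothetical names, and the queue handling is a simplified stand-in for the base class, not the article's HBActor.

```cpp
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical request message: an input plus a promise for the reply.
struct SquareRequest {
    SquareRequest(int in, bool isStop) : input(in), stop(isStop) {}
    int input;
    bool stop;                 // true marks the prearranged stop message
    std::promise<int> reply;   // fulfilled by the actor's thread
};

class SquaringActor {
public:
    SquaringActor() : m_thread(&SquaringActor::Run, this) {}
    ~SquaringActor() { Stop(); }

    // Returns immediately; the caller blocks only when it calls get().
    std::future<int> AskSquare(int input) {
        SquareRequest request(input, false);
        std::future<int> result = request.reply.get_future();
        Enqueue(std::move(request));
        return result;
    }

    // Queue the stop message and wait for the actor to finish.
    void Stop() {
        if (!m_thread.joinable()) return;
        Enqueue(SquareRequest(0, true));
        m_thread.join();
    }

private:
    void Enqueue(SquareRequest request) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(request));
        }
        m_wake.notify_one();
    }

    void Run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_wake.wait(lock, [this] { return !m_queue.empty(); });
            SquareRequest request = std::move(m_queue.front());
            m_queue.pop();
            lock.unlock();
            if (request.stop) return;
            request.reply.set_value(request.input * request.input);
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<SquareRequest> m_queue;
    std::thread m_thread;  // declared last so it starts after the other members exist
};
```

A caller would write something like std::future<int> f = actor.AskSquare(7); and later f.get(). The send stays asynchronous, and the blocking is confined to the point where the result is actually needed.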
The actor object activates a thread as part of its startup code, which consumes resources and takes some time. Creating a thread is neither free nor instantaneous. Be sure that LongProcess1() and LongProcess2() are indeed "long" compared to thread creation, or you will be wasting your time with this implementation.
Another example, which calculates a list of primes using a pipeline of actors, is included in the sample code.
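The sample code itself is not reproduced here. Purely as an illustration of the pipeline idea, the following sketch chains one stage per prime: each stage drops multiples of its own prime and forwards everything else downstream. It assumes C++11 threads, and SieveStage and PrimesUpTo are hypothetical names that may differ entirely from the author's sample. With one thread per prime, it is only sensible for small limits.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One pipeline stage: a thread draining a queue of ints, one at a time.
// The value 0 is the prearranged stop message.
class SieveStage {
public:
    explicit SieveStage(int prime)
        : m_prime(prime), m_thread(&SieveStage::Run, this) {}

    void Send(int value) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(value);
        }
        m_wake.notify_one();
    }

    // Returns once this stage and every downstream stage has stopped.
    void Join() { if (m_thread.joinable()) m_thread.join(); }

    // Call only after Join(): walks the stopped chain and lists its primes.
    void Collect(std::vector<int>& out) const {
        out.push_back(m_prime);
        if (m_next) m_next->Collect(out);
    }

private:
    void Run() {
        for (;;) {
            int value = 0;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return !m_queue.empty(); });
                value = m_queue.front();
                m_queue.pop();
            }
            if (value == 0) {
                // Stop: pass the message on, wait for downstream, then exit.
                if (m_next) { m_next->Send(0); m_next->Join(); }
                return;
            }
            if (value % m_prime == 0) continue;  // filtered out by this stage
            if (m_next) m_next->Send(value);
            else m_next.reset(new SieveStage(value));  // first survivor is the next prime
        }
    }

    const int m_prime;
    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<int> m_queue;
    std::unique_ptr<SieveStage> m_next;  // touched only by this stage's thread until Join()
    std::thread m_thread;                // declared last so it starts after the other members
};

std::vector<int> PrimesUpTo(int limit) {
    std::vector<int> primes;
    if (limit < 2) return primes;
    SieveStage head(2);
    for (int n = 3; n <= limit; ++n) head.Send(n);
    head.Send(0);   // the stop message flows down the whole pipeline
    head.Join();
    head.Collect(primes);
    return primes;
}
```

Each stage keeps its state (its prime and its link to the next stage) private and changes it only while processing a message, so no locking is needed beyond the queues themselves.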
Limitations of the Actor Programming Model
To be complete, here are some of the realities of the Actor model:
- Use of actors reduces the opportunities for race conditions but does not eliminate them. Data races are still possible if the messages or the underlying logic touched by the actor objects include mutable shared objects. Implementing truly concurrent data structures is non-trivial. The actor model improves on some of these issues, but does not solve all of the problems.
- Deadlocks are possible in a number of situations, for example when two actors each block while waiting for a message from the other.
- The Actor model implements message passing in the direction of the actor, but does not facilitate sending a request and receiving a specific status or a reply to that request. Synchronous replies require some sort of blocking logic. For information on objects which can provide this behavior, look at "futures".
Footnotes
- (1) Source: http://dl.acm.org/citation.cfm?id=1624804
- (2) Actors can actually execute in a computer network or as multiple processes in separate address spaces. In this article, I consider actors on a single machine in one address space with multiple threads.
- (3) There is also some work describing a "pause" and "resume" feature, which is not considered here.
- (4) Some reference information does not guarantee this detail. For purposes of this article, it will be assumed to be the case.
- (5) Source: http://gee.cs.oswego.edu/dl/papers/fj.pdf
Points of Interest
I have seen various comments on the Actor Programming Model on the web, including some from detractors. I have been very happy with the functionality this model presents. I hope you enjoy it!
In my coding travels, I have used the actor model as a logging class, a buffered I/O handler (both input and output) and an iterative problem solver. I love it!
As stated above, this code was developed with VS2010 on Win32. I am interested in validation of the code with other compilers and other platforms. Please leave a comment below if you have used it successfully on another platform.
History
- 2012/10/20 Initial version.
- 2012/10/27 Reinstate lost bullet points, add missing Join() call, fix footnote reference.