Introduction
GUI applications are event driven: they respond to user and hardware events. The handlers for these events perform one or more of the following processes:
- Data collection;
- Data transformation;
- Data dispersion.
Collecting data includes such activities as:
- Reading information out of a database;
- Storing a GUI object's value in a variable;
- Reading a hardware register.
Data transformation consists of any process that manipulates the data: combining, splitting, formatting, and so on.
Data dispersion is the opposite of data collection, during which data is written to a database, a GUI object
is updated, or a hardware register is written.
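As a concrete illustration, here is a single hypothetical event handler that performs all three phases. The "database" is simulated with a std::map, and all names are invented for the example:

```cpp
#include <map>
#include <string>

// Simulated database; in a real application this would be an actual data store.
std::map<std::string, double> database = {{"unitPrice", 2.50}};

std::string on_quantity_changed(int quantity)
{
    // 1. Data collection: read the unit price out of the (simulated) database.
    double unitPrice = database["unitPrice"];

    // 2. Data transformation: combine the collected values.
    double total = unitPrice * quantity;

    // 3. Data dispersion: write the result back and return a display string.
    database["total"] = total;
    return std::to_string(total);
}
```

Note how the collection, transformation, and dispersion logic are all fused into one function; the architecture described below aims to pull them apart.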
In this paradigm, data is usually very localized, contained within a single function or managed by a single object. Interface methods of varying complexity exist to share data between objects. Ultimately, a large application must manage a diverse collection of data and perform complicated tasks on it. This leads to complexity in managing several aspects of that data:
- Lifetime;
- Version / Format;
- Read/Write access.
These tasks are left to the programmer, and are in my experience mostly ignored.
In the Organic Programming Environment (OPEN), a data-centric approach is taken. A Data Pool (D-Pool) manages
all data. Processing functions register themselves with the D-Pool Manager (DPM), indicating the data on which
they operate, and the data that they produce. When data is placed into the D-Pool, the DPM automatically
initiates all interested parties as threads. When all threads have completed, the data is automatically
removed from the D-Pool. The DPM can also be instructed regarding data lifetime, version/format information,
and read/write access.
The Name
I came up with this name because I wanted to describe a process similar to how proteins are transported out of
the nucleus of a cell and manipulated into more interesting molecules, that are then either used by the cell itself
or transported out of the cell, to be used by some other cell or organ. This process seems to me to be inherently
data-centric, with the DNA/RNA being the data, and various processes (enzymes, etc) attaching themselves to the data
as it appears in the cellular fluid. Please note that this architecture is based on neither genetic programming (http://www.genetic-programming.org/) nor the Organic Programming Language Gaea (http://www.carc.aist.go.jp/gaea/).
Benefits
There are several benefits to this architecture:
- Processes are completely isolated from each other, improving portability and reuse;
- Each process clearly documents the data types in which it is interested;
- The operation of the program is completely traceable without the programmer having to remember to add
trace messages. The DPM can handle all tracing of data and process invocation;
- Since each process is invoked as a thread, independent work naturally runs in parallel;
- The programmer is completely relieved of deciding whether a thread would improve performance and can ignore most issues of data synchronization.
Drawbacks
There are also several drawbacks to this architecture (I'm sure I haven't thought of them all!):
- Programs become non-linear in their operation, which is difficult to debug;
- Sequences of events may occur in a different order from one run to another, again making the program
difficult to debug;
- Because each datum must be managed by a key, or name, it is possible to have name collisions. In any case, even trivial information must be named, which can become a cumbersome task;
- Globalizes a lot of data that doesn't necessarily need to be treated as global;
- Data dependencies, especially when updating / inserting database records, are difficult to support when the database architecture includes foreign key associations (which any robust database design should implement; sorry, MySQL!);
- There is no support for this kind of modeling in existing development frameworks such as MFC;
- Existing frameworks, such as MFC, must be significantly enhanced to incorporate this architecture;
- Most importantly, it is a completely foreign way of thinking about programming, and therefore will
probably not be widely considered as practical.
Design
There are four design elements of OPEN:
- The process pool (P-Pool);
- The data pool (D-Pool);
- The data pool manager (DPM);
- The data collection container.
This architecture will quickly create hundreds, if not thousands, of small processes that receive some data,
manipulate it, and place the result back into the D-Pool. Registering all these processes manually becomes a
very cumbersome task, so it behooves us to make this as simple for the programmer as possible.
Each process must implement three things:
- Registering the process itself;
- Registering the data on which the process operates;
- Providing the actual implementation of the process.
Each process can operate on more than one datum. If, for example, a process uses two datums, the DPM must
check every permutation of the datums currently in the pool to determine whether a process should be triggered.
This will lead to disaster as the number of datums in the D-Pool increases. To avoid this, a specialized container
is implemented that collects datums; when the collection is complete, the process is triggered. This collection
is implemented as an STL multimap. A datum can be associated with one or more collections (and therefore processes).
Each collection acquires the same name as its process, thus associating the collection with that process. When a
datum is placed into the D-Pool, the DPM iterates through the multimap of datums, adding the datum to each
associated collection. Every process whose collection has a completed datum list is then triggered.
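The collection mechanism above can be sketched in standard C++. Here std::string stands in for CString, and the "trigger" is just recorded in a set; all names are illustrative:

```cpp
#include <map>
#include <set>
#include <string>

// Minimal sketch of the multimap dispatch: datum name -> interested process.
std::multimap<std::string, std::string> interests;   // datum -> process
std::map<std::string, std::set<std::string>> needed; // process -> datums still missing
std::set<std::string> triggered;                     // processes whose collection completed

void register_process(const std::string& proc, const std::set<std::string>& datums)
{
    needed[proc] = datums;
    for (const auto& d : datums)
        interests.insert({d, proc});
}

void add_datum(const std::string& name)
{
    // Walk every process interested in this datum...
    auto range = interests.equal_range(name);
    for (auto it = range.first; it != range.second; ++it)
    {
        auto& missing = needed[it->second];
        missing.erase(name);
        if (missing.empty())              // collection complete:
            triggered.insert(it->second); // ...trigger the process
    }
}
```

A process waiting on "itemCost" and "itemName" is triggered only once both datums have appeared, which is exactly the completion test the DPM performs.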
This design has undesirable side effects:
- A particular datum in a collection can change before the collection is complete;
- Clearing the data before the process exits can throw away datums already being collected for a second iteration of the same process.
For this prototype, these side effects are ignored.
Finally, the D-Pool does not know the data type of each datum in the pool. Using templates is not feasible
because templates require the type to be known at compile time, imposing very annoying requirements on the
programmer. Instead, the D-Pool maintains a collection of generic data containers. The data container is smart
enough to convert to and from various built-in types and includes an abstract base class for custom derivations.
Since the data container is not really the topic of this paper, its design and implementation are not covered
here. Feel free to browse through the source.
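For a rough idea of what such a generic container looks like, here is a minimal stand-in built on std::any (C++17). The real DataContainer also converts between built-in types, which this sketch does not attempt:

```cpp
#include <any>

// Minimal stand-in for the article's DataContainer: type-erased storage
// with Set/Get. Purely illustrative; not the actual implementation.
class DataContainer
{
public:
    template <typename T>
    DataContainer& Set(const T& value) { data = value; return *this; }

    template <typename T>
    void Get(T& value) const { value = std::any_cast<T>(data); }

private:
    std::any data;
};
```

The D-Pool can then hold a homogeneous collection of these containers while the stored values remain heterogeneous.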
Implementation
To implement this design, all processes must be declared as classes derived from the OPEN_Process class.
class OPEN_Process
{
public:
virtual ~OPEN_Process() {}
void RegisterDNames(void);
bool SetData(const CString& dataName, const DataContainer& dc)
{
dataNameList[dataName]=dc;
return nameList.size() == dataNameList.size();
}
virtual void Run(void)=0;
protected:
OPEN_Process(const CString& s1, const CString& s2) :
pName(s1), dName(s2) {}
OPEN_Process(void) {};
OPEN_Process(const OPEN_Process& p) :
pName(p.pName), dName(p.dName), nameList(p.nameList),
dataNameList(p.dataNameList) {}
protected:
CString pName;
CString dName;
std::vector<CString> nameList;
std::map<CString, DataContainer> dataNameList;
};
This class is an abstract base class. Other than the constructor (which must be invoked from a derived class),
the only method of real interest to the user is Run(void), which must be implemented in the derived class. Note
also that this class encapsulates the process name, the original datum name list, the datum name list separated
out into a vector, and an STL map associating each datum name with the container that maintains its actual value.
This information can be used to generate debugging information or a data diagram (for example, in Visio) of the
data flow. Describing output data is currently not a requirement but could easily be added.
Each process is managed by a process pool. The OPEN_ProcessPool essentially encapsulates an STL map that
associates the process name with a pointer to the process. Other objects in OPEN also use this class to
interface to a specific process object.
class OPEN_ProcessPool
{
public:
OPEN_ProcessPool(void) {}
virtual ~OPEN_ProcessPool() {}
void Register(const CString& processName,
class OPEN_Process* proc)
{
processList[processName]=proc;
}
bool SetData(const CString& processName, const CString& dataName,
const DataContainer& dc)
{
ASSERT(processList.find(processName) != processList.end());
bool trigger=processList[processName]->SetData(dataName, dc);
return trigger;
}
void Trigger(const CString& processName)
{
ASSERT(processList.find(processName) != processList.end());
AfxBeginThread(OPEN_ProcessPool::StartProcess,
processList[processName]);
}
protected:
static UINT StartProcess(void*);
public:
static OPEN_ProcessPool pool;
protected:
std::map<CString, class OPEN_Process*> processList;
};
There is no particular reason for the programmer to interface to this class directly.
The OPEN_DataCollection implements the one-to-many association between a datum name and the processes that are
interested in that datum. This is implemented as an STL multimap, and this class is essentially a wrapper for
the multimap, providing registration and iteration methods.
class OPEN_DataCollection
{
public:
OPEN_DataCollection(void) {}
virtual ~OPEN_DataCollection() {}
void Register(const CString& datumName, const CString& collName)
{
collectionList.insert(std::pair<const CString,
CString>(datumName, collName));
}
bool FindFirst(const CString& dataName, CString& collName)
{
iter=collectionList.find(dataName);
if (iter==collectionList.end())
{
// No process is interested in this datum; don't dereference end().
return false;
}
collName=(*iter).second;
return true;
}
bool FindNext(const CString& dataName, CString& collName)
{
++iter;
if (iter==collectionList.end())
{
return false;
}
collName=(*iter).second;
return (*iter).first == dataName;
}
public:
static OPEN_DataCollection coll;
protected:
std::multimap<const CString, CString> collectionList;
std::multimap<const CString, CString>::iterator iter;
};
The OPEN_DataPool implements a wrapper around yet another STL map. This map associates the datum name with the
actual value. It is this object with which the application interfaces to place data into the D-Pool. This class
also implements a semaphore that unblocks the DPM, which then parses through the data in the D-Pool, removing it
and placing it into the corresponding process container. This class also implements a CRITICAL_SECTION to ensure
that the DPM (running as a thread) can read and write the dataPoolList without colliding with other threads that
are potentially writing to and erasing this collection as well.
class OPEN_DataPool
{
public:
OPEN_DataPool(void)
{
InitializeCriticalSection(&cs);
dpSem=CreateSemaphore(NULL, 0, 0x7FFF, "OPEN_DP_SEM");
ASSERT(dpSem);
}
virtual ~OPEN_DataPool()
{
DeleteCriticalSection(&cs);
CloseHandle(dpSem);
}
void Add(const CString& dataName, const DataContainer& dc)
{
EnterCriticalSection(&cs);
dataPoolList[dataName]=dc;
LeaveCriticalSection(&cs);
ReleaseSemaphore(dpSem, 1, NULL);
}
void RemoveDatum(CString& s, DataContainer& d)
{
EnterCriticalSection(&cs);
std::map<CString, DataContainer>::iterator iter
= dataPoolList.begin();
ASSERT(iter != dataPoolList.end());
s=(*iter).first;
d=(*iter).second;
dataPoolList.erase(iter);
LeaveCriticalSection(&cs);
}
public:
static OPEN_DataPool pool;
protected:
std::map<CString, DataContainer> dataPoolList;
CRITICAL_SECTION cs;
HANDLE dpSem;
};
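For readers outside Win32, the same locking pattern can be expressed with std::mutex in place of the CRITICAL_SECTION and std::condition_variable in place of the semaphore. This is a hedged sketch, with the value type simplified to std::string:

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Portable sketch of the D-Pool's locking pattern (illustrative only).
class DataPool
{
public:
    void Add(const std::string& name, const std::string& value)
    {
        {
            std::lock_guard<std::mutex> lock(m); // like EnterCriticalSection
            pool[name] = value;
        }
        cv.notify_one(); // wake the DPM thread, like ReleaseSemaphore
    }

    // Blocks until a datum is available, then removes and returns it,
    // like the WaitForSingleObject / RemoveDatum pairing.
    std::pair<std::string, std::string> RemoveDatum()
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !pool.empty(); });
        auto it = pool.begin();
        std::pair<std::string, std::string> result = *it;
        pool.erase(it);
        return result;
    }

private:
    std::map<std::string, std::string> pool;
    std::mutex m;
    std::condition_variable cv;
};
```

The condition variable folds the "count of pending datums" that the Win32 semaphore tracks into the simple predicate "the pool is not empty".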
The OPEN_Mgr implements the DPM. This class is very simple:
class OPEN_Mgr
{
public:
OPEN_Mgr(void) {}
virtual ~OPEN_Mgr() {}
void Run(void);
public:
static OPEN_Mgr mgr;
protected:
HANDLE dpSem;
};
What is of more interest is the implementation of the Run method:
void OPEN_Mgr::Run(void)
{
dpSem=OpenSemaphore(SYNCHRONIZE, FALSE, "OPEN_DP_SEM");
ASSERT(dpSem);
while (1)
{
DWORD ret=WaitForSingleObject(dpSem, INFINITE);
if (ret==WAIT_OBJECT_0)
{
CString dataName;
CString processName;
DataContainer data;
OPEN_DataPool::pool.RemoveDatum(dataName, data);
bool ret=OPEN_DataCollection::coll.FindFirst(dataName,
processName);
while (ret)
{
bool trigger=OPEN_ProcessPool::pool.SetData(processName,
dataName, data);
if (trigger)
{
OPEN_ProcessPool::pool.Trigger(processName);
}
ret=OPEN_DataCollection::coll.FindNext(dataName,
processName);
}
}
else
{
break;
}
}
}
Implemented as a thread, this function waits for a datum to be placed into the data pool, upon which the thread
is released. It iterates through the data pool, removing each datum and its value from the pool and storing them
in each process that is interested in that datum. The DPM then "triggers" a process as a thread once all the
datums that the process requires have been supplied.
To support a somewhat more readable implementation of processes, several macros are defined:
#define DECLARE_OPEN(x, y) \
class x : public OPEN_Process \
{ \
public: \
x(void) : OPEN_Process(#x, y) {} \
virtual void Register(void) \
{ \
RegisterDNames(); \
OPEN_ProcessPool::pool.Register(#x, this); \
} \
virtual void Run(void); \
static x _##x; \
}; \
x x::_##x;
#define IMPLEMENT_OPEN(x) \
void x::Run(void) {
#define FINISH_OPEN \
dataNameList.erase(dataNameList.begin(), dataNameList.end()); }
#define REGISTER_OPEN(x) \
x::_##x.Register()
Thus, the implementation of a process would look something like this:
DECLARE_OPEN(AddCost, "itemCost");
IMPLEMENT_OPEN(AddCost)
{
double cost;
dataNameList["itemCost"].Get(cost);
double total=atof(dlg->total)+cost;
OPEN_DataPool::pool.Add("totalCost",
DataContainer().Set(AutoString(total)));
}
FINISH_OPEN
And somewhere in the initialization section of the application, the process must be instantiated:
REGISTER_OPEN(AddCost);
The issue of process instantiation is an annoying one. You will notice that almost all of the OPEN classes automatically
instantiate a singleton, implemented as a public static member of each class. This is also done with the processes.
However, the actual registration of the datum names cannot be done at program startup because other necessary initialization
(for STL, for example) has not occurred. To my knowledge, there is no way of pre-determining the initialization order of
global or static data.
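One standard workaround for this (not used in the prototype) is the "construct on first use" idiom: a function-local static is initialized the first time control passes through the function, sidestepping the undefined construction order of globals in different translation units. A hypothetical sketch:

```cpp
#include <map>
#include <string>

// "Construct on first use": the registry is built on the first call,
// so it is always ready even when called during another global's
// construction. All names here are illustrative.
std::map<std::string, int>& Registry()
{
    static std::map<std::string, int> registry; // initialized on first call
    return registry;
}

// Safe even though this global's constructor runs before main():
struct AutoRegister
{
    AutoRegister() { Registry()["AddCost"] = 1; }
};
static AutoRegister autoReg;
```

Applying this idiom to the OPEN singletons would let the datum-name registration happen at static-construction time rather than requiring a separate startup step.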
The Demonstration Program
The demonstration program is a simple and rather poor example of how this paradigm works. The "Add Part", "Remove Part",
and "Clear Part" events each place data into the D-Pool which is picked up by the DPM. The most interesting thing
about this example is that the datum "itemCost" triggers two events: one to update the list box and the other to
update the running total. Admittedly, the process handlers are tightly coupled to the dialog object, which is
definitely not desirable in real life. Also note that I overrode the AssertValid method so that I could update
the dialog from a worker thread while still running in debug mode. Good ol' MFC.
A More Interesting Thought Experiment
For this model to really be effective, the developer needs to completely rethink how applications are designed
and implemented.
For example, a function might do something like this:
- DB query Q1
- DB query Q2
- Operation A
- DB query Q3
- DB query Q4
- Operation B
- DB update QQ
- DB update RR
In the spirit of this architecture, the function should be recoded into several processes:
Process 1:
DB query Q1
Process 2:
DB query Q2
Process 3: (depends on Q1 and Q2)
Operation A
- Output result R1 into Data Pool
Process 4: (depends on R1)
DB query Q3
Process 5: (depends on R1)
DB query Q4
Process 6: (depends on Q3 and Q4)
Operation B
- Output result R2 into the Data Pool
- Output result R3 into the Data Pool
Process 7: (depends on R2)
DB update QQ
Process 8: (depends on R3)
DB update RR
As you can see from the above architecture, the program now automatically performs the database queries and
updates simultaneously in separate threads. This can dramatically improve program performance and it is achieved by simply using a different paradigm for data management.
The Challenge
I consider this paradigm a significant enhancement to existing process-centric programming styles.
It results in:
- smaller and simpler functions;
- greatly increased parallel processing;
- detailed function and data tracing, provided automatically;
- documentation of all function inputs and outputs.
There are complexities in this model that are not fully understood. My challenge to the reader is to
identify these complexities and design solutions for them, ultimately making this paradigm robust and easy
to use. For example, in this prototype implementation, the DPM, D-Pool, and other objects are implemented
as global singletons. It seems instead more reasonable that an application would have several data pools at
different scales. This would extend the entire concept of organic programming: for example (pardon the analogy), program organs.
Credits
Please credit the author, Marc Clifton, for the core of OPEN in any application that you build using it (I'm probably dreaming, right?). The
author (me) also requests that he is provided with the source code and list of all enhancements that you make to the architecture of
OPEN, so that they may be included in future versions for the benefit of all.
Conclusion
The Organic Programming Environment is not a replacement for process-centric modeling. However, OPEN is
a significant enhancement to the programmer's toolset because there are many cases where a data-centric model
is superior to a process-centric one.