Introduction
GUI applications are event driven: they respond to user and hardware events. The handlers for these events perform one or more of the following processes:
- Data collection;
- Data transformation;
- Data dispersion.
Collecting data includes such activities as:
- Reading information out of a database;
- Storing a GUI object's value in a variable;
- Reading a hardware register.
Data transformation consists of any process that manipulates the data: combining, splitting, formatting, and so on.
Data dispersion is the opposite of data collection, during which data is written to a database, a GUI object
is updated, or a hardware register is written.
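As a concrete illustration, here is a single hypothetical event handler that performs all three phases. The "database" is simulated with a std::map, and all names are invented for the example:

```cpp
#include <map>
#include <string>

// Simulated database; in a real application this would be an actual data store.
std::map<std::string, double> database = {{"unitPrice", 2.50}};

std::string on_quantity_changed(int quantity)
{
    // 1. Data collection: read the unit price out of the (simulated) database.
    double unitPrice = database["unitPrice"];

    // 2. Data transformation: combine the collected values.
    double total = unitPrice * quantity;

    // 3. Data dispersion: write the result back and return a display string.
    database["total"] = total;
    return std::to_string(total);
}
```

Note how the collection, transformation, and dispersion logic are all fused into one function; the architecture described below aims to pull them apart.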
In this paradigm, data is usually very localized, contained within a single function or managed by a single object. Interface methods of varying complexity exist to share data between objects. Ultimately, a large application must manage a diverse collection of data and perform complicated tasks on it. This leads to complexity in managing several aspects of that data:
- Lifetime;
- Version / Format;
- Read/Write access.
These tasks are left to the programmer, and are in my experience mostly ignored.
In the Organic Programming Environment (OPEN), a data-centric approach is taken. A Data Pool (D-Pool) manages
all data. Processing functions register themselves with the D-Pool Manager (DPM), indicating the data on which
they operate, and the data that they produce. When data is placed into the D-Pool, the DPM automatically
initiates all interested parties as threads. When all threads have completed, the data is automatically
removed from the D-Pool. The DPM can also be instructed regarding data lifetime, version/format information,
and read/write access.
The Name
I came up with this name because I wanted to describe a process similar to how proteins are transported out of
the nucleus of a cell and manipulated into more interesting molecules, that are then either used by the cell itself
or transported out of the cell, to be used by some other cell or organ. This process seems to me to be inherently
data-centric, with the DNA/RNA being the data, and various processes (enzymes, etc) attaching themselves to the data
as it appears in the cellular fluid. Please note that this architecture is based on neither genetic programming (http://www.genetic-programming.org/) nor the Organic Programming Language Gaea (http://www.carc.aist.go.jp/gaea/).
Benefits
There are several benefits to this architecture:
- Processes are completely isolated from each other, improving portability and reuse;
- Each process clearly documents the data types in which it is interested;
- The operation of the program is completely traceable without the programmer having to remember to add
trace messages. The DPM can handle all tracing of data and process invocation;
- Since each process is invoked as a thread, independent work naturally runs in parallel;
- The programmer is completely relieved of deciding whether a thread would improve performance and can ignore most issues of data synchronization.
Drawbacks
There are also several drawbacks to this architecture (I'm sure I haven't thought of them all!):
- Programs become non-linear in their operation, which is difficult to debug;
- Sequences of events may occur in a different order from one run to another, again making the program
difficult to debug;
- Because each datum must be managed by a key, or name, it is possible to have name collisions. In any case, even trivial information must be named, which can become a cumbersome task;
- Globalizes a lot of data that doesn't necessarily need to be treated as global;
- Data dependencies, especially when updating / inserting database records, are difficult to support when the database architecture includes foreign key associations (which any robust database design should implement; sorry, MySQL!);
- There is no support for this kind of modeling in existing development frameworks such as MFC;
- Existing frameworks, such as MFC, must be significantly enhanced to incorporate this architecture;
- Most importantly, it is a completely foreign way of thinking about programming, and therefore will
probably not be widely considered as practical.
Design
There are four design elements of OPEN:
- The process pool (P-Pool);
- The data pool (D-Pool);
- The data pool manager (DPM);
- The data collection container.
This architecture will quickly create hundreds, if not thousands, of small processes that receive some data,
manipulate it, and place the result back into the D-Pool. Registering all these processes manually becomes a
very cumbersome task, so it behooves us to make this as simple for the programmer as possible.
Each process must implement three things:
- Registering the process itself;
- Registering the data on which the process operates;
- Providing the actual implementation of the process.
Each process can operate on more than one datum. If, for example, a process uses two datums, the DPM must
check every permutation of the datums currently in the pool to determine whether a process should be triggered.
This will lead to disaster as the number of datums in the D-Pool increases. To avoid this, a specialized container
is implemented that collects datums; when the collection is complete, the process is triggered. This collection
is implemented as an STL multimap. A datum can be associated with one or more collections (and therefore processes).
Each collection acquires the same name as its process, thus associating the collection with that process. When a
datum is placed into the D-Pool, the DPM iterates through the multimap of datums, adding the datum to each
associated collection. Every process whose collection has a completed datum list is then triggered.
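The collection mechanism above can be sketched in standard C++. Here std::string stands in for CString, and the "trigger" is just recorded in a set; all names are illustrative:

```cpp
#include <map>
#include <set>
#include <string>

// Minimal sketch of the multimap dispatch: datum name -> interested process.
std::multimap<std::string, std::string> interests;   // datum -> process
std::map<std::string, std::set<std::string>> needed; // process -> datums still missing
std::set<std::string> triggered;                     // processes whose collection completed

void register_process(const std::string& proc, const std::set<std::string>& datums)
{
    needed[proc] = datums;
    for (const auto& d : datums)
        interests.insert({d, proc});
}

void add_datum(const std::string& name)
{
    // Walk every process interested in this datum...
    auto range = interests.equal_range(name);
    for (auto it = range.first; it != range.second; ++it)
    {
        auto& missing = needed[it->second];
        missing.erase(name);
        if (missing.empty())              // collection complete:
            triggered.insert(it->second); // ...trigger the process
    }
}
```

A process waiting on "itemCost" and "itemName" is triggered only once both datums have appeared, which is exactly the completion test the DPM performs.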
This design has undesirable side effects:
- A particular datum in a collection can change before the collection is complete;
- Clearing the data before the process exits can throw away datums already being collected for a second iteration of the same process.
For this prototype, these side effects are ignored.
Finally, the D-Pool does not know the data type of each datum in the pool. Using templates is not feasible
because templates require the type to be known at compile time, imposing very annoying requirements on the
programmer. Instead, the D-Pool maintains a collection of generic data containers. The data container is smart
enough to convert to and from various built-in types and includes an abstract base class for custom derivations.
Since the data container is not really the topic of this paper, its design and implementation are not covered
here. Feel free to browse through the source.
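For a rough idea of what such a generic container looks like, here is a minimal stand-in built on std::any (C++17). The real DataContainer also converts between built-in types, which this sketch does not attempt:

```cpp
#include <any>

// Minimal stand-in for the article's DataContainer: type-erased storage
// with Set/Get. Purely illustrative; not the actual implementation.
class DataContainer
{
public:
    template <typename T>
    DataContainer& Set(const T& value) { data = value; return *this; }

    template <typename T>
    void Get(T& value) const { value = std::any_cast<T>(data); }

private:
    std::any data;
};
```

The D-Pool can then hold a homogeneous collection of these containers while the stored values remain heterogeneous.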
Implementation
To implement this design, all processes must be declared as classes derived from the OPEN_Process class.
class OPEN_Process
{
public:
virtual ~OPEN_Process() {}
void RegisterDNames(void);
bool SetData(const CString& dataName, const DataContainer& dc)
{
dataNameList[dataName]=dc;
return nameList.size() == dataNameList.size();
}
virtual void Run(void)=0;
protected:
OPEN_Process(const CString& s1, const CString& s2) :
pName(s1), dName(s2) {}
OPEN_Process(void) {};
OPEN_Process(const OPEN_Process& p) :
pName(p.pName), dName(p.dName), nameList(p.nameList),
dataNameList(p.dataNameList) {}
protected:
CString pName;
CString dName;
std::vector<CString> nameList;
std::map<CString, DataContainer> dataNameList;
};
This class is an abstract base class. Other than the constructor (which must be invoked from a derived class),
the only method of real interest to the user is Run(void), which must be implemented in the derived class. Note
also that this class encapsulates the process name, the original datum name list, the datum name list separated
out into a vector, and an STL map associating each datum name with the container that maintains its actual value.
This information can be used to generate debugging information or a data diagram (for example, in Visio) of the
data flow. Describing output data is currently not a requirement but could easily be added.
Each process is managed by a process pool. The OPEN_ProcessPool essentially encapsulates an STL map that
associates the process name with a pointer to the process. Other objects in OPEN also use this class to
interface to a specific process object.
class OPEN_ProcessPool
{
public:
OPEN_ProcessPool(void) {}
virtual ~OPEN_ProcessPool() {}
void Register(const CString& processName,
class OPEN_Process* proc)
{
processList[processName]=proc;
}
bool SetData(const CString& processName, const CString& dataName,
const DataContainer& dc)
{
ASSERT(processList.find(processName) != processList.end());
bool trigger=processList[processName]->SetData(dataName, dc);
return trigger;
}
void Trigger(const CString& processName)
{
ASSERT(processList.find(processName) != processList.end());
AfxBeginThread(OPEN_ProcessPool::StartProcess,
processList[processName]);
}
protected:
static UINT StartProcess(void*);
public:
static OPEN_ProcessPool pool;
protected:
std::map<CString, class OPEN_Process*> processList;
};
There is no particular reason for the programmer to interface to this class directly.
The OPEN_DataCollection implements the one-to-many association between a datum name and the processes that are
interested in that datum. This is implemented as an STL multimap, and this class is essentially a wrapper for
the multimap, providing registration and iteration methods.
class OPEN_DataCollection
{
public:
OPEN_DataCollection(void) {}
virtual ~OPEN_DataCollection() {}
void Register(const CString& datumName, const CString& collName)
{
collectionList.insert(std::pair<const CString,
CString>(datumName, collName));
}
bool FindFirst(const CString& dataName, CString& collName)
{
iter=collectionList.find(dataName);
if (iter==collectionList.end())
{
// No process is interested in this datum; don't dereference end().
return false;
}
collName=(*iter).second;
return true;
}
bool FindNext(const CString& dataName, CString& collName)
{
++iter;
if (iter==collectionList.end())
{
return false;
}
collName=(*iter).second;
return (*iter).first == dataName;
}
public:
static OPEN_DataCollection coll;
protected:
std::multimap<const CString, CString> collectionList;
std::multimap<const CString, CString>::iterator iter;
};
The OPEN_DataPool implements a wrapper around yet another STL map. This map associates the datum name with the
actual value. It is this object with which the application interfaces to place data into the D-Pool. This class
also implements a semaphore that unblocks the DPM, which then parses through the data in the D-Pool, removing it
and placing it into the corresponding process container. This class also implements a CRITICAL_SECTION to ensure
that the DPM (running as a thread) can read and write the dataPoolList without colliding with other threads that
are potentially writing to and erasing this collection as well.
class OPEN_DataPool
{
public:
OPEN_DataPool(void)
{
InitializeCriticalSection(&cs);
dpSem=CreateSemaphore(NULL, 0, 0x7FFF, "OPEN_DP_SEM");
ASSERT(dpSem);
}
virtual ~OPEN_DataPool()
{
DeleteCriticalSection(&cs);
CloseHandle(dpSem);
}
void Add(const CString& dataName, const DataContainer& dc)
{
EnterCriticalSection(&cs);
dataPoolList[dataName]=dc;
LeaveCriticalSection(&cs);
ReleaseSemaphore(dpSem, 1, NULL);
}
void RemoveDatum(CString& s, DataContainer& d)
{
EnterCriticalSection(&cs);
std::map<CString, DataContainer>::iterator iter
= dataPoolList.begin();
ASSERT(iter != dataPoolList.end());
s=(*iter).first;
d=(*iter).second;
dataPoolList.erase(iter);
LeaveCriticalSection(&cs);
}
public:
static OPEN_DataPool pool;
protected:
std::map<CString, DataContainer> dataPoolList;
CRITICAL_SECTION cs;
HANDLE dpSem;
};
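For readers outside Win32, the same locking pattern can be expressed with std::mutex in place of the CRITICAL_SECTION and std::condition_variable in place of the semaphore. This is a hedged sketch, with the value type simplified to std::string:

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Portable sketch of the D-Pool's locking pattern (illustrative only).
class DataPool
{
public:
    void Add(const std::string& name, const std::string& value)
    {
        {
            std::lock_guard<std::mutex> lock(m); // like EnterCriticalSection
            pool[name] = value;
        }
        cv.notify_one(); // wake the DPM thread, like ReleaseSemaphore
    }

    // Blocks until a datum is available, then removes and returns it,
    // like the WaitForSingleObject / RemoveDatum pairing.
    std::pair<std::string, std::string> RemoveDatum()
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !pool.empty(); });
        auto it = pool.begin();
        std::pair<std::string, std::string> result = *it;
        pool.erase(it);
        return result;
    }

private:
    std::map<std::string, std::string> pool;
    std::mutex m;
    std::condition_variable cv;
};
```

The condition variable folds the "count of pending datums" that the Win32 semaphore tracks into the simple predicate "the pool is not empty".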
The OPEN_Mgr implements the DPM. This class is very simple:
class OPEN_Mgr
{
public:
OPEN_Mgr(void) {}
virtual ~OPEN_Mgr() {}
void Run(void);
public:
static OPEN_Mgr mgr;
protected:
HANDLE dpSem;
};
What is of more interest is the implementation of the Run method:
void OPEN_Mgr::Run(void)
{
dpSem=OpenSemaphore(SYNCHRONIZE, FALSE, "OPEN_DP_SEM");
ASSERT(dpSem);
while (1)
{
DWORD ret=WaitForSingleObject(dpSem, INFINITE);
if (ret==WAIT_OBJECT_0)
{
CString dataName;
CString processName;
DataContainer data;
OPEN_DataPool::pool.RemoveDatum(dataName, data);
bool ret=OPEN_DataCollection::coll.FindFirst(dataName,
processName);
while (ret)
{
bool trigger=OPEN_ProcessPool::pool.SetData(processName,
dataName, data);
if (trigger)
{
OPEN_ProcessPool::pool.Trigger(processName);
}
ret=OPEN_DataCollection::coll.FindNext(dataName,
processName);
}
}
else
{
break;
}
}
}
Implemented as a thread, this function waits for a datum to be placed into the data pool, upon which the thread
is released. It iterates through the data pool, removing each datum and its value from the pool and storing them
in each process that is interested in that datum. The DPM then "triggers" a process as a thread once all the
datums that the process requires have been supplied.
To support a somewhat more readable implementation of processes, several macros are defined:
#define DECLARE_OPEN(x, y) \
class x : public OPEN_Process \
{ \
public: \
x(void) : OPEN_Process(#x, y) {} \
virtual void Register(void) \
{ \
RegisterDNames(); \
OPEN_ProcessPool::pool.Register(#x, this); \
} \
virtual void Run(void); \
static x _##x; \
}; \
x x::_##x;
#define IMPLEMENT_OPEN(x) \
void x::Run(void) {
#define FINISH_OPEN \
dataNameList.erase(dataNameList.begin(), dataNameList.end()); }
#define REGISTER_OPEN(x) \
x::_##x.Register()
Thus, the implementation of a process would look something like this:
DECLARE_OPEN(AddCost, "itemCost");
IMPLEMENT_OPEN(AddCost)
{
double cost;
dataNameList["itemCost"].Get(cost);
double total=atof(dlg->total)+cost;
OPEN_DataPool::pool.Add("totalCost",
DataContainer().Set(AutoString(total)));
}
FINISH_OPEN
And somewhere in the initialization section of the application, the process must be instantiated:
REGISTER_OPEN(AddCost);
The issue of process instantiation is an annoying one. You will notice that almost all of the OPEN classes automatically
instantiate a singleton, implemented as a public static member of each class. This is also done with the processes.
However, the actual registration of the datum names cannot be done at program startup because other necessary initialization
(for STL, for example) has not occurred. To my knowledge, there is no way of pre-determining the initialization order of
global or static data.
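One standard workaround for this (not used in the prototype) is the "construct on first use" idiom: a function-local static is initialized the first time control passes through the function, sidestepping the undefined construction order of globals in different translation units. A hypothetical sketch:

```cpp
#include <map>
#include <string>

// "Construct on first use": the registry is built on the first call,
// so it is always ready even when called during another global's
// construction. All names here are illustrative.
std::map<std::string, int>& Registry()
{
    static std::map<std::string, int> registry; // initialized on first call
    return registry;
}

// Safe even though this global's constructor runs before main():
struct AutoRegister
{
    AutoRegister() { Registry()["AddCost"] = 1; }
};
static AutoRegister autoReg;
```

Applying this idiom to the OPEN singletons would let the datum-name registration happen at static-construction time rather than requiring a separate startup step.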
The Demonstration Program
The demonstration program is a simple and rather poor example of how this paradigm works. The "Add Part", "Remove Part",
and "Clear Part" events each place data into the D-Pool which is picked up by the DPM. The most interesting thing
about this example is that the datum "itemCost" triggers two events: one to update the list box and the other to
update the running total. Admittedly, the process handlers are tightly coupled to the dialog object, which is
definitely not desirable in real life. Also note that I overrode the AssertValid method so that I could update
the dialog from a worker thread while still running in debug mode. Good ol' MFC.
A More Interesting Thought Experiment
For this model to really be effective, the developer needs to completely rethink how applications are designed
and implemented.
For example, a function might do something like this:
- DB query Q1
- DB query Q2
- Operation A
- DB query Q3
- DB query Q4
- Operation B
- DB update QQ
- DB update RR
In the spirit of this architecture, the function should be recoded into several processes:
Process 1:
DB query Q1
Process 2:
DB query Q2
Process 3: (depends on Q1 and Q2)
Operation A
- Output result R1 into Data Pool
Process 4: (depends on R1)
DB query Q3
Process 5: (depends on R1)
DB query Q4
Process 6: (depends on Q3 and Q4)
Operation B
- Output result R2 into the Data Pool
- Output result R3 into the Data Pool
Process 7: (depends on R2)
DB update QQ
Process 8: (depends on R3)
DB update RR
As you can see from the above architecture, the program now automatically performs the database queries and
updates simultaneously in separate threads. This can dramatically improve program performance and it is achieved by simply using a different paradigm for data management.
The Challenge
I consider this paradigm a significant enhancement to existing process-centric programming styles.
It results in:
- smaller and simpler functions;
- greatly increased parallel processing;
- detailed function and data tracing, provided automatically;
- documentation of all function inputs and outputs.
There are complexities in this model that are not fully understood. My challenge to the reader is to
identify these complexities and design solutions for them, ultimately making this paradigm robust and easy
to use. For example, in this prototype implementation, the DPM, D-Pool, and other objects are implemented
as global singletons. It seems instead more reasonable that an application would have several data pools at
different scales. This would extend the entire concept of organic programming: for example (pardon the analogy), program organs.
Credits
Please credit the author, Marc Clifton, for the core of OPEN in any application that you build using it (I'm probably dreaming, right?). The
author (me) also requests that he is provided with the source code and list of all enhancements that you make to the architecture of
OPEN, so that they may be included in future versions for the benefit of all.
Conclusion
The Organic Programming Environment is not a replacement for process-centric modeling. However, OPEN is
a significant enhancement to the programmer's toolset because there are many cases where a data-centric model
is superior to a process-centric one.