Intent
Define a representation of data that separates its physical view from its logical view and binds the supporting metadata to the logical view.
Motivation
A common programming requirement is to load raw data from a physical data store, such as a database or an XML file, and to provide a "view" of the data that suits an application's needs. This logical view will often include metadata, such as the possible values or ranges that apply to a particular part of the data. It is not desirable to implement every logical view of the data as its own class (or set of classes), because this promotes code duplication. Nor is it usually desirable for different parts of an application to share a single logical view, because this promotes overly complex designs that become tightly coupled to the implementation.
Consider a contact management application. Such an application needs to store and display information that may include names, addresses, phone numbers and any other data fields the user requires. This information is typically stored in an RDBMS and accessed using SQL. When working with an RDBMS, it is usually necessary and desirable to split the stored data across several tables and to enforce integrity constraints accordingly. This process of normalization is intended to reduce redundancy and to keep inserts and updates efficient and consistent. Unfortunately, it usually separates the logical view of the data further from the physical one, so the application must "map" the physical structure into a more meaningful view. This mapping may be simple, but it can become quite complicated as the separation grows.
In addition to mapping the raw data, a contact management application needs to maintain certain metadata. The "Name prefix" part of the data is a good example: a name prefix is typically one of a fixed set of values such as Mr., Mrs., Ms. or Miss. This list of values must be available to various parts of the application, both to support user-interface features (for example, populating a drop-down list) and to provide validation. Different parts of the raw data will have different metadata associated with them. Metadata can include value lists, validation rules, application-specific properties, etc.
We can solve this problem by designing a basic "Entity" class that encapsulates the raw data, binds the associated metadata to it, and provides a common interface for accessing and manipulating the raw data. This Entity class hides the specifics of data persistence from the rest of the application and provides a basis for many data-centric controls. Implementing functionality in this data-centric way decouples the data from the application and facilitates code reuse.
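A minimal Python sketch of such an Entity class, assuming a simple in-memory design. The names `Field`, `Entity`, `get_field` and `accepts` are illustrative only, not an existing library. It shows the name-prefix value list being used both as user-interface metadata and as a validation rule.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Field:
    """One logical field plus the metadata bound to it (here, an optional value list)."""
    name: str
    values: List[str] = field(default_factory=list)  # allowed values; empty means unrestricted
    value: Any = None

    def accepts(self, candidate: Any) -> bool:
        # The value list doubles as validation metadata.
        return not self.values or candidate in self.values


class Entity:
    """Common interface for reading and writing data by logical field name."""

    def __init__(self, fields: List[Field]) -> None:
        self._fields = {f.name: f for f in fields}

    def get_field(self, name: str) -> Field:
        return self._fields[name]

    def get(self, name: str) -> Any:
        return self._fields[name].value

    def set(self, name: str, value: Any) -> None:
        f = self._fields[name]
        if not f.accepts(value):
            raise ValueError(f"{value!r} is not a valid value for {name}")
        f.value = value


contact = Entity([
    Field("NamePrefix", values=["Mr.", "Mrs.", "Ms.", "Miss"]),
    Field("LastName"),
])
contact.set("NamePrefix", "Ms.")
contact.set("LastName", "Smith")

# A user interface can fill a drop-down from the same metadata used for validation.
prefix_choices = contact.get_field("NamePrefix").values
```

Calling `contact.set("NamePrefix", "Dr.")` raises `ValueError`, because the value list bound to the field rejects it; no contact-specific class was needed.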
An added benefit of the Entity pattern is the ability to define multiple logical views of a single physical representation of data. This ability can be critical to applications requiring a high level of security or customization.
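To illustrate multiple logical views over one physical representation, here is a hedged Python sketch. The `LogicalView` class, the column names and the sample row are hypothetical; a real implementation would obtain the row through a StorageEngine rather than a literal dictionary.

```python
from typing import Any, Dict, List

# Hypothetical physical record, e.g. one row read from a contacts table.
physical_row: Dict[str, Any] = {
    "first_nm": "Ada",
    "last_nm": "Lovelace",
    "home_phone": "555-0100",
    "credit_limit": 5000,
}


class LogicalView:
    """Exposes a chosen subset of a physical record under logical field names."""

    def __init__(self, row: Dict[str, Any], mapping: Dict[str, str]) -> None:
        self._row = row          # shared physical data; not copied
        self._mapping = mapping  # logical name -> physical column name

    def field_names(self) -> List[str]:
        return list(self._mapping)

    def _column(self, logical_name: str) -> str:
        # Anything outside the mapping is invisible to this view.
        if logical_name not in self._mapping:
            raise KeyError(f"{logical_name} is not part of this view")
        return self._mapping[logical_name]

    def get(self, logical_name: str) -> Any:
        return self._row[self._column(logical_name)]

    def set(self, logical_name: str, value: Any) -> None:
        self._row[self._column(logical_name)] = value


# Two logical views of the same row: a full account view and a restricted phone-list view.
account_view = LogicalView(physical_row, {
    "FirstName": "first_nm", "LastName": "last_nm", "CreditLimit": "credit_limit",
})
phone_list_view = LogicalView(physical_row, {"Name": "last_nm", "Phone": "home_phone"})
```

Because both views share the same row, a change made through `account_view` is visible through `phone_list_view`, while `CreditLimit` simply does not exist from the phone-list view's point of view.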
Applicability
Use the Entity pattern when
- the physical representation of data diverges from the logical representation desired for an application.
- there is a need for more than one logical representation of the same physical representation.
- the physical representation of data is subject to change, while the logical view remains consistent. This often happens when there is a need to persist data to or from disparate physical formats.
- more than one data set needs to share a common set of functionality.
- there is an extensive amount of metadata associated with the view of the data.
- there is a need for transactional behavior within the logical view of the data.
Structure
Sorry guys, I don't have a good way to draw the structure for this, though I wish I did. So, let me describe it with as few words as possible.
- Entities contain:
  - One or more Fields
  - Zero or more Properties
  - Zero or more Rules
  - Zero or more Comments
- Fields contain:
  - Zero or more child Fields
  - Zero or more Values
  - Zero or more Properties
  - Zero or more Rules
  - Zero or more Comments
- Values contain:
  - Zero or more Properties
  - Zero or more Comments
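The containment rules above can be written down directly as Python dataclasses. This is a sketch under the assumption that each participant is a plain data holder; the `outline` function is a hypothetical helper showing how the structure can be walked at runtime, for example to generate documentation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    text: str


@dataclass
class Property:
    name: str
    value: object


@dataclass
class Rule:
    name: str


@dataclass
class Value:
    value: object
    properties: List[Property] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Field:
    name: str
    children: List[Field] = field(default_factory=list)  # zero or more child Fields
    values: List[Value] = field(default_factory=list)
    properties: List[Property] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Entity:
    name: str
    fields: List[Field]  # one or more Fields
    properties: List[Property] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.fields:
            raise ValueError("an Entity must contain at least one Field")


def outline(entity: Entity) -> List[str]:
    """Walk the structure at runtime, e.g. to generate documentation."""
    lines = [f"Entity {entity.name}"]

    def walk(f: Field, depth: int) -> None:
        listed = ", ".join(str(v.value) for v in f.values)
        lines.append("  " * depth + f"Field {f.name}" + (f" [{listed}]" if listed else ""))
        for child in f.children:
            walk(child, depth + 1)

    for top in entity.fields:
        walk(top, 1)
    return lines


contact = Entity("Contact", fields=[
    Field("Name", children=[
        Field("Prefix", values=[Value("Mr."), Value("Ms.")]),
        Field("Last", rules=[Rule("MaxLength")]),
    ]),
])
```

Here `outline(contact)` yields an indented listing of the Name field with its Prefix (and value list) and Last children.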
Participants
- Entity - declares an interface for representing a logical view of data and implements support for basic functionality
- Field - implements support for type-specific (and potentially type-safe) in-memory data storage, as well as logical-only, non-storage fields. Also implements support for collections of contained Entity objects. A Field participant will typically have specialized derived types such as CollectionField, GroupField, TextField, 32BitSignedField, 64BitSignedField, DateField, TimeField, BlobField, etc.
- Value - declares an interface for representing one possible value for a given Field object
- Rule - declares an interface for generic rules that are neither domain- nor application-specific, as well as application-specific rules when needed. A Rule participant will typically have specialized derived types such as MinLengthRule, MaxLengthRule, ValidCharsRule, ValidDateRangeRule, etc.
- Property - implements support for extensible application- and domain-specific configuration options at the Entity, Field and Value levels
- StorageEngine - declares an interface for persisting the logical view of data to and from its physical representation
- Comment - declares an interface for attaching comments to any element of the entity interface, including the Entity, Field, Value, Rule, Property and StorageEngine participants.
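The Rule participant and its specialized derived types might look like this in Python. The rule class names come from the list above, but their constructors, the `check` method and the `TextField.errors` helper are assumptions made for this sketch.

```python
from __future__ import annotations

import string
from abc import ABC, abstractmethod
from typing import Iterable, List, Optional


class Rule(ABC):
    """Rule interface: check() returns an error message, or None if the value passes."""

    @abstractmethod
    def check(self, value: str) -> Optional[str]: ...


class MinLengthRule(Rule):
    def __init__(self, minimum: int) -> None:
        self.minimum = minimum

    def check(self, value: str) -> Optional[str]:
        return None if len(value) >= self.minimum else f"shorter than {self.minimum} characters"


class MaxLengthRule(Rule):
    def __init__(self, maximum: int) -> None:
        self.maximum = maximum

    def check(self, value: str) -> Optional[str]:
        return None if len(value) <= self.maximum else f"longer than {self.maximum} characters"


class ValidCharsRule(Rule):
    def __init__(self, allowed: str) -> None:
        self.allowed = set(allowed)

    def check(self, value: str) -> Optional[str]:
        bad = sorted(set(value) - self.allowed)
        return f"invalid characters: {''.join(bad)}" if bad else None


class TextField:
    """A Field specialization for text; its rules are attached, not hard-coded."""

    def __init__(self, name: str, rules: Iterable[Rule]) -> None:
        self.name = name
        self.rules = list(rules)

    def errors(self, candidate: str) -> List[str]:
        messages = (rule.check(candidate) for rule in self.rules)
        return [m for m in messages if m is not None]


# The same rule classes are reused by unrelated fields.
phone = TextField("Phone", [MinLengthRule(7), MaxLengthRule(15), ValidCharsRule(string.digits + "-")])
last_name = TextField("LastName", [MinLengthRule(1), MaxLengthRule(40)])
```

For example, `phone.errors("55x")` reports both the length and the invalid-character problems, while `phone.errors("555-0100")` returns an empty list.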
Consequences
The Entity pattern has the following benefits and liabilities:
- It isolates data persistence from the application. Most applications attempt to separate the data access layer from the presentation layer for good reason. Doing so decouples the disparate portions of the application, facilitating easier changes. The Entity pattern facilitates this decoupling implicitly by its design.
- It promotes sharing of data validation rules. Redundant code is undesirable. Because the Entity class provides a basic interface for accessing data in a logical format, rule objects can be created to implement specific rules such as range checking, value list limits, date formats, NULL values, etc. These rules can be written once and applied to all Entity objects regardless of the data they represent.
- It facilitates data-centric application configuration. Often, an application must behave differently depending on the data it is acting upon. This modality is often implemented so that the implementation expects a very specific type of object for each mode. That forces unnecessary code duplication or overly complex class hierarchies, because each type of object may need the same support with only minor differences. The Entity pattern facilitates data-driven configuration by providing application-defined properties at the Entity, Field and Value levels, which the application can interpret at runtime to determine the desired behavior.
- It facilitates self-documenting data models. The Entity class knows the logical view of the data and provides an interface for interrogating that view at runtime, much as iterators provide a common way to access the elements of a collection. This capability is useful during the documentation phase of development, but it can also prove invaluable at runtime, because fields, values or rules added dynamically can be detected and handled accordingly, enabling features such as user-defined custom fields.
- It facilitates multiple physical representations of the data. It is often necessary to persist data to and from varying physical representations. This may include different RDBMS platforms, XML files, ASCII files, etc.
- It facilitates data security. Because the same physical data can be represented by multiple logical views, an application can define custom views based on the role of the user or on the component of the application that is accessing the data. Each custom view need only expose the fields and features it requires, hiding the rest of the data.
- It can cause type-safety concerns. An implementation of this pattern may abstract the raw data in such a way that it does not provide type-safe access to the data. This may not be a significant disadvantage if the other benefits outweigh the type-safety concerns. An implementation may instead support strong type safety through code generation or generics, but such an implementation may be more complex or less flexible.
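One way to recover type safety is a generic field type wrapped by a thin, per-view facade. The sketch below uses Python's `typing.Generic`, so a static checker such as mypy can flag mistakes while `set()` also checks at runtime. `TypedField` and `ContactFields` are hypothetical names, and `None` is assumed to model a NULL value.

```python
from __future__ import annotations

import datetime
from typing import Generic, Optional, Type, TypeVar

T = TypeVar("T")


class TypedField(Generic[T]):
    """A field whose value type is fixed: static checkers see T, and set() also checks at runtime."""

    def __init__(self, name: str, value_type: Type[T]) -> None:
        self.name = name
        self.value_type = value_type
        self._value: Optional[T] = None  # None stands in for a NULL value

    def get(self) -> Optional[T]:
        return self._value

    def set(self, value: Optional[T]) -> None:
        if value is not None and not isinstance(value, self.value_type):
            raise TypeError(
                f"{self.name} expects {self.value_type.__name__}, got {type(value).__name__}"
            )
        self._value = value


class ContactFields:
    """A thin type-safe facade (an Adapter) over otherwise generic fields."""

    def __init__(self) -> None:
        self.last_name: TypedField[str] = TypedField("LastName", str)
        self.dob: TypedField[datetime.date] = TypedField("DOB", datetime.date)


contact = ContactFields()
contact.last_name.set("Smith")
contact.dob.set(datetime.date(1990, 5, 17))
# contact.dob.set("1990-05-17") is flagged by a type checker and raises TypeError at runtime.
```

The cost is the one the text describes: every such facade has to be written (or generated) for each logical view, which is less flexible than purely name-based access.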
Implementation
Here are some useful techniques for implementing the Entity pattern.
- The logical view of an entity should be configured, not programmed. An Entity object is defined by its properties, fields, values and rules. This definition can be represented through a set of database tables, an XML file or any other appropriate means. The definition should thoroughly define each aspect of the entity. This definition should be loaded at run-time to automatically configure the logical view of the data.
- Implement Entity-derived classes as generics. It may be desirable to adapt entity classes at compile time to support special domain-specific features. Although the base Entity class will not be generic, the derived classes can be implemented to take an adapter object as a customization parameter. The same adapter object may then be used across many entity implementations.
- Implement several runtime-customizable StorageEngine derived classes. Instead of writing one StorageEngine class for each distinct logical/physical view combination, implement one StorageEngine class for each physical storage medium. The StorageEngine derived classes can use the interface provided by the Entity classes to determine how to map in-memory data to and from the physical format. This promotes a greater level of reuse, because numerous logical views can then share a single StorageEngine class to persist their data.
- The internal representation of data within the entity object should be type-safe wherever possible. Doing so helps to ensure that data remains intact and that its integrity cannot easily be compromised by unexpected input, such as a value of the wrong type.
- Support for NULL data should also be handled here. Dealing with NULL data can be very frustrating in many programming languages, because there is no explicit language-level support for NULL values.
- Use a Class Factory pattern for creating Entity objects and StorageEngine objects, thus further decoupling the application from the data and storage needs.
- Implement a generic base class to represent behaviors and actions that can be taken based on the logical view of the data. The derived classes could potentially be implemented as generics, to further improve robustness and flexibility.
One example of an action that can be applied to any Entity object, regardless of its logical view, is an integrity checker. An integrity checker could interrogate the Entity object for all the fields it contains, execute the appropriate validation rules and report any inconsistencies.
Another example of a generic action is a DocumentWriter class. This class would interrogate the Entity object's logical view and produce a document (in HTML, PDF or any other format) detailing the structure of the entity: its fields, value lists, rules and any domain-specific properties it contains.
There are many ways to use the self-documenting nature of entity objects to produce flexible, reusable, data-driven behaviors. These behaviors can be used as needed by any application, component or user, without concern for the actual data being acted upon. Domain-specific properties can be applied to the entity to further customize these behaviors.
- Implement support for instrumentation at the data level. All access to the raw data and metadata must pass through the Entity class. Instrumentation of this access can be easily localized to the Entity object.
It may be desirable to log the data that is accessed, who accesses it and when they access it. Doing so with this pattern is quite easy because all of the data access passes through a common base class.
- Implement support for dynamic runtime customization of logical views. In our contact management application, it is often desirable to let a user add custom fields to the contact data objects at runtime. The Entity class should facilitate this runtime configuration through its exposed interface and provide a means of ensuring that the customized field is properly handled throughout the application.
The custom fields should behave like all other fields. A property may be set to indicate that the field is user-defined, but in all other respects the field behaves exactly the same.
When this feature is present, it becomes even more important that the StorageEngine classes be implemented per physical medium, as discussed earlier. It is not practical to rewrite the persistence code whenever a user adds a custom field. If the storage engines interpret the logical view at runtime and automatically cope with the underlying physical view, the application can continue without interruption.
- Implement support for calculated fields. Most data sets include static and calculated field values. For example, a contact Entity may have a field named DOB to represent the date-of-birth of the contact. It would be redundant and unwise to also include a field named AGE to represent the age of the contact, because the age can be computed on-demand based on the date of birth. Instead, the AGE field should be included in the logical view as a calculated field.
Calculated fields should behave exactly the same as normal fields. The only significant difference may be that they are read-only and that they are typically not persisted to physical storage.
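Several of the techniques above (a StorageEngine per physical medium that interrogates the logical view, a user-defined field added at runtime, and a calculated AGE field) can be combined in one small Python sketch. `JsonStorageEngine`, `age_on` and `new_contact` are hypothetical names, and JSON text stands in for a physical store. DOB is assumed to be kept as an ISO-8601 string so it round-trips unchanged, and AGE is computed against a fixed date so results are reproducible.

```python
from __future__ import annotations

import datetime
import json
from typing import Any, Callable, Dict, List, Optional


class Field:
    def __init__(self, name: str, calc: Optional[Callable[[Entity], Any]] = None,
                 user_defined: bool = False) -> None:
        self.name = name
        self.calc = calc  # set only for calculated, read-only fields
        self.properties: Dict[str, Any] = {"user_defined": user_defined}
        self.value: Any = None

    @property
    def calculated(self) -> bool:
        return self.calc is not None


class Entity:
    def __init__(self, fields: List[Field]) -> None:
        self.fields: Dict[str, Field] = {f.name: f for f in fields}

    def add_field(self, f: Field) -> None:
        # Runtime customization: a user-defined field joins the logical view.
        self.fields[f.name] = f

    def get(self, name: str) -> Any:
        f = self.fields[name]
        return f.calc(self) if f.calc is not None else f.value

    def set(self, name: str, value: Any) -> None:
        f = self.fields[name]
        if f.calculated:
            raise AttributeError(f"{name} is a calculated, read-only field")
        f.value = value


def age_on(today: datetime.date) -> Callable[[Entity], Optional[int]]:
    """Build an AGE calculation relative to a fixed date."""
    def calc(entity: Entity) -> Optional[int]:
        dob_text = entity.get("DOB")
        if dob_text is None:
            return None  # a NULL date of birth gives a NULL age
        dob = datetime.date.fromisoformat(dob_text)
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        return today.year - dob.year - (0 if had_birthday else 1)
    return calc


class JsonStorageEngine:
    """One engine per physical medium; it discovers the layout by interrogating the entity."""

    def save(self, entity: Entity) -> str:
        stored = {n: f.value for n, f in entity.fields.items() if not f.calculated}
        return json.dumps(stored)

    def load(self, entity: Entity, text: str) -> None:
        for name, raw in json.loads(text).items():
            f = entity.fields.get(name)
            if f is not None and not f.calculated:
                f.value = raw


def new_contact() -> Entity:
    return Entity([Field("LastName"), Field("DOB"),
                   Field("AGE", calc=age_on(datetime.date(2024, 1, 1)))])


contact = new_contact()
contact.set("LastName", "Smith")
contact.set("DOB", "1990-05-17")
contact.add_field(Field("Nickname", user_defined=True))
contact.set("Nickname", "Smitty")

engine = JsonStorageEngine()
saved = engine.save(contact)  # AGE is not persisted; Nickname is, with no engine changes
```

The engine never names a contact field, so the custom Nickname field is persisted without touching persistence code, and AGE (33 for this date of birth) is derived on demand instead of stored.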
Complementary concepts
Here are some additional concepts which can be applied to this design pattern.
- An event interface that notifies subscribers of changes to the logical view, its contents or the associated metadata could be added to the basic design without adversely affecting the conceptual layout of the pattern.
- Support for transactional behavior with regard to changes in the contents of the entity, the logical view of the data and the associated metadata could be added to this design pattern.
- Change history (versioning) of both the logical views, including their metadata, and the entity contents could be handled as an integral part of an implementation of this design pattern.
- Support for logically deriving one entity from another could be implemented.
- Instrumentation of actions taken against an entity (e.g., setting field values, changing metadata) could be provided as part of an implementation of this design pattern.
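The event interface and transactional behavior mentioned above could be layered onto an entity roughly as follows. This Python sketch is an assumption, not part of the pattern's definition: listeners receive `(field name, old value, new value)`, and a transaction is a snapshot of the field values taken by `begin()` and restored by `rollback()`.

```python
from typing import Any, Callable, Dict, List, Optional

Listener = Callable[[str, Any, Any], None]  # (field name, old value, new value)


class Entity:
    """Field storage with Observer-style change events and a simple transaction."""

    def __init__(self, **values: Any) -> None:
        self._values: Dict[str, Any] = dict(values)
        self._listeners: List[Listener] = []
        self._snapshot: Optional[Dict[str, Any]] = None

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def get(self, name: str) -> Any:
        return self._values.get(name)

    def set(self, name: str, value: Any) -> None:
        old = self._values.get(name)
        self._values[name] = value
        for listener in self._listeners:
            listener(name, old, value)

    def begin(self) -> None:
        self._snapshot = dict(self._values)

    def commit(self) -> None:
        self._snapshot = None

    def rollback(self) -> None:
        if self._snapshot is None:
            raise RuntimeError("no transaction in progress")
        snapshot, self._snapshot = self._snapshot, None
        for name, value in snapshot.items():
            if self._values.get(name) != value:
                self.set(name, value)  # subscribers are told about the undo as well
        for name in set(self._values) - set(snapshot):
            del self._values[name]  # drop fields first created inside the transaction


events = []
contact = Entity(LastName="Smith")
contact.subscribe(lambda name, old, new: events.append((name, old, new)))

contact.begin()
contact.set("LastName", "Jones")
contact.rollback()
```

After the rollback, `LastName` is "Smith" again and the subscriber has seen both the change and its undo. Routing the undo through `set()` is a design choice that keeps observers consistent with the entity's state.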
Sample code
None provided at this time. A future article may provide a full implementation of this design pattern.
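In lieu of a full implementation, the integrity checker and data-level instrumentation described under Implementation can be illustrated with a short Python sketch. `check_integrity`, `rules_for`, `access_log` and the rule names are hypothetical; rules are modeled as named predicates that return True when a value is acceptable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Check = Callable[[Any], bool]  # returns True when the value is acceptable


@dataclass
class Field:
    name: str
    rules: Dict[str, Check] = field(default_factory=dict)
    value: Any = None


class Entity:
    """All reads go through get(), so instrumentation lives in exactly one place."""

    def __init__(self, name: str, fields: List[Field], user: str = "anonymous") -> None:
        self.name = name
        self.user = user
        self._fields = {f.name: f for f in fields}
        self.access_log: List[Tuple[str, str]] = []  # (user, field name); a timestamp could be added

    def field_names(self) -> List[str]:
        return list(self._fields)

    def rules_for(self, name: str) -> Dict[str, Check]:
        return self._fields[name].rules

    def get(self, name: str) -> Any:
        self.access_log.append((self.user, name))
        return self._fields[name].value


def check_integrity(entity: Entity) -> List[str]:
    """A generic action: it only uses the interrogation interface, so it works for any Entity."""
    problems = []
    for name in entity.field_names():
        value = entity.get(name)
        for rule_name, rule in entity.rules_for(name).items():
            if not rule(value):
                problems.append(f"{entity.name}.{name} failed {rule_name}")
    return problems


contact = Entity("Contact", [
    Field("LastName", {"NotNull": lambda v: v is not None}),
    Field("Phone", {"MaxLength15": lambda v: v is None or len(v) <= 15}, value="555-0100"),
], user="auditor")
problems = check_integrity(contact)
```

Here the checker reports that `LastName` fails its `NotNull` rule, and the entity's access log records that the "auditor" user read both fields while the check ran.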
Related patterns
- ClassFactory classes may be used to facilitate the construction of Entity objects and StorageEngine objects.
- Adapter classes may be utilized to provide type-safety or interface customization to the underlying Entity objects. This may be done to facilitate using Entity objects in an existing application.
- Flyweight classes may be used for Rules, Fields or even Entity objects, depending on the exact implementation taken. This may reduce the memory footprint of the Entity objects and as a result improve application performance and scalability.
- Interpreter classes may be used to load the configured entity definition.
- The Observer pattern can be implemented by providing a means for Entities to notify associated objects of changes to the Entity state.
- The Strategy pattern can be used to provide external adaptability and behaviors for Entity objects.
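As an illustration of the ClassFactory relationship listed above, a registry-based factory in Python could select a StorageEngine from configuration so that application code never names a concrete engine class. The engine classes here are empty placeholders and the `config` dictionary is hypothetical.

```python
from typing import Callable, Dict


class StorageEngine:
    """Base interface; concrete engines would implement load and save for one medium."""
    medium = "abstract"


class XmlStorageEngine(StorageEngine):
    medium = "xml"


class SqlStorageEngine(StorageEngine):
    medium = "sql"


class StorageEngineFactory:
    """Maps a configured medium name to a constructor, so callers never name concrete classes."""

    def __init__(self) -> None:
        self._registry: Dict[str, Callable[[], StorageEngine]] = {}

    def register(self, medium: str, constructor: Callable[[], StorageEngine]) -> None:
        self._registry[medium] = constructor

    def create(self, medium: str) -> StorageEngine:
        constructor = self._registry.get(medium)
        if constructor is None:
            raise ValueError(f"no StorageEngine registered for {medium!r}")
        return constructor()


factory = StorageEngineFactory()
factory.register("xml", XmlStorageEngine)
factory.register("sql", SqlStorageEngine)

# The medium would normally come from configuration rather than code.
config = {"storage": "sql"}
engine = factory.create(config["storage"])
```

Supporting a new physical format then means registering one more engine, without changing the code that asks the factory for an engine.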