Abstract
Enumeration types are an essential ingredient in writing human-readable source code. Due to their special nature, special care must be taken when deciding how to use them and, even more importantly, when assessing the implications of their use. The answers to both questions depend to no small degree on how enumeration types are implemented, that is, whether the language's built-in enumeration support or some customized approach is used. This article compares various methods of implementing enumeration types, ranging from simple preprocessor constructs to more sophisticated, class-based methods. Although these constructs and their semantic peculiarities are discussed in the context of the C++ programming language, most of them can be used in C# or Java without much effort.
Since it is not aimed solely at novice C++ programmers, this article assumes some familiarity with the semantics of integral types and static class members in C++, as well as with object-oriented design in general.
I Introduction
Motivation
Software design, and as a consequence programming, is largely about representing abstract concepts or complicated structures in a form that is easy for humans to understand. This is one of the main reasons for the existence of so-called higher-level programming languages. Such languages usually feature advanced concepts like structured data types, loops or classes. One of the more primitive constructs is the enumeration type. Its purpose is to map constant values, usually of integral types like int, char etc., to symbolic identifiers that are more intuitive to understand. For example, instead of 1 or 2, it is usually better to write foo and bar, respectively. Likewise, when the same constant appears with different meanings in nearby places, enumeration types help to establish the meaning of each specific occurrence.
While most programming languages offer built-in support for enumeration types based on integral types, i.e., enumeration values are represented internally by the compiler as integral values, developers are left on their own when extending the concept of enumeration types to classes or to structured data types in general. Other issues arise from type conversions performed implicitly by the compiler. While not necessarily a show stopper, these conversions can hamper the efforts of programmers striving for type safety, or become a problem if conformance to strict semantic rules is required.
The remainder of this section discusses a simple example of an application of enumeration types. The next section concentrates on primitive approaches to enumeration types, that is, those based on the preprocessor and on built-in language support, and compares them with "ordinary" constant integer variables. A more sophisticated approach to enumeration types, which can be extended to classes, is presented in the third section.
A Simple Example
A graphics library is to be implemented. As color-capable displays are common nowadays, the library is expected to support colors and color manipulation. Part of this support is therefore the representation of color values in the RGB color model. As an additional requirement, shortcuts are to be provided for the colors white, black, red, green, blue, magenta, yellow and cyan.
In the RGB color model, a color is specified by three values representing the proportions of the three primary colors red, green and blue. These values are modeled as whole, unsigned numbers ranging from 0 to 255. A possible implementation might look like this:
class RGB {
public:
unsigned short red;
unsigned short green;
unsigned short blue;
};
II Enumeration Types Based On Integral Types
This section discusses enumeration types which are based on integral types. This includes simple constant values as well as built-in language support like C++'s enum-types.
Preprocessor Statements
A crude but nevertheless quite widespread way to define enumeration values is to use the preprocessor (which operates on the source text before the compiler proper sees it). For each enumeration value, a preprocessor macro is defined, which the preprocessor subsequently expands wherever it is used. For example, given an array of RGB instances:
RGB cgFundamentalColors[ 8 ] = {
// one RGB value per fundamental color, in the order of the macros below
};
one could define:
#define WHITE 0
#define BLACK 1
#define RED 2
#define GREEN 3
#define BLUE 4
#define MAGENTA 5
#define YELLOW 6
#define CYAN 7
and thus access the RGB instance corresponding to red by cgFundamentalColors[RED].
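Put together, the preprocessor approach might look as follows. This is a compilable sketch under assumptions: the RGB aggregate mirrors the first section, and the array is filled with the usual RGB values of the eight colors, listed in the order fixed by the macros; redColor is a hypothetical accessor.

```cpp
// Minimal RGB aggregate as in the first section.
struct RGB {
    unsigned short red;
    unsigned short green;
    unsigned short blue;
};

// One macro per enumeration value; the value doubles as array index.
#define WHITE   0
#define BLACK   1
#define RED     2
#define GREEN   3
#define BLUE    4
#define MAGENTA 5
#define YELLOW  6
#define CYAN    7

// Assumed color values; their order must match the macro values above.
const RGB cgFundamentalColors[ 8 ] = {
    { 255, 255, 255 },  // WHITE
    {   0,   0,   0 },  // BLACK
    { 255,   0,   0 },  // RED
    {   0, 255,   0 },  // GREEN
    {   0,   0, 255 },  // BLUE
    { 255,   0, 255 },  // MAGENTA
    { 255, 255,   0 },  // YELLOW
    {   0, 255, 255 }   // CYAN
};

// After preprocessing, the compiler sees cgFundamentalColors[ 2 ] here.
inline const RGB& redColor() { return cgFundamentalColors[ RED ]; }
```

Note that nothing ties the macro values to the array layout except discipline: reordering either one silently breaks the mapping.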
It is important to note that the macro RED is never seen by the C++ compiler. Instead, the preprocessor replaces it with the numerical value 2, i.e., the expression seen by the compiler is cgFundamentalColors[2]. A simple corollary of this observation is that such macros are not actual enumeration values or types; they are just a notational convenience. As a consequence, such macros have no special associated types: they behave just like ordinary integer literals, and their type (e.g., long, int or char) is that of the literal.
This means that values of such enumerations in general cannot be distinguished from those of other enumerations or any other integer values. In particular, the members of two enumerations can be compared to each other even if their meanings are totally unrelated:
#define SOME_VALUE 1000
if ( RED == SOME_VALUE ) ...
and, of course, they can be used interchangeably:
int someColor = RED;
someColor = SOME_VALUE;
provided their literal types allow it:
#define SOME_CHAR '0'
if ( RED == SOME_CHAR ) ...
char someChar = SOME_VALUE;
At best, compiler warnings may point out such deviations from the intended semantics, e.g., for the narrowing assignment of 1000 to a char. Whether or not this is acceptable must be decided case by case.
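Since the macros disappear before compilation, the compiler itself can confirm that nothing but plain literals remains. The following sketch relies on the C++11 facilities decltype, static_assert, constexpr and std::is_same from <type_traits>, which postdate the original discussion; the function sameAsSomeValue is hypothetical.

```cpp
#include <type_traits>

#define RED        2
#define SOME_VALUE 1000
#define SOME_CHAR  '0'

// What remains after preprocessing are literals whose types are
// simply those of the literals themselves.
static_assert( std::is_same< decltype( RED ), int >::value,
               "RED is just the int literal 2" );
static_assert( std::is_same< decltype( SOME_CHAR ), char >::value,
               "SOME_CHAR is just the char literal '0'" );

// A "color" is accepted wherever any int is expected and can be
// compared against an unrelated constant without complaint.
constexpr bool sameAsSomeValue( int color ) { return color == SOME_VALUE; }
```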
Built-In Enumeration Types
Most higher-level programming languages offer some built-in support for enumeration types. In C++, enumeration types are declared with the keyword enum. The members of an enum-constructed enumeration type are called enumerators. For example,
enum { white, black, red, green, blue, magenta, yellow, cyan };
defines an anonymous enumeration type representing the fundamental colors. The main difference between this type and the preprocessor macros considered above is that the enumerators are actually seen by the compiler, that is, the compiler really sees cgFundamentalColors[ red ]. However, as one can easily see, the above declaration contains no explicit mapping of enumerators to integer values. The compiler assigns these values implicitly (0 to the first enumerator and, to each following one, the value of its predecessor plus 1), so neither the declaration nor the code using it needs to refer to the representation. This is an important property of such a type.
In many cases, this is what is wanted. Quite often, however, one needs more control over how the enumerators are represented. In C++, each enumerator can explicitly be assigned the value of an integral constant expression. For example, the declaration:
enum { white = 0, black, red, green, blue, magenta, yellow, cyan };
is equivalent to the preprocessor approach as far as the mapping to integer values is concerned. In contrast to the preprocessor macros, however, enumerators are typed. This becomes apparent in the following snippet:
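enum FundamentalColors { white = 0, black, red,
green, blue, magenta, yellow, cyan };
enum Fruit { apple = 0, orange, peach, cherry };
FundamentalColors aColor = red;
Fruit aFruit = apple;
if ( aColor == RED ) ...    // compiles: aColor is promoted to int, the macro RED expands to 2
if ( aColor == red ) ...    // compiles: the intended comparison
if ( aColor == apple ) ...  // compiles, typically with a warning (deprecated since C++20)
int i = orange;             // compiles: enumerators convert implicitly to int
aColor = orange;            // error: no implicit conversion from Fruit to FundamentalColors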
The fact that the last assignment does not compile illustrates the main advantage of enum declarations over macros: they limit the interchangeability of enumerators in assignments. On the other hand, the caveat about unintended comparisons between different enum-types remains, because enumerators are still implicitly converted to integral types.
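These typing rules can be checked at compile time. The sketch below, relying on C++11's static_assert and std::is_convertible from <type_traits>, declares the two enumerations with cyan as the last color; colorFromIndex is a hypothetical helper showing that the only way from an integer back to the enum-type is an explicit cast.

```cpp
#include <type_traits>

enum FundamentalColors { white = 0, black, red, green, blue, magenta, yellow, cyan };
enum Fruit { apple = 0, orange, peach, cherry };

// Enumerators and enum values convert implicitly to int ...
static_assert( std::is_convertible< Fruit, int >::value, "Fruit -> int is implicit" );
// ... but neither from int nor from another enumeration type.
static_assert( !std::is_convertible< int, FundamentalColors >::value, "int -> enum needs a cast" );
static_assert( !std::is_convertible< Fruit, FundamentalColors >::value, "Fruit -> FundamentalColors is rejected" );

// Implicit values: white is 0, each following enumerator adds 1.
static_assert( red == 2 && cyan == 7, "implicit enumerator values" );

// Going back from an integer requires an explicit cast.
inline FundamentalColors colorFromIndex( int index )
{
    return static_cast< FundamentalColors >( index );
}
```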
enum-types have a fixed, distinct set of enumerators. As a consequence, they cannot in general be combined with bitwise logical operators the way integer values can. Operator overloading can help in such cases. If the enum-type's enumerator set is sufficiently small, the enumerators can be assigned powers of 2, so that each one occupies a bit of its own, for example:
enum { white = 1, black = 2, red = 4, blue = 8 };
Most often, however, this rules out using the enumerators as array indices.
Without overloaded operators, a bitwise logical operator is not applied to the enumerators themselves but to the integral values they are implicitly converted to. The result is an int, which in general falls outside the enum-type's enumerator set and cannot be assigned to a variable of the enum-type without an explicit cast.
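Operator overloading, as suggested above, keeps combinations of flags inside the enum-type. The following is a minimal sketch assuming C++11 constexpr; contains and redAndBlue are hypothetical names. The overloaded operator| converts both operands to int, combines them and casts the result back. The cast is well-defined here because every combination of the four flags lies within the range of values this enumeration can represent (0 to 15).

```cpp
enum Color { white = 1, black = 2, red = 4, blue = 8 };

// Combines two flags; without this overload, red | blue would be a plain int.
constexpr Color operator|( Color lhs, Color rhs )
{
    return static_cast< Color >( static_cast< int >( lhs ) | static_cast< int >( rhs ) );
}

// Tests whether all bits of flag are set in set (hypothetical helper).
constexpr bool contains( Color set, Color flag )
{
    return ( static_cast< int >( set ) & static_cast< int >( flag ) ) == static_cast< int >( flag );
}

// The combination is a Color again and can be stored as such.
constexpr Color redAndBlue = red | blue;
```

With this overload, Color c = red | blue; compiles without a cast. Each further operator (&, ^, ~, the compound assignments) needs an overload of its own.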
enum-types present another small problem that can prove to be a nuisance: the names of the enumerators are added to the scope in which the type is defined, because an enum-type does not open a scope of its own. Therefore, it is relatively easy to provoke name clashes if enum-types are declared in the global namespace:
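enum { enu1, enu2, enu3 };
enum SomeEnum { enu1, enu2, enu3 };  // error: enu1, enu2, enu3 are already declared in the global scope
class aClass {
enum { enu1, enu2, enu3 };           // OK: these enumerators belong to the scope of aClass
};
namespace {
enum { enu1, enu2, enu3 };           // compiles, but unqualified uses of enu1 at global scope become ambiguous
}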
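A common workaround predating C++11 is to give each enumeration a namespace of its own, so that its enumerators must be qualified. In the following sketch, the namespaces Color and Fruit and their enumerators are hypothetical; both define an enumerator named orange without clashing.

```cpp
// Each enumeration lives in its own namespace; its enumerators are added
// to that namespace's scope instead of the global one.
namespace Color { enum Type { red, green, orange }; }
namespace Fruit { enum Type { apple, orange, peach }; }

// Both "orange" enumerators coexist; qualification selects the intended one.
inline Color::Type favoriteColor() { return Color::orange; }
inline Fruit::Type favoriteFruit() { return Fruit::orange; }
```

The namespace only solves the scoping problem: Color::orange still converts implicitly to int and can still be compared with Fruit::orange.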
Constant Integral Type Variables
In essence, enumerators are constants. Sometimes, however, one might wish to use an enumerator more like an ordinary variable or instance. For example, there might be situations in which the address of an enumerator is required. Identifying enumerators with constant variables can be helpful in such situations. This is easily achieved by declaring constant variables of an appropriate integral type and, typically, of static storage duration. For example, the color enumeration's enumerators could be declared like this:
const int white = 0;
const int black = 1;
const int red = 2;
const int green = 3;
It is important to note that these declarations do not declare enumerators in a strict semantic sense, but they can be used as such. Their nature is a hybrid between the constants provided by enum-types' enumerators or macros and normal variables. Since they are initialized with constant expressions, they can be used wherever constants are required, e.g., as array bounds or case labels. On the other hand, their addresses can be taken, although only as pointers to const (e.g., const int*).
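Note, however, that in C++ a const variable defined at namespace scope has internal linkage by default. Each translation unit including such a definition therefore gets a copy of its own, and the compiler usually substitutes the value directly, so changing the value requires re-compilation of all dependent sources. Only if a header merely declares the variable, as in extern const int white;, and exactly one source file defines it, as in extern const int white = 0;, is the value stored in exactly one place. Changing it then requires re-compiling only that source file, at the price that the variable can no longer be used as a compile-time constant in other translation units. Either way, the disadvantage of using const variables is that they don't have a distinct type of their own.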
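The hybrid nature of such constants can be illustrated with a short sketch; colorName and pRed are hypothetical names, and the static_assert check requires C++11. Being initialized with constant expressions, the variables serve as case labels, yet their addresses can be taken, and their type is nothing but const int.

```cpp
#include <type_traits>

const int white = 0;
const int black = 1;
const int red   = 2;
const int green = 3;

// Usable wherever a constant expression is required, e.g. as case labels.
inline const char* colorName( int color )
{
    switch ( color ) {
    case white: return "white";
    case black: return "black";
    case red:   return "red";
    case green: return "green";
    default:    return "unknown";
    }
}

// Addressable like a variable, but only through a pointer to const.
const int* const pRed = &red;

// No distinct type: a color constant is just a const int.
static_assert( std::is_same< decltype( red ), const int >::value, "plain const int" );
```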
While this approach to implementing enumerations may at first glance seem to be of academic interest only, the transition from values as enumerators to instances of types paves the way for class-based approaches to implementing enumeration types, which are the topic of the next section.
III Class-based Enumeration Types
Up to this point, the implementation of enumeration types and their enumerators relied on some integral type for representation. In particular, using enumerators more or less boiled down to using constants. The last variant described in the previous section, however, somewhat blurred this principle by using constant variables, i.e., instances of the type in question. By generalizing this concept and using classes instead of integral types, one can define enumerators that combine most properties of the aforementioned approaches while maintaining type safety and supporting object-oriented design.
Using Static Class Members As Enumerators
In C++ (and most other strongly typed object-oriented programming languages), each class is a type of its own, distinct from all other types. Unless conversions are explicitly defined (by converting constructors or conversion operators), class types cannot be converted into each other, the only exception being the conversion of pointers or references to a derived class into pointers or references to a base class. In particular, there is no implicit, built-in type conversion or promotion as there is for built-in types.
Enumerators, on the other hand, are often used in an out-of-class context: they do not require an instance of a particular class to be present, but are of a global constant or variable nature instead.
Making use of these two observations, the declaration of the RGB color class from the first section can be rewritten like this:
class RGB {
public:
static const RGB WHITE;
static const RGB BLACK;
static const RGB RED;
static const RGB GREEN;
static const RGB BLUE;
static const RGB MAGENTA;
static const RGB YELLOW;
static const RGB CYAN;
public:
RGB( unsigned int r = 255, unsigned int g = 255,
unsigned int b = 255 ) noexcept // noexcept: C++11 replacement for throw()
: red( r ), green( g ), blue( b ) {}
unsigned int red;
unsigned int green;
unsigned int blue;
};
The definition of the static members is straightforward and uses RGB's ctor to set color values:
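const RGB RGB::WHITE;                // default arguments: 255, 255, 255
const RGB RGB::BLACK( 0, 0, 0 );
const RGB RGB::RED( 255, 0, 0 );
// GREEN, BLUE, MAGENTA, YELLOW and CYAN are defined analogously.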
This declaration of RGB allows any two instances of class RGB to be compared with each other (assuming appropriate comparison operators are defined). In particular, any RGB instance can be compared with the eight constant instances representing the fundamental colors. No instance of RGB can be compared with instances of other types, except for instances of derived classes. If necessary, this special case can be dealt with by checking an instance's class from within the comparison operators. For the rest of the following discussion, RGB is assumed to have no derived classes.
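Putting the pieces together, a complete version of the class might look as follows. It is a sketch under assumptions: the inline constructor, the noexcept specifier (the C++11 successor of throw()), the member-wise operator== and operator!= and the function checkColorEnumerators are not spelled out in the text, which only presupposes "appropriate comparators".

```cpp
#include <cassert>

class RGB {
public:
    static const RGB WHITE;
    static const RGB BLACK;
    static const RGB RED;
    static const RGB GREEN;
    static const RGB BLUE;
    static const RGB MAGENTA;
    static const RGB YELLOW;
    static const RGB CYAN;
public:
    RGB( unsigned int r = 255, unsigned int g = 255, unsigned int b = 255 ) noexcept
        : red( r ), green( g ), blue( b ) {}
    unsigned int red;
    unsigned int green;
    unsigned int blue;
};

// Member-wise comparison (assumed).
inline bool operator==( const RGB& lhs, const RGB& rhs ) noexcept
{
    return lhs.red == rhs.red && lhs.green == rhs.green && lhs.blue == rhs.blue;
}
inline bool operator!=( const RGB& lhs, const RGB& rhs ) noexcept { return !( lhs == rhs ); }

// Definitions of the enumerators; WHITE relies on the default arguments.
const RGB RGB::WHITE;
const RGB RGB::BLACK(     0,   0,   0 );
const RGB RGB::RED(     255,   0,   0 );
const RGB RGB::GREEN(     0, 255,   0 );
const RGB RGB::BLUE(      0,   0, 255 );
const RGB RGB::MAGENTA( 255,   0, 255 );
const RGB RGB::YELLOW(  255, 255,   0 );
const RGB RGB::CYAN(      0, 255, 255 );

// Usage check (hypothetical): any RGB instance compares with the enumerators.
inline void checkColorEnumerators()
{
    assert( RGB( 255, 0, 0 ) == RGB::RED );
    assert( RGB::WHITE != RGB::BLACK );
    assert( RGB::WHITE == RGB( 255, 255, 255 ) );
}
```

In a multi-file project, the class definition would go into a header and the definitions of the static members into exactly one source file.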
Special care must be taken when declaring user-defined conversion operators with an integral target type: such an operator would allow RGB to be used in a manner akin to built-in enumeration types, including their implicit conversions and the resulting lack of type safety.
Assignment and initialization of RGB instances can be handled as usual by implementing a copy constructor and/or overloading the assignment operator. Bitwise logical operators can be overloaded as seen fit, if necessary. The only limitation is that a switch statement works only on integral or enumeration values with constant case labels, so RGB enumerators cannot be used with it. This is not really a big problem, as cascaded if statements can do the same job, although with a slight run-time penalty because the compiler cannot generate jump tables.
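Where a switch over the enumerators would be the natural choice, a cascade of if statements takes its place. The following sketch assumes a member-wise operator== and, for brevity, only four of the eight enumerators; colorName and checkColorNames are hypothetical, and noexcept (C++11) replaces the throw() specification.

```cpp
#include <cassert>
#include <string>

class RGB {
public:
    static const RGB WHITE;
    static const RGB BLACK;
    static const RGB RED;
    static const RGB GREEN;

    RGB( unsigned int r = 255, unsigned int g = 255, unsigned int b = 255 ) noexcept
        : red( r ), green( g ), blue( b ) {}

    unsigned int red;
    unsigned int green;
    unsigned int blue;
};

inline bool operator==( const RGB& lhs, const RGB& rhs ) noexcept
{
    return lhs.red == rhs.red && lhs.green == rhs.green && lhs.blue == rhs.blue;
}

const RGB RGB::WHITE;
const RGB RGB::BLACK( 0, 0, 0 );
const RGB RGB::RED( 255, 0, 0 );
const RGB RGB::GREEN( 0, 255, 0 );

// RGB has no conversion to an integral type, so switch ( color ) would not
// compile; the enumerators are tested one after another instead.
inline std::string colorName( const RGB& color )
{
    if ( color == RGB::WHITE ) return "white";
    if ( color == RGB::BLACK ) return "black";
    if ( color == RGB::RED )   return "red";
    if ( color == RGB::GREEN ) return "green";
    return "other";
}

// Usage check (hypothetical): the first matching enumerator determines the name.
inline void checkColorNames()
{
    assert( colorName( RGB::RED ) == "red" );
    assert( colorName( RGB( 1, 2, 3 ) ) == "other" );
}
```

Placing the most frequently expected enumerators first keeps the average number of comparisons low.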
Run-Time Behavior of Static Members
Whenever static class members are involved in anything but trivial cases, it is a good idea to keep in mind the initialization peculiarities of variables and instances with static storage duration. For fundamental types, the rules are quite simple: variables initialized with literals, literal expressions, or expressions involving previously initialized const variables of fundamental type are initialized at compile-time. The same holds for static pointer variables initialized with a literal, which normally means NULL. The order of initialization is that of definition within the translation unit.
Pointer-type variables that are initialized with constant, non-literal expressions, i.e., with the addresses of other objects or functions of static storage duration, are initialized at link-time. Link-time can have two meanings. First, static link-time, that is, when the linker is run as part of the build (e.g., invoked from the command line, by nmake or by a Visual Studio project build). Second, depending on the target platform and executable type, it can also mean dynamic link-time. For example, when DLLs or other kinds of shared code are loaded, the final addresses of variables (including the contents of jump tables and function addresses) are not determined until the run-time linker has run.
Finally, the so-called ctor-chain is executed, during which all remaining initializations are performed in order of definition within the translation unit. The following code snippet summarizes these rules:
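#include <cstddef>                        // NULL
int a = 0;                                // compile-time: literal
const int b = 1 + 1;                      // compile-time: literal expression
int c = b + 1;                            // compile-time: b is a const variable initialized before
int* pint = NULL;                         // compile-time: null pointer literal
int** ppint = &pint;                      // link-time: address of a static variable
void foo();
void (*ptrFoo)(void) = &foo;              // link-time: address of a function
RGB aColor( 10, 10, 10 );                 // ctor-chain: constructor call
RGB* pAColor = new RGB( 10, 10, 10 );     // ctor-chain: operator new and constructor call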
The order of initialization across translation units is not specified by the C++ standard. In practice, it depends on the toolchain and is quite often determined by the order in which the object files are passed to the linker at build time.
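Within a single translation unit, the order of the ctor-chain can be observed directly. In the following sketch, the hypothetical Tracer class appends a tag to a global string whenever one of its instances is constructed, and checkInitializationOrder is a hypothetical check. Since dynamic initialization within a translation unit follows the order of definition, the string reads "ABC" once main is entered. No such guarantee exists for objects defined in different translation units.

```cpp
#include <cassert>
#include <string>

// Defined first, so it is initialized before the tracers below.
std::string initTrace;

// Hypothetical helper: records the order in which constructors run.
struct Tracer {
    explicit Tracer( const char* tag ) { initTrace += tag; }
};

// Constructed during the ctor-chain, in order of definition.
Tracer first( "A" );
Tracer second( "B" );
Tracer third( "C" );

// Usage check: all three tracers have run, in the order of their definition.
inline void checkInitializationOrder() { assert( initTrace == "ABC" ); }
```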
Using Pointers Instead of Instances
As long as class-based enumerators defined as static member instances are referenced only from within the translation unit that defines them, they can be used without many problems. Evidently, this isn't the case in general: static members are referenced from all across the translation units the executable is built from or, in the case of library projects (e.g., DLLs), from anywhere at any time.
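In most situations, this severely limits the use of static class members that are class instances, because code in another translation unit may access them before their constructors have run. A remedy is to use pointer types instead. Like every variable of static storage duration, a pointer is zero-initialized before any code runs, and a pointer initialized with an address constant holds its final value from link-time on. A pointer initialized with new, as in the following example, still receives its value only during the ctor-chain; until then, however, it is a detectable NULL rather than an unconstructed object: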
class RGB {
public:
static const RGB* const WHITE;
static const RGB* const BLACK;
};
const RGB* const RGB::WHITE = new RGB;
const RGB* const RGB::BLACK = new RGB( 0, 0, 0 );
Apart from being pointers, enumerators of this variant can be used like their instance counterparts. While the memory is allocated, i.e., operator new is called, during the ctor-chain, it must be freed by hand. This isn't really necessary in most situations, though, because the allocation takes place only once, when the code in question is loaded into the calling process' address space. However, manual deallocation keeps memory leak detectors quiet.
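A self-contained sketch of the pointer variant, reduced to two enumerators, might look like this. The names isBlack, releaseColorEnumerators and checkPointerEnumerators are hypothetical; releaseColorEnumerators merely illustrates the manual deallocation mentioned above and must not be called while the enumerators are still in use, since the pointers dangle afterwards.

```cpp
#include <cassert>

class RGB {
public:
    static const RGB* const WHITE;
    static const RGB* const BLACK;

    RGB( unsigned int r = 255, unsigned int g = 255, unsigned int b = 255 ) noexcept
        : red( r ), green( g ), blue( b ) {}

    unsigned int red;
    unsigned int green;
    unsigned int blue;
};

// Zero-initialized (NULL) first, then set via operator new during the ctor-chain.
const RGB* const RGB::WHITE = new RGB;
const RGB* const RGB::BLACK = new RGB( 0, 0, 0 );

// Identity comparison is enough to tell enumerators apart.
inline bool isBlack( const RGB* color ) { return color == RGB::BLACK; }

// Usage check: the enumerators are distinct and point to real instances.
inline void checkPointerEnumerators()
{
    assert( RGB::WHITE != RGB::BLACK );
    assert( isBlack( RGB::BLACK ) && !isBlack( RGB::WHITE ) );
    assert( RGB::WHITE->blue == 255 && RGB::BLACK->red == 0 );
}

// Cleanup, e.g. at shutdown; the enumerator pointers dangle afterwards.
inline void releaseColorEnumerators()
{
    delete RGB::WHITE;
    delete RGB::BLACK;
}
```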
Executing the ctor-chain imposes a time penalty that is not acceptable in some situations, e.g., in time-critical code or when too many initializations take place. If the enumerators are guaranteed to be used only as opaque entities that are at most compared for equality, a special initialization expression can be used:
const RGB* const RGB::WHITE = reinterpret_cast< const RGB* const >( & RGB::WHITE );
const RGB* const RGB::BLACK = reinterpret_cast< const RGB* const >( & RGB::BLACK );
This initialization takes place at link-time and preserves the most important property - uniqueness. Source code generators can easily be set up to produce such expressions automatically.
The disadvantage is that these pointers do not point to real class instances. Any attempt to access non-static members of class RGB via such pointers results in undefined behavior and will almost certainly fail. This technique should therefore be used only after careful consideration of its consequences, and only in a well-documented manner.
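A minimal sketch of the opaque variant follows; the class is reduced to the enumerators themselves, and isBlack and checkOpaqueEnumerators are hypothetical helpers. Each pointer is initialized with its own address, which guarantees uniqueness without any allocation. Only comparisons are meaningful; dereferencing any of these pointers is undefined behavior.

```cpp
#include <cassert>

class RGB {
public:
    // Opaque enumerators: never dereference them.
    static const RGB* const WHITE;
    static const RGB* const BLACK;
    static const RGB* const RED;
};

// Each pointer holds its own address, which is unique to it.
const RGB* const RGB::WHITE = reinterpret_cast< const RGB* >( &RGB::WHITE );
const RGB* const RGB::BLACK = reinterpret_cast< const RGB* >( &RGB::BLACK );
const RGB* const RGB::RED   = reinterpret_cast< const RGB* >( &RGB::RED );

// Equality is the only meaningful operation on such enumerators.
inline bool isBlack( const RGB* color ) { return color == RGB::BLACK; }

// Usage check: all enumerators are distinct.
inline void checkOpaqueEnumerators()
{
    assert( RGB::WHITE != RGB::BLACK && RGB::BLACK != RGB::RED && RGB::WHITE != RGB::RED );
    assert( isBlack( RGB::BLACK ) && !isBlack( RGB::WHITE ) );
}
```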