Introduction
In C#, you sometimes want to define a base class or a library that uses an enumeration, while allowing derived classes or users of your library to define new values for it. The trouble is that enums are not extensible: a derived class or user code cannot define new values.
For instance, I once wrote a library for serializing and deserializing "shapes" with geographic coordinates to a text file. The library supported several different shapes: circles, rectangles, lines, polygons, and so on. Suppose there is a ShapeType enumeration for this:
public enum ShapeType {
    Circle,
    Rect,
    Line,
    Polygon
}
Enumerations are great for storing in a text file, since you can write t.ToString() to convert a ShapeType t to a string, and (ShapeType)Enum.Parse(typeof(ShapeType), s) to convert the string s back to a ShapeType.
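As a minimal sketch of that round trip, the following program uses only the standard library. The EnumRoundTrip class and its Save and Load helpers are hypothetical names chosen for illustration; they are not part of any library discussed here.

```csharp
using System;

public enum ShapeType { Circle, Rect, Line, Polygon }

public static class EnumRoundTrip
{
    // Converts an enum value to the text that would be written to the file.
    public static string Save(ShapeType t) => t.ToString();

    // Converts text read from the file back to an enum value.
    // Enum.Parse returns object, so a cast is required; it throws
    // ArgumentException when the name is not defined in the enum.
    public static ShapeType Load(string s) =>
        (ShapeType)Enum.Parse(typeof(ShapeType), s);

    public static void Main()
    {
        string text = Save(ShapeType.Rect);
        ShapeType back = Load(text);
        Console.WriteLine(text + " -> " + back);   // prints "Rect -> Rect"

        // Enum.TryParse reports failure instead of throwing.
        bool ok = Enum.TryParse("Hexagon", out ShapeType unknown);
        Console.WriteLine(ok);                     // prints "False"
    }
}
```

The failed lookup of "Hexagon" is the limitation in a nutshell: a name that is not listed in the enum declaration cannot be represented at all.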
But suppose you would like to allow other developers to define their own shapes. Other developers cannot add new values to ShapeType, and even if they could, there is a risk that two developers would assign the same integer value to different kinds of shapes. How can we solve these problems?
Ruby to the Rescue
Sometimes, when an extensible enum is needed, people use strings or integer constants (const int or readonly int) instead of enums. These solutions have at least the following problems:
- Strings and integers are not normally used for enumerations. Therefore, when other developers see a "string" or "int" property or parameter, they don't immediately realize that it is used for an enumeration.
- Some of the benefits of static typing are lost, since you can mistype a string or accidentally put a string/integer in a location that was intended to hold an enum value (or vice versa).
- Strings can't be renamed with a refactoring tool (like the "Rename" feature of Visual Studio).
- Strings are much slower than enumerations when comparing for equality, and they are slower than integers when used as dictionary keys (though due to an odd decision by Microsoft, enums also perform poorly as dictionary keys).
- When using integers, it's hard to guarantee that two different developers each use unique values when extending the enumeration.
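To make the last two integer-related problems concrete, here is a small sketch. All names in it (ShapeTypes, XyzShapes, AbcShapes, Describe) are hypothetical and exist only for this illustration.

```csharp
using System;

// Base library: shape types as plain integer constants.
public static class ShapeTypes
{
    public const int Circle = 0, Rect = 1, Line = 2, Polygon = 3;
}

// Two hypothetical third parties each pick "the next free number".
public static class XyzShapes { public const int Fern = 4; }
public static class AbcShapes { public const int Star = 4; }

public static class Program
{
    // Nothing in the signature says that shapeType is an enumeration.
    public static string Describe(int shapeType) => "shape #" + shapeType;

    public static void Main()
    {
        // Fern and Star are indistinguishable once stored.
        Console.WriteLine(XyzShapes.Fern == AbcShapes.Star); // prints "True"

        int vertexCount = 3;
        // Compiles without complaint, although a count is not a shape type.
        Console.WriteLine(Describe(vertexCount));            // prints "shape #3"
    }
}
```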
In the dynamic language Ruby, we commonly use symbols instead of enumerations. Symbols are like string literals, but instead of a string like "Circle", you use the symbol syntax :Circle.
For the most part, symbols solve the above problems. They can be compared as fast as integers, and they cannot be confused with strings. Since anyone can define a new symbol at any time, symbols are like an enumeration of unlimited size. And, if you use them as I prescribe below, it is possible to rename them with a refactoring tool.
(Edit: later I found out that other languages also have symbols as a built-in concept, e.g. Lisp.)
Symbols in .NET
I have written a Symbol implementation for .NET that you can use as an extensible enum. I will now demonstrate how we can rewrite our ShapeType enum to use Symbols instead. First, change enum ShapeType into a class. Then, replace each enum value with a Symbol:
public static class ShapeType
{
    public static readonly Symbol Circle = GSymbol.Get("Circle");
    public static readonly Symbol Rect = GSymbol.Get("Rect");
    public static readonly Symbol Line = GSymbol.Get("Line");
    public static readonly Symbol Polygon = GSymbol.Get("Polygon");
}
If a third party wants to extend this list of symbols, they should write another static class with the additional possibilities. For example, Xyz corporation might write this extension:
public static class FractalShape
{
    public static readonly Symbol Mandelbrot =
        GSymbol.Get("XyzCorp.Mandelbrot");
    public static readonly Symbol Julia = GSymbol.Get("XyzCorp.Julia");
    public static readonly Symbol Fern = GSymbol.Get("XyzCorp.Fern");
}
To ensure two independent parties don't accidentally define the same symbol for two different shapes (for example, if two different parties both made a shape called Fern), it is advisable to use a unique name when calling GSymbol.Get. That's because GSymbol.Get always returns the same Symbol when given the same input string. Therefore, in this example, I used the prefix "XyzCorp." to ensure that names defined by Xyz corporation are unique.
Typesafe Symbols
When I first wrote this article, people complained that Symbols are not type-safe: you could accidentally mix up two unrelated enumerations, since they both have type Symbol. And by votes of 3, my article was cast into the pit of zero readership. Besides, ShapeType as defined above is not a drop-in replacement for its enum equivalent, because ShapeType variable declarations have to be changed. For example,
ShapeType rect = ShapeType.Rect;
would have to be changed to this:
Symbol rect = ShapeType.Rect;
Now you can overcome these limitations using a type-safe "symbol pool". A SymbolPool is a "namespace" for symbols. There is one permanent, global pool (used by GSymbol.Get), and you can create an unlimited number of private pools. I will say more about how they work below, but for now, let me just show you how to make a type-safe extensible enum using SymbolPool<ShapeType>:
public class ShapeType : Symbol
{
    private ShapeType(Symbol prototype) : base(prototype) { }
    public static new readonly SymbolPool<ShapeType> Pool
        = new SymbolPool<ShapeType>(p => new ShapeType(p));
    public static readonly ShapeType Circle = Pool.Get("Circle");
    public static readonly ShapeType Rect = Pool.Get("Rect");
    public static readonly ShapeType Line = Pool.Get("Line");
    public static readonly ShapeType Polygon = Pool.Get("Polygon");
}
Since ShapeType's constructor is private, the only way to make a new ShapeType is by calling ShapeType.Pool.Get().
Now, a third party "XyzCorp" can define new ShapeTypes as follows:
public class FractalShape : ShapeType
{
    public static readonly ShapeType Mandelbrot =
        Pool.Get("XyzCorp.Mandelbrot");
    public static readonly ShapeType Julia = Pool.Get("XyzCorp.Julia");
    public static readonly ShapeType Fern = Pool.Get("XyzCorp.Fern");
}
Note that the members of FractalShape still have the type ShapeType. It is not necessary to derive FractalShape from ShapeType; I only do so to make it clear that the two are related.
Using Symbols
- To convert a Symbol s to a string, call s.Name. You can also call s.ToString(), but this prefixes the name with a colon (:) as in Ruby. (Edit: I removed the colon in newer versions of Symbol, so Symbols act more like ordinary strings and enums.)
- Instead of Enum.Parse, when you want to convert a string back to a Symbol, you can call GSymbol.Get(string) to create a global symbol, or Pool.Get(string) where Pool is a private symbol pool.
- To get all symbols in a pool, just enumerate the pool:
foreach (ShapeType s in ShapeType.Pool)
...
The symbols are returned in the same order as they were created.
Note that every Symbol you create consumes a small amount of memory that cannot be garbage-collected if the Symbol's pool is stored in a global variable. Therefore, if the strings come from a large file, you may wish to call Pool.GetIfExists(string) instead (where Pool is either a private pool or GSymbol) to avoid a memory leak. GetIfExists does not create new symbols; it only returns symbols that already exist. Therefore, if you get a nonsense name like "fdjlas", GetIfExists returns null instead of a valid Symbol.
There is a catch: if you use GetIfExists, you need to make sure that all desired symbols already exist. Therefore, before calling ShapeType.Pool.GetIfExists to decode a shape type name, you must make sure that derived types such as FractalShape are initialized. Accessing any ShapeType from FractalShape will do the trick:
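// Returns null if FractalShape has not been initialized yet.
ShapeType s = ShapeType.Pool.GetIfExists("XyzCorp.Fern");
// Accessing any static member initializes FractalShape and creates its symbols.
s = FractalShape.Julia;
// Now the lookup finds the symbol.
s = ShapeType.Pool.GetIfExists("XyzCorp.Fern");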
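The underlying issue is ordinary C# static initialization, which can be shown without the Symbol library. The sketch below is an illustration under stated assumptions: Registry and Extras are hypothetical classes, and Extras has an explicit static constructor so that its initialization happens exactly at first access. The standard RuntimeHelpers.RunClassConstructor method is used to force initialization without touching a specific member.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a pool: it just records registered names.
public static class Registry
{
    public static readonly HashSet<string> Names = new HashSet<string>();
    public static string Register(string name) { Names.Add(name); return name; }
}

// Hypothetical extension class whose static fields register names.
public static class Extras
{
    public static readonly string Fern = Registry.Register("XyzCorp.Fern");

    // An explicit static constructor means the field initializer above
    // runs exactly when the class is first accessed, not earlier.
    static Extras() { }
}

public static class Program
{
    public static void Main()
    {
        // Extras has not been touched yet, so nothing is registered.
        Console.WriteLine(Registry.Names.Contains("XyzCorp.Fern")); // prints "False"

        // Force Extras to initialize without naming one of its members.
        RuntimeHelpers.RunClassConstructor(typeof(Extras).TypeHandle);

        Console.WriteLine(Registry.Names.Contains("XyzCorp.Fern")); // prints "True"
    }
}
```

Calling RunClassConstructor on each known extension type at startup is one possible way to make sure all names exist before decoding a file; whether it suits a given application is a design decision.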
How Symbols Work, Briefly
This library has four classes.
- A Symbol is simply a small class with a read-only Name, integer Id, and a reference to its Pool. Every Symbol is cataloged in a SymbolPool.
- A SymbolPool contains a set of Symbols. SymbolPool contains a List<Symbol> and a Dictionary<string, Symbol> which are used to look up symbols by ID and by name, respectively. SymbolPool is thread-safe; you can safely create Symbols in the same pool from different threads.
- SymbolPool<T> is a derived class of SymbolPool that creates Ts, where T is a derived class of Symbol. You pass a factory function to its constructor, and when someone calls Get() to create a T, the SymbolPool calls your factory function, passing a "prototype" as a parameter (the "prototype" is a Symbol that you can use to construct a T).
- GSymbol contains the "global" SymbolPool. Call GSymbol.Get to create a "global" Symbol.
Each Symbol has an ID number; this is nothing more than the value of a counter that is incremented each time a symbol is created. IDs are unique within a given pool, but may be duplicated across pools. Private pools have positive IDs by default, starting at 1; the global pool has negative IDs starting at -1, except for GSymbol.Empty, which is the Symbol that represents the empty string (Name == "").
GetHashCode() is fast because it returns an ID number instead of obtaining the hash code of the string; therefore, Symbols are fast when used as keys in a Dictionary. Comparing Symbols for equality is fast, because only the references are compared, not the contents of each Symbol. Two Symbols are the same if and only if they are located at the same memory location.
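The design described above can be approximated in a few dozen lines. The following is a minimal sketch, not the library's actual source: MiniSymbol and MiniPool are hypothetical stand-ins that keep only the name, the ID, the list, the dictionary, and a lock for thread safety.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Symbol: a name and an ID.
public sealed class MiniSymbol
{
    public string Name { get; }
    public int Id { get; }
    internal MiniSymbol(string name, int id) { Name = name; Id = id; }

    // The ID is unique within a pool, so it serves as a cheap hash code.
    // Equals is not overridden: the default reference comparison suffices
    // because the pool never creates two symbols with the same name.
    public override int GetHashCode() => Id;
    public override string ToString() => Name;
}

// Hypothetical stand-in for SymbolPool.
public sealed class MiniPool
{
    private readonly List<MiniSymbol> _byId = new List<MiniSymbol>();
    private readonly Dictionary<string, MiniSymbol> _byName =
        new Dictionary<string, MiniSymbol>();
    private readonly object _lock = new object();

    // Returns the existing symbol for this name, creating it on first use.
    public MiniSymbol Get(string name)
    {
        lock (_lock)
        {
            if (!_byName.TryGetValue(name, out MiniSymbol sym))
            {
                sym = new MiniSymbol(name, _byId.Count + 1); // IDs start at 1
                _byId.Add(sym);
                _byName.Add(name, sym);
            }
            return sym;
        }
    }

    // Returns the existing symbol or null; never creates one.
    public MiniSymbol GetIfExists(string name)
    {
        lock (_lock)
        {
            _byName.TryGetValue(name, out MiniSymbol sym);
            return sym;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var pool = new MiniPool();
        MiniSymbol a = pool.Get("Circle");
        MiniSymbol b = pool.Get("Circle");

        Console.WriteLine(ReferenceEquals(a, b));              // prints "True"
        Console.WriteLine(pool.Get("Rect").Id);                // prints "2"
        Console.WriteLine(pool.GetIfExists("fdjlas") == null); // prints "True"
    }
}
```

Because Get hands back the same object for the same name, a reference comparison is all that equality needs, and a Dictionary keyed by MiniSymbol hashes on the integer ID rather than on the characters of the name.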
Besides making type-safe extensible enumerations, another reason to use a SymbolPool is to construct a temporary set of Symbols that can be garbage-collected later. A SymbolPool and all the Symbols it contains can be garbage-collected when there are no references left to the pool itself or any of its Symbols. Note that a Symbol has a reference to its pool, so any lingering references to a Symbol will keep its entire pool alive.
As for me...
In the Loyc compiler tooling project, source code is represented with Loyc trees. In Loyc trees, I use Symbols rather than strings to represent all identifiers in source code (variable and method names) as well as names of built-in operators and constructs. This avoids storing multiple copies of strings and allows fast equality comparison.
History
- June 1, 2008: First version.
- December 12, 2009: Introduced SymbolPools.
- December 14, 2009: Released on CodeProject.
- February 24, 2010: Added support for type-safe Symbols.
- February 25, 2014: Formatting error corrected. Edited some text based on newer information.