Bend the .NET Object to your Will!

John Batte

14 Jan 2009

Clone, serialize, and deep-compare any .NET object, regardless of type

Introduction

Have you ever had to implement ICloneable on a complex type? Gets out of hand in a hurry, doesn't it? How about IEquatable<T>? Here's a good one: what happens when you need to serialize an object graph using BinaryFormatter (so it can be transmitted or stored) and somewhere in the tree there's a type you don't control that isn't serializable? XML to the rescue, right? But when you punt the object over to the XmlSerializer, there are read-only properties you don't control that aren't participating. Now what? Create your own surrogate type and handle the marshalling operations in some utility? Sounds like a pain in the butt to me. Which is why I decided to do it one more time, and then never again. :)

In order to clone an object, what do you really need? Ultimately, all you need is the structure of the object, and its simple values. If you know those two things, a new copy of the object can be constructed.

What about deep comparison between objects? Same thing. If an object's structure and each of its simple values equal another object's, then those objects are value-equivalent.
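To make the idea concrete, here is a minimal sketch of a field-by-field comparison. It is not the article's ValueEquals implementation: DeepCompare, SameValues and the Sample type are hypothetical names made up for this illustration, and only primitives, enums and strings are treated as simple values.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical helper for illustration only.
public static class DeepCompare
{
    // Keys are matched by identity, so two distinct-but-equal objects stay distinct.
    sealed class ByReference : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y) { return ReferenceEquals(x, y); }
        int IEqualityComparer<object>.GetHashCode(object obj) { return RuntimeHelpers.GetHashCode(obj); }
    }

    const BindingFlags DeclaredInstanceFields =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    public static bool SameValues(object a, object b)
    {
        return Compare(a, b, new Dictionary<object, object>(new ByReference()));
    }

    static bool Compare(object a, object b, Dictionary<object, object> paired)
    {
        if (a == null || b == null) return a == null && b == null;

        Type type = a.GetType();
        if (type != b.GetType()) return false;                        // structure differs
        if (type.IsPrimitive || type.IsEnum || a is string) return a.Equals(b); // simple value

        // Seen before: the other graph must point back at the same partner (handles cycles).
        object partner;
        if (paired.TryGetValue(a, out partner)) return ReferenceEquals(partner, b);
        paired.Add(a, b);

        var arrayA = a as Array;
        if (arrayA != null)
        {
            var arrayB = (Array)b;
            if (arrayA.Length != arrayB.Length) return false;
            IEnumerator itemsA = arrayA.GetEnumerator(), itemsB = arrayB.GetEnumerator();
            while (itemsA.MoveNext() && itemsB.MoveNext())
                if (!Compare(itemsA.Current, itemsB.Current, paired)) return false;
            return true;
        }

        // Walk the type hierarchy so private fields of base classes are included.
        for (Type t = type; t != null; t = t.BaseType)
            foreach (FieldInfo field in t.GetFields(DeclaredInstanceFields))
                if (!Compare(field.GetValue(a), field.GetValue(b), paired)) return false;
        return true;
    }
}

// Sample type for the usage below (hypothetical).
public sealed class Sample
{
    private readonly int[] _numbers;
    public string Label { get; set; }
    public Sample Link;

    public Sample(string label, params int[] numbers) { Label = label; _numbers = numbers; }
}
```

Two separately built Sample instances with the same label and numbers compare as equal, even when each one links back to itself. One consequence of comparing raw fields is that internal bookkeeping (a collection's spare capacity, for example) takes part in the comparison too.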

And wouldn't you know it, the process of serializing an unknown type requires that we store the structure of the object and its simple (implicitly serializable) values in a new structure that can be serialized.

Since all three features depend on the same decomposition of your objects, all of the extension methods that deliver them are built on the same class: ObjectGraph.

Background

This article focuses on a few small extension methods that all make use of a new class called ObjectGraph. This class decomposes objects down to their simplest values while maintaining member association. This enables objects to be analyzed and manipulated in fine-grained ways, regardless of type.

This article makes use of .NET Framework 3.5.

Using the Code

The code is extremely easy to use:

var instance = new ComplexType // this object could be anything at all
{
    Id = 47,
    Name = "My Complex Type",
    ArbitraryValue = ArbitraryEnum.Foo,
    Values = new List<string>(new[]{"Value1", "Value2", "Value3"})
};
 
// extension method:  Clone
ComplexType clone = instance.Clone(); // a true deep copy
 
// extension method:  ToBinaryString
string serializedInstance = instance.ToBinaryString(); // a base-64 encoded byte array 
 
// extension method:  ToObject<T>
var deserializedInstance = serializedInstance.ToObject<ComplexType>(); // another clone! 
 
// extension method : ValueEquals
bool isCloneEqual = instance.ValueEquals(clone); // true
bool isRoundTripEqual = instance.ValueEquals(deserializedInstance); // also true :)

How It Works

The biggest convention breaker here is the idea of being able to serialize any object using the BinaryFormatter, even ones that aren't decorated with [Serializable]. It's a simple trick: the object being serialized isn't your object. It's actually a wrapper class (ObjectGraph) that is 100% serializable, and stores enough information to completely rehydrate your object after being deserialized.

When ObjectGraph wraps an object, one of several things takes place, depending on the object being wrapped. If the wrapped object is a simple type, i.e. one that the code recognizes as being directly serializable, then the raw value of the object is stored and the wrapping operation is complete. If the object has already been wrapped in the current graph, a pointer to the original wrapper is stored. If the object is an array of other objects, then the array items are individually wrapped and stored. If the object is a complex type, then each of its member variables is wrapped and stored in a name-keyed dictionary.
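The four cases above can be sketched in a few dozen lines. This is a simplified stand-in, not the ObjectGraph source: Node, NodeKind, Wrapper and Person are hypothetical names, and the list of "simple" types is an assumption chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public enum NodeKind { Value, Pointer, Array, Complex }

// Hypothetical wrapper node; only the field matching Kind is populated.
public sealed class Node
{
    public NodeKind Kind;
    public object Value;                     // Kind == Value
    public Node Target;                      // Kind == Pointer
    public List<Node> Items;                 // Kind == Array
    public Dictionary<string, Node> Members; // Kind == Complex
}

public static class Wrapper
{
    // Identity-based lookup, so equal-but-distinct objects get separate nodes.
    sealed class ByReference : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y) { return ReferenceEquals(x, y); }
        int IEqualityComparer<object>.GetHashCode(object obj) { return RuntimeHelpers.GetHashCode(obj); }
    }

    public static Node Wrap(object data)
    {
        return Wrap(data, new Dictionary<object, Node>(new ByReference()));
    }

    // Assumption: what counts as "simple" here is deliberately narrow.
    static bool IsSimple(object data)
    {
        return data == null || data is string || data.GetType().IsPrimitive || data.GetType().IsEnum;
    }

    static Node Wrap(object data, Dictionary<object, Node> seen)
    {
        // Case 1: simple value, store it as-is.
        if (IsSimple(data)) return new Node { Kind = NodeKind.Value, Value = data };

        // Case 2: already wrapped in this graph, store a pointer to the first wrapper.
        Node existing;
        if (seen.TryGetValue(data, out existing))
            return new Node { Kind = NodeKind.Pointer, Target = existing };

        var node = new Node();
        seen.Add(data, node); // register before descending so cycles resolve to a pointer

        // Case 3: array, wrap each item.
        var array = data as Array;
        if (array != null)
        {
            node.Kind = NodeKind.Array;
            node.Items = new List<Node>();
            foreach (object item in array) node.Items.Add(Wrap(item, seen));
            return node;
        }

        // Case 4: complex type, wrap every field under a name key.
        node.Kind = NodeKind.Complex;
        node.Members = new Dictionary<string, Node>();
        for (Type t = data.GetType(); t != null; t = t.BaseType)
            foreach (FieldInfo field in t.GetFields(BindingFlags.Instance | BindingFlags.Public |
                                                    BindingFlags.NonPublic | BindingFlags.DeclaredOnly))
                node.Members[t.Name + "." + field.Name] = Wrap(field.GetValue(data), seen);
        return node;
    }
}

// Sample type for the usage below (hypothetical).
public sealed class Person
{
    public string Name;
    public Person Friend;
    public int[] Scores;
}
```

Wrapping a Person whose Friend field refers back to itself yields a Complex root whose "Person.Friend" member is a Pointer to that same root, which is what keeps a cyclic graph from being walked forever.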

Why member variables? This is the key. No matter what the public interface of your class looks like, if the class holds state information at all, that state lives in a member variable (a field). Automatic properties get their backing fields generated for them, but it's all the same. Once I have the values of all of an object's fields, I can use Reflection-based instantiation to create an exact copy of the object, or compare it to any other object of a matching type.
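Here is a small sketch of that capture-and-rebuild idea, assuming plain Reflection and FormatterServices.GetUninitializedObject for the instantiation step. FieldSnapshot and Account are hypothetical names, and the copy is shallow: reference-typed field values are shared with the original rather than wrapped in turn.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical helper for illustration only.
public static class FieldSnapshot
{
    const BindingFlags DeclaredInstanceFields =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // Reads every instance field, including private ones declared on base classes.
    public static Dictionary<string, object> Capture(object source)
    {
        var state = new Dictionary<string, object>();
        for (Type t = source.GetType(); t != null; t = t.BaseType)
            foreach (FieldInfo field in t.GetFields(DeclaredInstanceFields))
                state[t.FullName + "." + field.Name] = field.GetValue(source);
        return state;
    }

    // Creates an instance without running any constructor, then writes the fields back.
    public static object Restore(Type type, Dictionary<string, object> state)
    {
        object target = FormatterServices.GetUninitializedObject(type);
        for (Type t = type; t != null; t = t.BaseType)
            foreach (FieldInfo field in t.GetFields(DeclaredInstanceFields))
                field.SetValue(target, state[t.FullName + "." + field.Name]);
        return target;
    }
}

// Sample type (hypothetical): no default constructor, a readonly field,
// and an automatic property with a private setter.
public sealed class Account
{
    private readonly int _id;
    public string Owner { get; private set; }

    public Account(int id, string owner) { _id = id; Owner = owner; }

    public int Id { get { return _id; } }
}
```

Because the public interface is never consulted, the missing default constructor, the readonly field and the private setter on Account do not get in the way. On newer runtimes, RuntimeHelpers.GetUninitializedObject plays the same role as the FormatterServices call.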

Most of ObjectGraph's code loses its meaning if you try to read individual methods out of context, so I apologize if this doesn't make enough sense, but here's the private ObjectGraph constructor; it should give some clue as to how the ObjectGraph analyzes the object it wraps.

private ObjectGraph(object data, GraphRegistry registry, bool isRootGraph)
{
	// make sure to unhook all pointers created during scan
	using (new DisposableContext(() => { if (isRootGraph) registry.Clear(); }))
	{
		_isValueBased = data.IsValueBased();
 
		if (_isValueBased) _value = data;
		else
		{
			_pointer = registry.Register(data, this);
 
			if (_pointer == null)
			{
				_isArray = data is Array;
 
				if (_isArray)
				{
					_arrayItems = GetItems((Array) data, registry).ToList();
 
					// The CLR names array types as {itemTypeName}[]; strip the
					// brackets so that _type holds the item type's name.
					_type = Regex.Replace(
						data.GetType().AssemblyQualifiedName,
						@"\[\d*\]", string.Empty);
 
					return;
				}
 
				_state = GetValues(data, registry);
			}
		}
 
		_type = data != null ? data.GetType().AssemblyQualifiedName : string.Empty;
	}
}
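The Regex.Replace call in the array branch is easier to follow in isolation. The sketch below repeats the same pattern on a rank-one array and shows that the stripped string resolves to the item type; ArrayTypeNames and ItemTypeName are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that isolates the type-name handling for arrays.
public static class ArrayTypeNames
{
    // Removes "[]" (or "[n]") from the array's assembly-qualified name,
    // leaving the assembly-qualified name of the item type.
    public static string ItemTypeName(Array data)
    {
        return Regex.Replace(data.GetType().AssemblyQualifiedName, @"\[\d*\]", string.Empty);
    }
}
```

For example, Type.GetType(ArrayTypeNames.ItemTypeName(new int[3])) returns typeof(int). This sketch assumes rank-one arrays; a multidimensional array's name contains "[,]", which the pattern does not match. Type.GetElementType() is the Reflection call that returns the item type directly when a string is not needed.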

Points of Interest

Check out the unit tests in the source. They show that, in addition to CLR types and your own custom types, anonymous types also work happily with the ObjectGraph class.

Speaking of which, the unit tests included in the source are not really unit tests; they are integration tests with BDD naming semantics, all of which is completely improper. The only reason they are present is so that I (and you) can quickly debug the code. Please do not think that this article is attempting to address the proper way to implement TDD or BDD. In fact, here's a disclaimer: THIS ARTICLE DEMONSTRATES POOR TESTING HABITS.

Also, since indirect recursion is used in both object scanning and rehydration, I have concerns that a graph of sufficient depth could cause a StackOverflowException to occur. I have not been able to make this happen in practical use, so it may be okay for most scenarios. Fair warning.
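If the recursion depth ever does become a problem, one common way around it is to keep the pending work on an explicit Stack<object> instead of the call stack. The sketch below shows that technique on the scanning side only; it is not how ObjectGraph is written, and IterativeScan, CountObjects and Link are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical helper for illustration only.
public static class IterativeScan
{
    // Identity-based set membership, so each object is visited once.
    sealed class ByReference : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y) { return ReferenceEquals(x, y); }
        int IEqualityComparer<object>.GetHashCode(object obj) { return RuntimeHelpers.GetHashCode(obj); }
    }

    static bool IsLeaf(object data)
    {
        return data == null || data is string || data.GetType().IsPrimitive || data.GetType().IsEnum;
    }

    // Counts the non-leaf objects visited from root without recursing.
    public static int CountObjects(object root)
    {
        var seen = new HashSet<object>(new ByReference());
        var pending = new Stack<object>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            object current = pending.Pop();
            if (IsLeaf(current) || !seen.Add(current)) continue; // skip simple values and repeats

            var array = current as Array;
            if (array != null)
            {
                foreach (object item in array) pending.Push(item);
                continue;
            }

            // Queue every field value instead of descending into it immediately.
            for (Type t = current.GetType(); t != null; t = t.BaseType)
                foreach (FieldInfo field in t.GetFields(BindingFlags.Instance | BindingFlags.Public |
                                                        BindingFlags.NonPublic | BindingFlags.DeclaredOnly))
                    pending.Push(field.GetValue(current));
        }
        return seen.Count;
    }
}

// Sample type (hypothetical): a singly linked chain makes for a very deep graph.
public sealed class Link
{
    public int Value;
    public Link Next;
}
```

With this shape, the depth of the graph is limited by available heap memory rather than by the thread's stack size, so a chain of 100,000 Link objects is scanned in a plain loop.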

Finally, I would like to thank the members who quickly responded with some critical feedback that led to this component's current status. Your input is much appreciated!

Enjoy :)

History

12/19/08: Submitted first draft
12/20/08: Submitted second draft & code revision for cyclic reference support
1/11/09: Submitted final draft

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).