Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Fast Deep Copy by Expression Trees (C#)

0.00/5 (No votes)
30 Jan 2017 5  
Deep cloning code made with Linq.Expressions. Tests and test classes are enclosed.

Introduction

Deep Copy is not implemented in C#, there is only function Object.MemberwiseClone() on each object and it creates just a shallow copy. This article brings code of a very fast deep copy function implemented by Expression Trees (Linq.Expressions).

Background

General deep cloning function can be easily implemented by Serialization, with moderate effort by Reflection and with a lot of effort by Expression Trees. Serialization is very slow and may have some additional limitations (e.g. serialization by BinaryFormatter needs Serializable attribute, serialization by XmlSerializer does not maintain circular references). Reflection is 5 times faster but still much slower than manually implemented copy functions. Expression Trees are of the same speed as manually implemented copy functions and therefore are the fastest.

For nice working codes by Serialization and Reflection see section Links to Alternatives at the bottom of the article.

Speed Comparison

Speed comparison was done with the mentioned Serialization and Reflection codes from the section "Links to Alternatives". Used object: ModerateClass (included in the zip).

Table with exact numbers follows:

Number of Copied Items110100100010000100000
Serialization (ms)6729189149612969
Reflection (ms)5510362422432
Expression Trees (ms)1917192069570

Why are Expression Trees So Fast?

Expression Trees enable a developer to create functions and methods generically in runtime. It is closely connected to Reflection and has almost the same possibilities (e.g. hidden members can be accessed or assigned, no public constructor is needed, etc.). Difference from Reflection is that you have to write much longer code and also that a method made by Expression Trees needs to be compiled before its first use. Compilation in runtime takes some time (which explains why expression trees are slower by a constant for a small number of copied objects). But if program reuses the "in runtime compiled" method, it is as fast as compiled source code.

Advantages and Limitations

The code advantages are listed below:

  • Copies both public and private members
  • Works for readonly fields and properties
  • (new) Works for both classes and structs
  • Works also for circular references
  • Objects with multiple references are copied only once
  • Copies inherited types
  • Copies generic types
  • Works well in multithreading
  • Works for classes without parameterless constructor
  • Classes do not need Serializable attribute

The code limitations:

  • Works for .NET 4 and higher (C# 4.0 and higher)
  • Shallow-copies structs (user-defined structs containing classes in its fields will have problem)
  • Does not copy delegates and events (leaves null instead)
  • Fails on ComObjects (e.g. on some WPF dispatcher subobjects or Excel Interop)
  • Fails on any unmanaged object (e.g. from some external C++ library)

How to Use the Code

Copy the file DeepCopyByExpressionTrees.cs somewhere into your solution.

Then, you can use deep copy as an extension method on any object:

var copy = original.DeepCopyByExpressionTree();

or by implementing the DeepCopy method directly on the object:

class ExampleClass
{
    public ExampleClass DeepCopy()
    {
        return this.DeepCopyByExpressionTree();
    }
}

Implementation Description

If you are not interested in the technical description of the code, you can omit this section.

Main Methods and the References Dictionary

The generic public static method T DeepCopyByExpressionTree<T>(this T original,...) calls the nongeneric private method object DeepCopyByExpressionTreeObj(object original,...) and passes to it newly created references dictionary of type Dictionary<object, object>. References dictionary (or copy dictionary) stores original classes and their corresponding copies - they are reused based on ReferenceEquals comparison and therefore we avoid multiple copies of the same object or problems with circular references.

Creating and Storing Compiled Lambda Functions

For each class type is created its specific dynamic deep copy function. It is created (or retrieved) by function GetOrCreateCompiledLambdaCopyFunction(Type type), which is made thread safe and stores the already compiled functions to the static dictionary Dictionary<Type, Func<...>> CompiledCopyFunctionsDictionary.

Workflow of Copy Functions

Type specific deep copy lambda functions are created with the following workflow:

  1. The object is first copied by MemberwiseClone function.
  2. In the case of classes the copy is stored to the references dictionary under key of original object.
  3. The need-deep-copy fields are obtained by Reflection (with care for inheritance) and copied one by one again through DeepCopyByExpressionTreeObj, which means that compiled lambda function for object Obj1 calls (different) compiled lambda function for its child Obj1.Obj2.
  4. Delegate fields and events are set to null.
  5. For arrays of classes or interfaces, the copy function is called for each index set by dynamically created for loops. Arrays of value types are left untouched.

The FieldInfos of the need-deep-copy fields are obtained by reflection only once: before compilation of the copy function and therefore do not slow down the compiled function.

Note (Need-deep-copy fields): Those are fields of types which are either not value types or not primitive. Specifically omitted are strings, enums and the Decimal type. By this we get all classes other than strings and structs other than basic value types. Furthermore, from deep-copy are omitted also struct types which do not contain any class or interface in its hierarchy.

Note 2 (Properties): We do not care for properies at all - each property has its declared or auto-generated field, so it is enough to copy just fields.

Note 3 (Readonly fields): Readonly fields are the only thing which slows down the copying, because expression trees do not work here and repeated reflection calls are required. Read more here.

How are Expression Trees Involved

By Expression Trees, we imitate the regular code dynamically (the same as Reflection). I illustrate the process by presenting a small sample code of MemberwiseClone call, cast and assignment of newly created object.

In regular code, we would easily write one line:

ExampleClass output = (ExampleClass)input.MemberwiseClone();

but the code would be possible to use only from inside of the ExampleClass since MemberwiseClone is protected.

By using Reflection, it would be:

MethodInfo MemberwiseCloneMethod = 
    typeof(Object).GetMethod("MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic);

ExampleClass output = (ExampleClass)MemberwiseCloneMethod.Invoke(input, null);

which is already possible to call from outside of the class.

And by using the Expression Trees, we write:

ParameterExpression inputParameter = Expression.Parameter(typeof(ExampleClass));
ParameterExpression outputParameter = Expression.Parameter(typeof(ExampleClass));

MethodInfo MemberwiseCloneMethod = 
    typeof(Object).GetMethod("MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic);

Expression lambdaExpression = 
                Expression.Assign(
                    outputParameter,
                    Expression.Convert(
                        Expression.Call(
                            inputParameter,
                            MemberwiseCloneMethod),
                        typeof(ExampleClass)));

Points of Interest (Structs and readonly fields)

Without deep copy of structs, deep copy does not work also for such basic classes as Dictionary<,>, because dictionaries store key-value pairs into structs of the type Dictionary<,>.Entry and these Entry structs have to be therefore deep-copied. See Reference Source of Dictionary<,>. So we heavily need implementation of the structs copy.

Tricky part of structs copy code comes with writable and readonly fields. Copying of writable fields by Expression Trees must be performed directly on the struct type (without cast or boxing), but for copying of readonly fields we must use Reflection and reflection in structs must be done using boxing. With complexity of Expression Trees code and its limited possibility of debugging - this implementation was tough.

The following code (translated from Expression Trees code to regular C# code) ilustrates the obstacles.

// definition of variables
ExampleStruct copy = (ExampleStruct)original.MemberwiseClone();
Object boxedCopy = (Object)copy;
FieldInfo fieldInfo = copy.GetType().GetField("field");

// WRITABLE fields in structs (we use pure expression trees)
copy.field = value;                      // WORKING
((ExampleStruct)boxedCopy).field = value // NOT WORKING: we assigned field of temporary struct instance
fieldInfo.SetValue(boxedCopy, value);    // WORKING BUT SLOW (if repeated many times)

// READONLY fields in structs (we have to use reflection call in expression trees)
copy.field = value;                      // NOT WORKING: because of readonly
fieldInfo.SetValue(copy, value);         // NOT WORKING: copy was expected of type System.Object
fieldInfo.SetValue((Object)copy, value); // NOT WORKING: value is again lost because of unwanted boxing
fieldInfo.SetValue(boxedCopy, value);    // WORKING

Links to Alternatives

An alternative Expression Tree DeepCloner was written by Cygon (full article) and improved by Ymris (codeplex project). Another code (with some limitations) was written by Marcin Juraszek (codeplex project).

The best working and well tested alternative made by Reflection was coded by Alexey Burtsev (codeplex project). It passed all tests enclosed in the .zip.

The best working, tested and shortest by amount of code is Serialization through BinaryFormatter (code 1, code 2). Though it is the slowest possibility and requires the Serializable attribute, it clones very well everything what our solution does and furthermore does some copy of Delegates and Events. It passed all tests too.

Acknowledgement

To my former boss Roger Lord for the whole idea of using Expression Trees for fast deep copy.

History

List of changes:

  • [2016-07-12]
    • Added Implementation description
    • Note about readonly properties
    • Link to the alternative solution
  • [2016-07-13]
    • Corrected note about serialization: BinaryFormatter handles circular references
    • [minor] Changed name ClassType to ExampleClass
  • [2016-07-14]
    • [major] New .zip uploaded: code handles structs (and therefore dictionaries)
    • Uploaded also much better tests (in the same zip)
    • Added section Points of Interest
    • [minor] Edited section Workflow of Copy Functions
  • [2016-07-15]
    • New .zip uploaded: structs to deep-copy are filtered better - speed improvement in some cases
    • [minor] Full Visual Studio Solution was uploaded
    • [minor] Edited section Workflow of Copy Functions
  • [2016-07-21]
    • Acknowledgement
    • [minor] Alternatives added
    • [minor] "of Objects" removed from the Title, http address not changed
  • [2016-08-03]
    • Http address changed
    • [minor] Minor changes in wording
  • [2017-01-27]
    • Repaired copy of Object-typed fields that are assigned strings (and value types)
  • [2017-01-30]
    • Added ReferenceEqualityComparer according to Alexey Burtsev solution

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here