A Study In Equality

26 Dec 2007
Understanding value and ref types for equality tests and as keys in collections

Introduction

When learning a C-based language, we quickly discover that there is a difference between value types and reference types. As we write more complex applications, we occasionally need to treat a reference type as a value type with regard to equality: comparing two references should tell us not whether they are the same instance, but whether they contain the same values. Comparing reference types by value is especially useful when we use a class as a key in a collection (such as a dictionary) and want the key's value, rather than its identity, to determine whether a matching key is already in the collection.

The Test Code

The following illustrates how to create a class suitable for comparing by value. At the end, these classes will be used to test how the generic List and Dictionary collections work as well.

Step 1: A Basic Class

public class AClass
{
  private readonly int i;

  public int I
  {
    get { return i; }
  }

  private AClass() {}

  public AClass(int i)
  {
    this.i = i;
  }
}

Why is the default constructor marked private? This has to do with the practice that classes to be compared by value should be immutable, which is discussed in the Advanced Concepts section. Suffice it to say that when the field is designated as readonly and the property for the field provides only a getter, a default constructor is meaningless, as you can only set the fields in the constructor.

We're going to take the above class and alter it so that equality tests will be treated by comparing values rather than references. But first, let's see how this class behaves in an equality test as it stands:

static void CompareClasses()
{
  Console.WriteLine("\r\nCompareClasses:");
  AClass s1 = new AClass(10);
  AClass s2 = new AClass(10);
  Console.WriteLine("AClass.Equals(AClass) ? " + ((s1.Equals(s2)) ? "Yes" : "No"));
  Console.WriteLine("s1 == s2 ? " + ((s1 == s2) ? "Yes" : "No"));
}

This method returns:

No
No

This is as expected--the instances s1 and s2 do not equal each other.
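To make the reference semantics concrete, here is a minimal sketch. The names PlainClass and ReferenceDemo are illustrative and not part of the original code. It suggests that, without any overrides, Equals() behaves like object.ReferenceEquals(): two variables compare equal only when they refer to the same instance.

```csharp
using System;

// Hypothetical stand-in for AClass: no equality members are overridden.
public class PlainClass
{
  private readonly int i;

  public int I
  {
    get { return i; }
  }

  public PlainClass(int i)
  {
    this.i = i;
  }
}

public static class ReferenceDemo
{
  public static void Run()
  {
    PlainClass s1 = new PlainClass(10);
    PlainClass s2 = new PlainClass(10);
    PlainClass alias = s1;   // a second reference to the same instance

    // Different instances holding the same value: not equal by default.
    Console.WriteLine("s1.Equals(s2) ? " + s1.Equals(s2));

    // Two references to one instance: equal.
    Console.WriteLine("s1.Equals(alias) ? " + s1.Equals(alias));

    // ReferenceEquals always tests identity, whatever Equals does.
    Console.WriteLine("ReferenceEquals(s1, s2) ? " + object.ReferenceEquals(s1, s2));
  }
}
```

Calling ReferenceDemo.Run() should print False, True, and False, in that order.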

Step 2: Overriding Equals

public class AnEqualsClass
{
  public readonly int i;

  private AnEqualsClass() { }

  public AnEqualsClass(int i)
  {
    this.i = i;
  }

  public override bool Equals(object obj)
  {
    bool ret = false;
    AnEqualsClass s = obj as AnEqualsClass;

    if (s == null)
    {
      ret = false;
    }
    else
    {
      ret = i == s.i;
    }

    return ret;
  }

  /// <summary>
  /// Avoid the compiler warning by implementing this method.
  /// Hash-based collections such as Dictionary need this method to agree with Equals.
  /// </summary>
  public override int GetHashCode()
  {
    // This is very important!
    // We return the hash code of our field, not the base algorithm.
    // The base algorithm returns different values for different instances.
    return i.GetHashCode();
  }
}

The above code illustrates the minimum requirements for overriding the Equals() method, which include overriding the GetHashCode() method as well.
IMPORTANT: If we omit the GetHashCode() override, our class does not work correctly as a key in hash-based collections such as Dictionary!
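The same pattern extends to classes with more than one field. The following is a minimal sketch, assuming a hypothetical two-field class named Point2D (not part of the original code). The point it illustrates is that GetHashCode() should be built from the same fields that Equals() compares, so that equal instances always produce equal hash codes. The multiplier 397 is an arbitrary choice for mixing the fields, not a requirement.

```csharp
using System;

// Hypothetical two-field class; the names are illustrative only.
public class Point2D
{
  private readonly int x;
  private readonly int y;

  public int X { get { return x; } }
  public int Y { get { return y; } }

  public Point2D(int x, int y)
  {
    this.x = x;
    this.y = y;
  }

  public override bool Equals(object obj)
  {
    Point2D p = obj as Point2D;

    // "as" yields null for a null argument or for an unrelated type.
    return p != null && x == p.x && y == p.y;
  }

  public override int GetHashCode()
  {
    // Combine every field that Equals compares. "unchecked" lets the
    // multiplication wrap around silently instead of throwing on overflow.
    unchecked
    {
      return (x * 397) ^ y;
    }
  }
}
```

Two instances created as new Point2D(1, 2) compare equal and share a hash code. Unequal instances may still collide on the hash code, which is allowed: the hash only narrows the search, and Equals() makes the final decision.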

static void CompareEqualsClasses()
{
  Console.WriteLine("\r\nCompareEqualsClasses:");
  AnEqualsClass s1 = new AnEqualsClass(10);
  AnEqualsClass s2 = new AnEqualsClass(10);
  Console.WriteLine("AnEqualsClass.Equals(AnEqualsClass) ? " + 
      ((s1.Equals(s2)) ? "Yes" : "No"));
  Console.WriteLine("s1 == s2 ? " + ((s1 == s2) ? "Yes" : "No"));
}

The above test returns:

Yes
No

This verifies that we are now comparing the instances by value, but the == operator is still comparing by reference.

Step 3: Implementing The operator== Method

public class AnOperatorEqualClass
{
  private readonly int i;

  public int I
  {
    get { return i; }
  }

  private AnOperatorEqualClass() { }

  public AnOperatorEqualClass(int i)
  {
    this.i = i;
  }

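  public static bool operator ==(AnOperatorEqualClass s1, AnOperatorEqualClass s2)
  {
    bool ret = false;

    if (((object)s1 != null) && ((object)s2 != null))
    {
      ret = s1.i == s2.i;
    }
    else
    {
      // At least one operand is null: equal only if both are null.
      ret = ((object)s1 == null) && ((object)s2 == null);
    }

    return ret;
  }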

  /// <summary>
  /// If one is defined, the other is required.
  /// </summary>
  public static bool operator !=(AnOperatorEqualClass s1, AnOperatorEqualClass s2)
  {
    return !(s1 == s2);
  }

  /// <summary>
  /// Also this is required!
  /// </summary>
  public override bool Equals(object obj)
  { 
    bool ret = false;
    AnOperatorEqualClass s = obj as AnOperatorEqualClass;

    if ((object)s == null)  // cast so the overloaded operator== is not invoked
    {
      ret = false;
    }
    else
    {
      ret = i == s.i;
    }

    return ret;
  }

  /// <summary>
  /// Avoid the compiler warning by implementing this method.
  /// </summary>
  public override int GetHashCode()
  {
    // This is very important!
    // We return the hash code of our field, not the base algorithm.
    // The base algorithm returns different values for different instances.
    return i.GetHashCode();
  }
}

In the above code, the operator== method is implemented, which requires that the operator!= method be implemented as well. Providing operator== without overriding Equals() and GetHashCode() produces compiler warnings, and both overrides are strongly recommended so that every form of comparison gives the same answer.

In the operator== method, we have the following code:

if (((object)s1 != null) && ((object)s2 != null))

Why are s1 and s2 cast to object? The comparison s1 != null would call our own operator!= method, which in turn would call the operator== method, and so on until a stack overflow occurs. By casting s1 and s2 to object, the overloaded operator!= is not called and the built-in reference comparison is used instead, avoiding the infinite recursion.
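An alternative to the cast is object.ReferenceEquals(), which never invokes an overloaded operator. The sketch below uses a hypothetical class named Meters (not part of the original code) and is only one possible arrangement. It also treats two null references as equal, so that the common check "x != null" behaves as expected.

```csharp
using System;

// Hypothetical value-like class; the names are illustrative only.
public class Meters
{
  private readonly int length;

  public int Length { get { return length; } }

  public Meters(int length)
  {
    this.length = length;
  }

  public static bool operator ==(Meters a, Meters b)
  {
    // Same instance, or both null.
    if (object.ReferenceEquals(a, b))
    {
      return true;
    }

    // Exactly one side is null.
    if (object.ReferenceEquals(a, null) || object.ReferenceEquals(b, null))
    {
      return false;
    }

    return a.length == b.length;
  }

  public static bool operator !=(Meters a, Meters b)
  {
    return !(a == b);
  }

  public override bool Equals(object obj)
  {
    // "as" yields null for a null argument or an unrelated type, and
    // operator== above reports (non-null, null) as unequal.
    return this == (obj as Meters);
  }

  public override int GetHashCode()
  {
    return length.GetHashCode();
  }
}
```

Routing Equals() through operator== keeps the comparison logic in one place, so the two cannot drift apart.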

static void CompareOperatorEqualClasses()
{
  Console.WriteLine("\r\nCompareOperatorEqualClasses:");
  AnOperatorEqualClass s1 = new AnOperatorEqualClass(10);
  AnOperatorEqualClass s2 = new AnOperatorEqualClass(10);
  Console.WriteLine("AnOperatorEqualClass.Equals(AnOperatorEqualClass) ? " 
         + ((s1.Equals(s2)) ? "Yes" : "No"));
  Console.WriteLine("s1 == s2 ? " + ((s1 == s2) ? "Yes" : "No"));
}

The above test code returns:

Yes
Yes

This illustrates that we now are comparing by value using both Equals() and operator==.

Collections

The following code explores how these classes behave as elements of a List and as keys in a Dictionary.

The Basic Class

static void IndexClasses()
{
  Console.WriteLine("\r\nIndexClasses:");
  List<AClass> list = new List<AClass>();
  AClass s1 = new AClass(10);
  list.Add(s1);
  AClass s2 = new AClass(10);
  Console.WriteLine("List contains s2 : " + list.Contains(s2));
}

This test returns...

False

... as the instances are, as expected, being compared by reference.

Similarly, for a dictionary:

static void DictionaryClasses()
{
  Console.WriteLine("\r\nDictionaryClasses:");
  Dictionary<AClass, int> dict = new Dictionary<AClass, int>();
  AClass s1 = new AClass(10);
  dict[s1] = 1;
  AClass s2 = new AClass(10);
  Console.WriteLine("Dictionary contains s2 : " + dict.ContainsKey(s2));
}

The result is:

False

A Class that Implements Equals

static void IndexEqualsClasses()
{
  Console.WriteLine("\r\nIndexEqualsClasses:");
  List<AnEqualsClass> list = new List<AnEqualsClass>();
  AnEqualsClass s1 = new AnEqualsClass(10);
  list.Add(s1);
  AnEqualsClass s2 = new AnEqualsClass(10);
  Console.WriteLine("List contains s2 : " + list.Contains(s2));
}

The above test code, using the class that simply implements Equals() and GetHashCode(), returns:

True

The result is also True when the class is used as a key in a generic Dictionary:

static void DictionaryEqualsClasses()
{
  Console.WriteLine("\r\nDictionaryEqualsClasses:");
  Dictionary<AnEqualsClass, int> dict = new Dictionary<AnEqualsClass, int>();
  AnEqualsClass s1 = new AnEqualsClass(10);
  dict[s1] = 1;
  AnEqualsClass s2 = new AnEqualsClass(10);
  Console.WriteLine("Dictionary contains s2 : " + dict.ContainsKey(s2));
}
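Beyond ContainsKey(), value equality also governs reading and overwriting entries. The following sketch assumes a hypothetical key class named ValueKey, equivalent in spirit to AnEqualsClass. It stores a value under one instance and reads it back through a different instance that holds the same value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key class, compared by value like AnEqualsClass.
public class ValueKey
{
  private readonly int i;

  public int I { get { return i; } }

  public ValueKey(int i)
  {
    this.i = i;
  }

  public override bool Equals(object obj)
  {
    ValueKey k = obj as ValueKey;
    return k != null && i == k.i;
  }

  public override int GetHashCode()
  {
    return i.GetHashCode();
  }
}

public static class LookupDemo
{
  // Stores a value under one key instance and reads it back through another.
  public static int StoreAndLookup()
  {
    Dictionary<ValueKey, int> dict = new Dictionary<ValueKey, int>();
    dict[new ValueKey(10)] = 1;

    int found;
    // TryGetValue avoids an exception if the key were missing.
    if (!dict.TryGetValue(new ValueKey(10), out found))
    {
      found = -1;
    }

    // Assigning through an equal key overwrites the entry rather than adding one.
    dict[new ValueKey(10)] = 2;
    Console.WriteLine("Count : " + dict.Count);

    return found;
  }
}
```

LookupDemo.StoreAndLookup() should return 1 and print a count of 1, since all three ValueKey instances are treated as the same key.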

The Importance of Getting GetHashCode Right

What happens when we change the GetHashCode() method as follows?

public override int GetHashCode()
{
  // Deliberately WRONG, for demonstration only.
  // The base algorithm generally returns different values for different
  // instances, even when Equals considers them equal.
  // REMOVED: return i.GetHashCode();  REPLACED WITH:
  return base.GetHashCode();
}

Here, we deliberately implement GetHashCode() incorrectly by calling the base method. The List test still passes, but the Dictionary test no longer does. The List class simply compares each element in turn using Equals(), whereas the Dictionary class uses the hash code to narrow its search. The dictionary compares hash codes first, and since the base implementation generally returns different hash codes for the two instances, Equals() is never consulted and the ContainsKey() call returns false. This illustrates the importance of returning equal hash codes for instances that are equal by value.
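The difference between the two collections can be reproduced deterministically. In the sketch below, the hypothetical class BrokenKey returns a per-instance sequence number from GetHashCode(). This is an assumption made for the demo: it stands in for base.GetHashCode(), whose actual values are not predictable, but it has the same flaw of giving equal values different hash codes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: Equals compares by value, but GetHashCode returns a
// per-instance sequence number (a stand-in for base.GetHashCode()).
public class BrokenKey
{
  private static int nextId;

  private readonly int i;
  private readonly int id;

  public BrokenKey(int i)
  {
    this.i = i;
    this.id = nextId++;
  }

  public override bool Equals(object obj)
  {
    BrokenKey k = obj as BrokenKey;
    return k != null && i == k.i;
  }

  public override int GetHashCode()
  {
    return id;   // WRONG on purpose: equal values yield different hashes
  }
}

public static class BrokenHashDemo
{
  public static bool ListFinds()
  {
    List<BrokenKey> list = new List<BrokenKey>();
    list.Add(new BrokenKey(10));
    return list.Contains(new BrokenKey(10));      // linear scan using Equals
  }

  public static bool DictionaryFinds()
  {
    Dictionary<BrokenKey, int> dict = new Dictionary<BrokenKey, int>();
    dict[new BrokenKey(10)] = 1;
    return dict.ContainsKey(new BrokenKey(10));   // hash codes are compared first
  }
}
```

ListFinds() returns true while DictionaryFinds() returns false, even though both rely on the same Equals() method.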

Advanced Concepts

The following discusses the additional topics of using structs vs. classes and the best practice of making your classes immutable when they are used for value comparisons.

Why Not Use Struct?

public struct AStruct
{
  private readonly int i;

  public int I
  {
    get { return i; }
  }

  public AStruct(int i)
  {
    this.i = i;
  }
}

The simple struct above passes the Equals() and collection tests without any extra code, because value types are compared by value by default. (The s1 == s2 test does not compile for a struct until the == and != operators are defined.) This seems like a simple alternative to implementing, at a minimum, Equals() and GetHashCode() for a class. So why would we use a class rather than a struct? The answer isn't necessarily obvious. As Luca Bolognese points out in his blog entry:

  • Structs cannot be null. A null state might have valuable meaning which, when using a struct, is lost.
  • You may still need to implement the == and != operators for code readability.
  • If you implement the == and != operators, you should also override Equals() and GetHashCode(); the compiler warns if you do not.
  • Structs are value types, so they are copied when passed as arguments, which may result in performance problems for large structs.
  • Structs always have a public default constructor that zeros all the fields. You will want a private default constructor, and the members of the class should be immutable (see below).
  • Structs cannot be abstract. This may impact your object oriented design.
  • Structs cannot extend other structs. This may impact your object oriented design.

All of these points are worth weighing when deciding between a struct and a class, along with their impact on performance, design, and usage.
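Some of these points can be seen in a few lines. The sketch below uses a hypothetical struct named SimpleStruct that mirrors AStruct and defines no equality members. It shows the default value comparison, use as a dictionary key, and the zeroed default value that a struct cannot hide.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct mirroring AStruct; no equality members are defined.
public struct SimpleStruct
{
  private readonly int i;

  public int I { get { return i; } }

  public SimpleStruct(int i)
  {
    this.i = i;
  }
}

public static class StructDemo
{
  public static void Run()
  {
    SimpleStruct s1 = new SimpleStruct(10);
    SimpleStruct s2 = new SimpleStruct(10);

    // ValueType.Equals compares the fields, so no override is needed.
    Console.WriteLine("s1.Equals(s2) ? " + s1.Equals(s2));

    Dictionary<SimpleStruct, int> dict = new Dictionary<SimpleStruct, int>();
    dict[s1] = 1;
    Console.WriteLine("Dictionary contains s2 : " + dict.ContainsKey(s2));

    // The default value is always available, with all fields zeroed.
    SimpleStruct zero = default(SimpleStruct);
    Console.WriteLine("default(SimpleStruct).I = " + zero.I);

    // Console.WriteLine(s1 == s2);   // does not compile without operator==
  }
}
```

StructDemo.Run() should report True for both the Equals() and ContainsKey() checks, and 0 for the default value's field.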

Make Your Classes Immutable

As suggested by MSDN (see references), classes that override operator== should be immutable. Immutable objects can be considered the same as long as they have the same value. Mutable objects should not be considered the same. For example, if object A and object B are equal in value at time T0, you can use either one of them in some process. However, if object B changes its value at time T1, then A and B are no longer equal, and there can be consequences if the process is using object B rather than object A. For this reason, in the example code in this article, the field "i" is marked "private readonly" and the property I provides only a getter method.
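Collections make the consequences of mutability easy to demonstrate. The following sketch assumes a hypothetical class named MutableKey with a writable field, shown only to illustrate the hazard. Once the key is changed after being added to a dictionary, the entry can no longer be found through either the new or the old value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key, shown only to illustrate the hazard.
public class MutableKey
{
  public int Value;   // deliberately writable

  public MutableKey(int value)
  {
    Value = value;
  }

  public override bool Equals(object obj)
  {
    MutableKey k = obj as MutableKey;
    return k != null && Value == k.Value;
  }

  public override int GetHashCode()
  {
    return Value.GetHashCode();
  }
}

public static class MutableKeyDemo
{
  // Returns true if the entry can still be found after the key is mutated.
  public static bool FoundAfterMutation()
  {
    Dictionary<MutableKey, string> dict = new Dictionary<MutableKey, string>();
    MutableKey key = new MutableKey(10);
    dict[key] = "payload";

    key.Value = 20;   // the entry was filed under the hash code for 10

    // Looking up 20 uses the wrong hash code; looking up 10 finds the old
    // hash code but then fails Equals against the mutated key.
    return dict.ContainsKey(key) || dict.ContainsKey(new MutableKey(10));
  }
}
```

MutableKeyDemo.FoundAfterMutation() returns false: the entry is still counted by the dictionary but is unreachable by lookup, which is the kind of consequence that readonly fields rule out.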

Conclusion

I hope this article has shed some light on the surprisingly complex issues involved in comparing classes by value. This is a useful technique both when instances that are identical in value are used as collection keys and in more commonplace equality tests.

References

History

  • 26th December, 2007: Initial post

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
