Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / F#

Tracking Dependencies and Side-Effects Using Immutability

5.00/5 (1 vote)
7 Jul 2009CPOL7 min read 30.2K   51  
The Functional Programming for the Real World book explains essential concepts of the functional programming paradigm using examples in C# 3.0 and F#. In this article we look at one very important advantage of using one of the core functional concepts - immutability.
TitleReal World Functional Programming
AuthorTomas Petricek with Jon Skeet
PublisherManning
PublishedOctober 2009 (expected)
Electronic versionvia Manning Early Access program
ISBN-13978-1933988924
PriceUSD 49.99

This is a chapter excerpt from Real World Functional Programming authored by Tomas Petricek with Jon Skeet and published by Manning Publications. The content has been reformatted and edited to match the standard CodeProject article format. CodeProject readers can get 30% off any version of Functional Programming for the Real World at www.manning.com by using the code "codepro30" when checking out.

In the article Immutable Data Structures in C# and F# (excerpted from chapter 2) I introduced immutability, which is one of the essential concepts that form the functional programming paradigm. This approach to programming is becoming very important presently and has influenced the design of many .NET standard libraries including the LINQ project. Microsoft also announced that Visual Studio 2010 will include F# - a functional language - out of the box. However, you can use functional ideas when programming in any language, so in this article we'll use solely C#.

Using immutable data structures has many benefits. Perhaps the most discussed benefit is that immutability makes it easier to parallelize the program. When all the data is immutable, we can safely access them from multiple threads and there is no danger of race conditions, because none of the threads can modify the shared state. You can find more information about parallelization in chapter 14, but this is not the only consequence of immutability...

The benefit that we'll discuss in this article is that immutability provides great clarity to your code and you can use it to better understand what code does. If a function takes a list as an argument and returns a number, you can safely assume that it calculates the result based on the list content, but does not modify the list. We don't have to look at any code to reach that conclusion; we don't have to examine the implementation or any other functions that it calls. Let's start by looking at an example that demonstrates how easy it is to introduce errors when using mutable objects.

Using Mutable Data Structures

In the following listing you can see two functions that work with a sample collection storing names of places. We're using C# and storing the names in the standard List<T> type, which is mutable.

C#
// Working with places stored in List<T>

// Returns list containing names (1)
List<string> LoadPlaces() {
   return new List<string> { "Seattle", "Prague", 
      "New York", "Grantchester", "Cambridge" };
}

// Prints place with the longest name (2)
void PrintLongest(List<string> names) {
   // Start with the first place
   var longest = names[0];
   for(int i = 1; i < names.Count; i++)
      // A name is longer than the longest so far
      if (names[i].Length > longest.Length) longest = names[i];
   Console.WriteLine(longest);
}

// Print the count of multi-word names (3)
void PrintMultiWord(List<string> names) {
   // Remove all single-word names
   names.RemoveAll(s => !s.Contains(" "));
   Console.WriteLine("With space: {0}", names.Count);
}

The code first shows a function that loads some sample data (1). Next, we implement two processing functions. The first one (2) finds the place with the longest name; the second (3) determines how many names contain more than one word by removing any name that doesn't contain a space. Even though the method uses lambda function syntax, it's definitely not functional: the RemoveAll method modifies the names collection. If we wanted to use these functions later in our program, we could write the following code:

C#
// Prints 'With space: 1'
PrintMultiWord(LoadPlaces());
// Prints 'Grantchester'
PrintLongest(LoadPlaces());

This gives the correct results. However we're calling the LoadPlaces function twice, which seems to be unnecessary. If the function actually loaded data from a database, it would be better to retrieve the data only once for performance reasons. A simple refactoring is to call the function once and store the places in a local variable:

C#
var places = LoadPlaces();

// Prints 'With space: 1'
PrintMultiWord(places);
// Prints 'New York'
PrintLongest(places);

However, after this simple change we get incorrect results! If you've been following the source code carefully, you've probably already spotted the problem, but it's still subtle enough to potentially cause confusion and head-scratching. The problem is that List<T> is a mutable data structure and the function PrintMultiWord accidentally mutates it when it calls RemoveAll. When we call PrintLongest later in the code, the collection places contains only a single item, which is "New York". Now let's look why we couldn't make similar mistake if we used immutable data structures.

Using Immutable Data Structures

We implemented immutable lists for C# in chapter 3 and called the type FuncList<T>. We didn't implement all the standard .NET patterns, so the source code for this article (available as a download above) extends the type by implementing IEnumerable<T> and standard some LINQ methods including Where and Select. that return the results in the FuncList<T> type, so that a query that processes functional list also returns functional list as the result. The implementation is relatively simple, so we won't look at it in detail.

Using this improved version of the immutable list, we can rewrite the imperative code we've just seen. The LoadPlaces and PrintLongest methods don't change very much, so I've omitted them here. However, the PrintMultiWord method is more interesting: we can't use our previous strategy of using RemoveAll, because the FuncList type is immutable. Earlier we used this method to remove all single-word names from the collection. This side-effect made the method harder to reason about. Using immutable types, we can't introduce any side-effects in the same way, so if we want the same kind of results we have to be more explicit about it, as shown in the following listing:

C#
// Implementation of 'PrintMultiWord' using immutable list
FuncList<string> PrintMultiWord(FuncList<string> names) {
   // Create list containing names with spaces (1)
   FuncList<string> namesSpace = names.Where(s => s.Contains(" "));
   Console.WriteLine("With space: {0}", namesSpace.Count());
   // Return the list (2)
   return namesSpace;
}

We can't modifying modify a collection when we're working with immutable data structures, so the method first creates a new collection that contains only multi-word names (1). We've also made the side-effect from the previous implementation explicit, so the method now returns the new collection. Of course, it isn't really a side-effect at all now - it's just a return value. However, it achieves the same result of making the multi-word names list available to the caller if they want it (2).

Our first example was searching for the longest name from all the names and our second example (which printed "New York") returned the longest name containing a space. The next code sample shows how both can be implemented both of these examples using our new function.

C#
// Printing the longest and the longest multi-word name 
FuncList<string> pl = LoadImmutablePlaces();

PrintMultiWord(pl);
// Prints 'Grantchester'
PrintLongest(pl);

FuncList<string> pl1 = LoadImmutablePlaces();

// Returns filtered list (2)
var pl2 = PrintMultiWord(pl1);
// Prints 'New York'
PrintLongest(pl2);

This example demonstrates that using immutable data types makes it more obvious how constructs depend on each other. If we know that none of the functions can alter the data structure, it's easier to reason about the program and we can say what kinds of refactorings are valid. In the example on the left side, we could change the order of PrintMultiWord and PrintLongest and they would still print the same results (just in the opposite order). On the other hand, we can't change the order of the calls in the second version of the code above, because the value pl2 is result of the first call (2).

This means that when refactoring functional code, we can track dependencies of the computations more easily. We can see that a function depends on other calls if it takes a result of these calls as an argument. Because this is explicitly visible in the code, we can't make accidental refactoring errors because incorrectly modified code will not compile. This is also very useful when testing the code using unit tests.

Testing Functional Code

When using unit testing we create tests for every simple "unit" in the program to verify that it works correctly. If we wanted to test functions that used mutable data structures from the first listing, we would write two unit tests. The first one would verify whether PrintLongest prints the longest name and the second one would verify that PrintMultiWord prints the number of multi-word names in the list. These tests would very likely succeed, because that’s in fact what these methods do.

However, once we wanted to use these functions together, we could run into problems even though unit tests suggest that both of them work correctly. The problem is that simple test wouldn't test that the method doesn't have any side-effects.

On the other side, when using immutable data structures, the function takes all its input as arguments and everything it does is that it returns some result. When writing unit tests for functions like these, we only need to verify that the function gives expected result for the given arguments. This makes unit testing much easier and we'll more examples later in the book in chapter 11.

Summary

In this article we've seen one important benefit of using immutable data structures. This aspect is in practice very important when understanding existing code, when refactoring it and finally also when writing unit tests for it. When you read existing code written in the functional style, you can understand what a function does by looking at the type of its result. The only thing that a function can do is to return something, so this provides a useful clue about what a function does. The example we've seen in this article demonstrates benefits for refactoring - in the immutable version, we could safely optimize the code by removing one call to LoadImmutablePlaces. Finally, immutability is also essential for parallelization, but that is a topic for another article...

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)