|
Title | Real World Functional Programming | Author | Tomas Petricek with Jon Skeet | Publisher | Manning | Published | October 2009 (expected) | Electronic version | via Manning Early Access program | ISBN-13 | 978-1933988924 | Price | USD 49.99 |
|
This is a chapter excerpt from Real World Functional Programming authored by
Tomas Petricek with Jon Skeet and published by Manning Publications. The content has been reformatted
and edited to match the standard CodeProject article format. CodeProject readers can get 30% off any
version of Functional Programming for the Real World at www.manning.com
by using the code "codepro30" when checking out.
In the article Immutable Data Structures in C# and F# (excerpted from chapter 2) I introduced immutability, which is one of the essential concepts that form the functional programming paradigm. This approach to programming is becoming very important presently and has influenced the design of many .NET standard libraries including the LINQ project. Microsoft also announced that Visual Studio 2010 will include F# - a functional language - out of the box. However, you can use functional ideas when programming in any language, so in this article we'll use solely C#.
Using immutable data structures has many benefits. Perhaps the most discussed benefit is that immutability makes it easier to parallelize the program. When all the data is immutable, we can safely access them from multiple threads and there is no danger of race conditions, because none of the threads can modify the shared state. You can find more information about parallelization in chapter 14, but this is not the only consequence of immutability...
The benefit that we'll discuss in this article is that immutability provides great clarity to your code and you can use it to better understand what code does. If a function takes a list as an argument and returns a number, you can safely assume that it calculates the result based on the list content, but does not modify the list. We don't have to look at any code to reach that conclusion; we don't have to examine the implementation or any other functions that it calls. Let's start by looking at an example that demonstrates how easy it is to introduce errors when using mutable objects.
Using Mutable Data Structures
In the following listing you can see two functions that work with a sample collection storing names of places. We're using C# and storing the names in the standard List<T>
type, which is mutable.
List<string> LoadPlaces() {
return new List<string> { "Seattle", "Prague",
"New York", "Grantchester", "Cambridge" };
}
void PrintLongest(List<string> names) {
var longest = names[0];
for(int i = 1; i < names.Count; i++)
if (names[i].Length > longest.Length) longest = names[i];
Console.WriteLine(longest);
}
void PrintMultiWord(List<string> names) {
names.RemoveAll(s => !s.Contains(" "));
Console.WriteLine("With space: {0}", names.Count);
}
The code first shows a function that loads some sample data (1). Next, we implement two processing functions. The first one (2) finds the place with the longest name; the second (3) determines how many names contain more than one word by removing any name that doesn't contain a space. Even though the method uses lambda function syntax, it's definitely not functional: the RemoveAll
method modifies the names collection. If we wanted to use these functions later in our program, we could write the following code:
PrintMultiWord(LoadPlaces());
PrintLongest(LoadPlaces());
This gives the correct results. However we're calling the LoadPlaces
function twice, which seems to be unnecessary. If the function actually loaded data from a database, it would be better to retrieve the data only once for performance reasons. A simple refactoring is to call the function once and store the places in a local variable:
var places = LoadPlaces();
PrintMultiWord(places);
PrintLongest(places);
However, after this simple change we get incorrect results! If you've been following the source code carefully, you've probably already spotted the problem, but it's still subtle enough to potentially cause confusion and head-scratching. The problem is that List<T>
is a mutable data structure and the function PrintMultiWord
accidentally mutates it when it calls RemoveAll
. When we call PrintLongest
later in the code, the collection places contains only a single item, which is "New York". Now let's look why we couldn't make similar mistake if we used immutable data structures.
Using Immutable Data Structures
We implemented immutable lists for C# in chapter 3 and called the type FuncList<T>
. We didn't implement all the standard .NET patterns, so the source code for this article (available as a download above) extends the type by implementing IEnumerable<T>
and standard some LINQ methods including Where
and Select
. that return the results in the FuncList<T>
type, so that a query that processes functional list also returns functional list as the result. The implementation is relatively simple, so we won't look at it in detail.
Using this improved version of the immutable list, we can rewrite the imperative code we've just seen. The LoadPlaces
and PrintLongest
methods don't change very much, so I've omitted them here. However, the PrintMultiWord
method is more interesting: we can't use our previous strategy of using RemoveAll
, because the FuncList
type is immutable. Earlier we used this method to remove all single-word names from the collection. This side-effect made the method harder to reason about. Using immutable types, we can't introduce any side-effects in the same way, so if we want the same kind of results we have to be more explicit about it, as shown in the following listing:
FuncList<string> PrintMultiWord(FuncList<string> names) {
FuncList<string> namesSpace = names.Where(s => s.Contains(" "));
Console.WriteLine("With space: {0}", namesSpace.Count());
return namesSpace;
}
We can't modifying modify a collection when we're working with immutable data structures, so the method first creates a new collection that contains only multi-word names (1). We've also made the side-effect from the previous implementation explicit, so the method now returns the new collection. Of course, it isn't really a side-effect at all now - it's just a return value. However, it achieves the same result of making the multi-word names list available to the caller if they want it (2).
Our first example was searching for the longest name from all the names and our second example (which printed "New York") returned the longest name containing a space. The next code sample shows how both can be implemented both of these examples using our new function.
FuncList<string> pl = LoadImmutablePlaces();
PrintMultiWord(pl);
PrintLongest(pl);
FuncList<string> pl1 = LoadImmutablePlaces();
var pl2 = PrintMultiWord(pl1);
PrintLongest(pl2);
This example demonstrates that using immutable data types makes it more obvious how constructs depend on each other. If we know that none of the functions can alter the data structure, it's easier to reason about the program and we can say what kinds of refactorings are valid. In the example on the left side, we could change the order of PrintMultiWord
and PrintLongest
and they would still print the same results (just in the opposite order). On the other hand, we can't change the order of the calls in the second version of the code above, because the value pl2
is result of the first call (2).
This means that when refactoring functional code, we can track dependencies of the computations more easily. We can see that a function depends on other calls if it takes a result of these calls as an argument. Because this is explicitly visible in the code, we can't make accidental refactoring errors because incorrectly modified code will not compile. This is also very useful when testing the code using unit tests.
Testing Functional Code
When using unit testing we create tests for every simple "unit" in the program to verify that it works correctly. If we wanted to test functions that used mutable data structures from the first listing, we would write two unit tests. The first one would verify whether PrintLongest
prints the longest name and the second one would verify that PrintMultiWord
prints the number of multi-word names in the list. These tests would very likely succeed, because that’s in fact what these methods do.
However, once we wanted to use these functions together, we could run into problems even though unit tests suggest that both of them work correctly. The problem is that simple test wouldn't test that the method doesn't have any side-effects.
On the other side, when using immutable data structures, the function takes all its input as arguments and everything it does is that it returns some result. When writing unit tests for functions like these, we only need to verify that the function gives expected result for the given arguments. This makes unit testing much easier and we'll more examples later in the book in chapter 11.
Summary
In this article we've seen one important benefit of using immutable data structures. This aspect is in practice very important when understanding existing code, when refactoring it and finally also when writing unit tests for it. When you read existing code written in the functional style, you can understand what a function does by looking at the type of its result. The only thing that a function can do is to return something, so this provides a useful clue about what a function does. The example we've seen in this article demonstrates benefits for refactoring - in the immutable version, we could safely optimize the code by removing one call to LoadImmutablePlaces
. Finally, immutability is also essential for parallelization, but that is a topic for another article...