Introduction
Many people hate Perl because of its frightening syntax. While it's true that many Perl scripts border on the unreadable, some of Perl's operations are extremely handy. Two of them are the grep and map list operators. I'll first introduce them in brief for those not familiar with Perl, and then delve into some thoughts on how to have similar operations easily available in C#.
The code here is based on .NET Framework/C# 2.0. The new framework version (in beta, as of this writing) provides good support for similar functions. For the 1.x series of .NET, implementing this sort of operator is possible, but hard to do in a generic manner; you would need a good dose of interfaces, and it would be hard to integrate with the existing types (int et al.). The code would also be very messy because of the lack of anonymous method support. Although the code posted below is written using .NET Framework 2.0 Beta 1, it is expected to work as-is with the final release.
Background
Lists in Perl
First, let's get acquainted with the list in Perl. It's roughly the equivalent of a one-dimensional array in most other languages. Perl's relatively weak typing means that a list element can be anything: integers, strings, references ("pointers") to structures, and so on. In these examples, we'll only look at the basics.
The Perl list is denoted using a @ symbol in front of the variable name. Local variables are declared using the keyword my, so the Perl statement my @list = (1, 2, 3, 4); roughly equals the C# statement int[] list = {1, 2, 3, 4}; "Roughly", because Perl lists are flexible in size: you can throw in more elements or remove old ones, and you can add elements of any type you desire even if the list was originally initialized with int values only. However, these details don't really make a difference for what we're handling here, so let's leave it at that.
The map operator
Perl's map operator is actually syntactic sugar for a certain type of foreach statement. What it does is evaluate a certain expression on every element of a list and push the result into a new list. The expression can be any valid Perl code; the array element being iterated is in a special variable called $_.
For example, you can do the following:
my @numbers = (1, 2, 3, 4);
my @doubles = map { $_*2 } @numbers;
print @doubles; # prints 2468
The equivalent construct using foreach in Perl would look like this:
my @numbers = (1, 2, 3, 4);
my @doubles;
foreach (@numbers) {
    push @doubles, $_*2;
}
print @doubles;
In C#, the foreach approach would require a bit more code, since you have to deal with the fact that the array isn't flexible in size by nature; you'll have to push the results into an ArrayList or a List<T>. We'll get back to this later on.
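For comparison, the long-hand foreach version in C# might look like the sketch below. It collects the results in a List<int> and converts them to an array at the end. The class name ForeachDoubling and the helper name DoubleAll are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDoubling
{
    // Hypothetical helper: doubles every element the long-hand way.
    public static int[] DoubleAll(int[] numbers)
    {
        // A List<int> grows as needed, unlike a plain array.
        List<int> doubles = new List<int>();
        foreach (int n in numbers)
            doubles.Add(n * 2);
        return doubles.ToArray();
    }

    public static void Main()
    {
        int[] doubles = DoubleAll(new int[] { 1, 2, 3, 4 });
        foreach (int d in doubles)
            Console.Write(d);   // writes 2468, like the Perl example
        Console.WriteLine();
    }
}
```

That is roughly a dozen lines of plumbing for what Perl's map expresses in one, which is the gap the rest of this article tries to close.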
The grep operator
The grep operator in Perl does about the same as its counterpart on the Unix command line: it picks out the list elements that match certain criteria. The syntax is very similar to that of map: an expression is specified. The difference is this: if the result of the expression is true (in this sense, Perl treats booleans much like C/C++, where any non-zero value counts as true), the element is included in the resulting array.
my @names = ("John", "Mike", "Jane", "Adam");
my @oNames = grep { substr($_, 0, 1) eq 'J' } @names;
print @oNames;
With a little thought, I think you can figure out what the code above does. Yeah, it prints out those names whose first character ("1 character long substring from index position 0") equals 'J' - that is, John and Jane.
The C# approach
I've been longing for those two Perl operations for quite some time. When I first read the C# 2.0 spec, I realized the potential both generics and anonymous methods had for this purpose. However, I later noted that Framework 2.0 already contains fairly good support for these operations in the form of a couple of new methods. While the terse syntax of Perl will be missed, using map- and grep-like operations in C# 2.0 is almost trivial. Let's take a closer look.
The map equivalent: ForEach
Typed arrays and the new List<T> container provide a new method in 2.0: ForEach. For arrays, it is the static Array.ForEach method, which takes a typed array and an Action<T> delegate as parameters and then runs the delegate for each array element. List<T> has an equivalent instance method that takes only the delegate. (I'll just talk about arrays in the following, but the same mostly applies to lists as well.) An Action<T> delegate takes a single parameter of the list element's type and returns nothing. So, you can use ForEach like this:
private static void PrintNumber(int num) {
    Console.WriteLine("The number is " + num);
}

public static void Main() {
    int[] list = {1, 2, 3, 4};
    Array.ForEach(list, new Action<int>(PrintNumber));
}
One of the greater new C# enhancements is the ability to define methods anonymously. So we can actually get rid of the PrintNumber method, too. The following code is equivalent to the snippet above:
public static void Main() {
    int[] list = {1, 2, 3, 4};
    Array.ForEach(
        list,
        delegate(int num) { Console.WriteLine("The number is " + num); }
    );
}
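The same idea carries over to lists, where ForEach is an instance method of List<T> rather than a static one. The sketch below is only an illustration: the class name ListForEachDemo and the helper Sum are invented here, and Sum also shows that an anonymous method can update a local variable of the enclosing method.

```csharp
using System;
using System.Collections.Generic;

public static class ListForEachDemo
{
    // Hypothetical helper: sums the elements via List<T>.ForEach.
    public static int Sum(List<int> list)
    {
        int total = 0;
        // The anonymous method captures the local variable 'total'.
        list.ForEach(delegate(int num) { total += num; });
        return total;
    }

    public static void Main()
    {
        List<int> list = new List<int>(new int[] { 1, 2, 3, 4 });
        // Instance call: the list itself is the target, so only the delegate is passed.
        list.ForEach(delegate(int num) { Console.WriteLine("The number is " + num); });
        Console.WriteLine("Sum: " + Sum(list));   // Sum: 10
    }
}
```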
What's missing here is a return value: the Action delegate returns nothing, so you can't easily use it to construct a new list or array. So, let's write some tool code of our own to help:
public delegate T MapAction<T>(T item);

public static T[] MapToArray<T>(T[] source,
                                MapAction<T> action) {
    T[] result = new T[source.Length];
    for (int i = 0; i < source.Length; ++i)
        result[i] = action(source[i]);
    return result;
}
So, we now have a MapAction delegate, which is much like the framework's Action, but also returns an element of type T. Then we have the MapToArray method, which always returns an array of the same type and size as the source parameter - but each element has been passed through the MapAction handler, which transforms it in whatever way you wish. As a result, we can write:
public static void Main() {
    int[] list = {1, 2, 3, 4};
    int[] doubled = MapToArray(list,
        delegate(int num) { return num*2; });
    foreach (int i in doubled) Console.WriteLine(i);
}
... and have it print out 2, 4, 6 and 8. That's it! Except that this isn't as flexible as Perl's map was; you still can't map the elements to a totally different type. Luckily, the changes required are pretty easy. Let's rewrite the MapAction delegate and the MapToArray method to work with different types:
public delegate DstType MapAction<SrcType, DstType>(SrcType item);

public static DstType[] MapToArray<SrcType, DstType>(
        SrcType[] source,
        MapAction<SrcType, DstType> action) {
    DstType[] result = new DstType[source.Length];
    for (int i = 0; i < source.Length; ++i)
        result[i] = action(source[i]);
    return result;
}
We now have a whole lot of two type parameters - which I've also named SrcType and DstType for clarity - in both the delegate and the MapToArray method. This allows us to create maps from one type to another with arbitrarily complex operations:
public static void Main() {
    // FileInfo lives in System.IO, so the file needs "using System.IO;".
    string[] files = { "map.pl", "testi.cs" };
    long[] fileSizes =
        MapToArray<string,long>(
            files,
            delegate(string file) { return new FileInfo(file).Length; }
        );
    for (int i=0; i < files.Length; ++i)
        Console.WriteLine(files[i] + ": " + fileSizes[i]);
}
Yep, that example maps an array of filenames (strings) into an array of file sizes (longs). Note that the C# 2.0 compiler cannot infer DstType from the return value of an anonymous method, so you have to specify SrcType and DstType manually at the MapToArray call. (The earlier doubling example got away without them because its single type parameter could be inferred from the array argument.) That's not entirely bad, though - it does add some clarity to the code.
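As an aside, the final 2.0 class library also ships a built-in counterpart to a hand-written MapToArray: the static Array.ConvertAll method, which takes a Converter<TInput, TOutput> delegate. The sketch below uses it to map strings to their lengths. The class name ConvertAllDemo and the helper Lengths are invented for this example, and the article itself does not say whether the method was already available in Beta 1.

```csharp
using System;

public static class ConvertAllDemo
{
    // Hypothetical helper: maps each word to its length using the
    // framework's own Array.ConvertAll instead of a custom MapToArray.
    public static int[] Lengths(string[] words)
    {
        return Array.ConvertAll<string, int>(
            words,
            delegate(string w) { return w.Length; });
    }

    public static void Main()
    {
        int[] lengths = Lengths(new string[] { "map", "grep", "perl" });
        foreach (int n in lengths)
            Console.WriteLine(n);   // 3, 4, 4
    }
}
```

The type arguments are spelled out here for the same reason as in the file size example: with an anonymous method, the compiler of that era has nothing to infer the output type from.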
The grep equivalent: FindAll
After having gone through all the trouble of implementing the map operation, implementing a grep equivalent is trivial. Both Array and List<T> have a FindAll method (static on Array, an instance method on List<T>). It works much like ForEach, but takes a Predicate<T> delegate, which is essentially just a method that takes an item and returns true if the item matches the criteria. So, to pick the even numbers out of an int array, you just type:
int[] list = {1, 2, 3, 4, 5, 6};
int[] even = Array.FindAll(list,
    delegate(int num) { return num%2 == 0; });
foreach (int i in even) Console.WriteLine(i);
... and yes, the code above prints 2, 4 and 6. That's it for grep!
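To tie this back to the Perl example, here is a sketch of the same name filter written against a List<string>, using the instance form of FindAll. The class name FindAllDemo and the helper StartingWith are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FindAllDemo
{
    // Hypothetical helper: keeps the names whose first character matches.
    public static List<string> StartingWith(List<string> names, char initial)
    {
        return names.FindAll(
            delegate(string name) { return name.Length > 0 && name[0] == initial; });
    }

    public static void Main()
    {
        List<string> names = new List<string>(
            new string[] { "John", "Mike", "Jane", "Adam" });
        foreach (string name in StartingWith(names, 'J'))
            Console.WriteLine(name);   // John, then Jane
    }
}
```

Note that the anonymous method uses the initial parameter of the enclosing method directly, much like a Perl block can see the variables around it.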
Concluding remarks
FindAll and ForEach are considerable improvements to the list/array handling mechanisms readily provided by the Framework. Combined with logic like the MapToArray method posted above, they make even complex array operations relatively simple. Although the terseness (or ugliness or elegance - whatever you prefer) of Perl code cannot be reached simply by coding new methods, much of the same power can be wielded.
The map operations described above are far from perfect. They're restricted to arrays on both input and output. It would be fairly simple to make them accept any IEnumerable<T>, so you could give anything enumerable as input. If you want the map operation to produce a list, you could write a separate MapToList method. However, for most cases, the relatively simple operations described above will suffice. Also, FindAll exists on both List<T> and Array, so you don't have to hack around for that.
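A minimal sketch of that extension might look like the following. It reuses the two-parameter MapAction delegate from earlier; the MapToList body, the EnumerableMap class name and the Queue<int> used as input are assumptions made for this illustration, not code from the original sample.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the MapAction delegate defined earlier in the article.
public delegate DstType MapAction<SrcType, DstType>(SrcType item);

public static class EnumerableMap
{
    // Hypothetical MapToList: accepts any IEnumerable<SrcType> and
    // returns a List<DstType>, so the input size need not be known up front.
    public static List<DstType> MapToList<SrcType, DstType>(
        IEnumerable<SrcType> source,
        MapAction<SrcType, DstType> action)
    {
        List<DstType> result = new List<DstType>();
        foreach (SrcType item in source)
            result.Add(action(item));
        return result;
    }

    public static void Main()
    {
        // A Queue<int> stands in for "anything enumerable".
        Queue<int> numbers = new Queue<int>(new int[] { 1, 2, 3 });
        List<string> labels = MapToList<int, string>(
            numbers,
            delegate(int n) { return "#" + n; });
        foreach (string label in labels)
            Console.WriteLine(label);   // #1, #2, #3
    }
}
```

Because arrays and List<T> both implement IEnumerable<T>, the same method would accept them as well; calling ToArray() on the result gives back an array when one is needed.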
History
2004-07-23: Initial version released.