Implementing Perl-style list operations using C# 2.0

Jouni Heikniemi

22 Jul 2004

Instructions on implementing Perl map and grep with C# 2.0.

Introduction

Many people hate Perl because of its frightening syntax. While it's true that many Perl scripts border on the unreadable, some of Perl's operations are extremely handy. Two of them are the grep and map list operators. I'll first introduce them in brief for those not familiar with Perl, and then delve into some thoughts on how to have similar operations easily available in C#.

The code here is based on .NET Framework/C# 2.0. The new framework version (in beta as of this writing) provides good support for similar functions. On the 1.x series of .NET, implementing operators of this sort is possible but hard to do in a generic manner: you need a good dose of interfaces, and the result is hard to integrate with the existing types (int et al.). The code also gets very messy without anonymous method support. Although the code posted below was written using .NET Framework 2.0 Beta 1, it is expected to work as-is with the final release.

Background

Lists in Perl

First, let's get acquainted with the list in Perl. It's roughly the equivalent of a one-dimensional array in most other languages. Perl's relatively weak typing means that a list element can be anything: integers, strings, references ("pointers") to structures, and so on. In these examples, we'll only look at the basics.

The Perl list is denoted using a @ symbol in front of the variable name. Local variables are declared using the keyword my, so that my @list = (1, 2, 3, 4); roughly equals the C# statement int[] list = {1, 2, 3, 4};. "Roughly", as Perl lists are flexible in size; you can throw in more elements or remove the old ones, and you can also add elements of any type you desire even if it's originally initialized with int values only. However, these details don't really make a difference with the stuff we're handling here, so let's leave it at that.

The map operator

Perl's map operator is actually syntactic sugar for a certain type of foreach statement. What it does is evaluate a certain expression on every element of a list and push the result into a new list. The expression can be any valid Perl code; the array element being iterated is in a special variable called $_.

For example, you can do the following:

my @numbers = (1, 2, 3, 4);
my @doubles = map { $_*2 } @numbers;
print @doubles;     # prints 2468

The equivalent construct using foreach in Perl would look like this:

my @numbers = (1, 2, 3, 4);
my @doubles;
foreach (@numbers) {
  push @doubles, $_*2;
}
print @doubles;

In C#, the foreach approach would require a bit more code since you have to deal with the fact that the array isn't flexible in size by nature; you'll have to push the results into an ArrayList or a List<T>. We'll get back to this later on.
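
As a rough illustration of that extra work, here is a minimal sketch of the hand-written foreach version in C# 2.0. The class and method names (ForeachMapSketch, DoubleAll) are made up for this example; the point is only that results go into a growable List<int> first and are copied to an array at the end.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, named for this illustration only.
public static class ForeachMapSketch {
  // Doubles every element by hand: collect the results into a
  // growable List<int>, then copy them out to a fixed-size array.
  public static int[] DoubleAll(int[] numbers) {
    List<int> doubles = new List<int>();
    foreach (int n in numbers) {
      doubles.Add(n * 2);
    }
    return doubles.ToArray();
  }

  public static void Main() {
    int[] numbers = {1, 2, 3, 4};
    foreach (int d in DoubleAll(numbers)) Console.Write(d);
    Console.WriteLine();   // the line printed is 2468
  }
}
```

Compared with the Perl one-liner, the loop, the temporary list and the final copy are all spelled out by hand.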

The grep operator

The grep operator in Perl does much the same as its counterpart on the Unix command line: it picks out the list elements that match certain criteria. The syntax is very similar to that of map: you specify an expression. The difference is in how the result is used: if the expression evaluates to true ("non-zero" -- in this sense, Perl treats booleans much like C/C++), the element is included in the resulting list.

my @names = ("John", "Mike", "Jane", "Adam");
my @oNames = grep { substr($_, 0, 1) eq 'J' } @names;
print @oNames;

With a little thought, I think you can figure out what the code above does. Yeah, it prints out those names whose first character ("1 character long substring from index position 0") equals 'J' - that is, John and Jane.

The C# approach

I've been longing for those two Perl operations for quite some time. When I first read the C# 2.0 spec, I realized the potential both generics and anonymous methods had for this purpose. However, I later noted that Framework 2.0 already contains fairly good support for these operations in the form of a couple of new methods. While the terse syntax of Perl will be missed, using map- and grep-like operations in C# 2.0 is almost trivial. Let's take a closer look.

The map equivalent: ForEach

Typed arrays and the new List<T> containers gain a new method in 2.0: ForEach. For arrays it is the static method Array.ForEach, which takes a typed array and an Action<T> delegate as parameters and then runs the delegate for each array element. List<T> offers the same operation as an instance method; I'll just talk about arrays in the following, but the same mostly applies to lists as well. An Action<T> delegate takes a single parameter of the list element's type and returns nothing. So, you can use ForEach like this:

  private static void PrintNumber(int num) {
    Console.WriteLine("The number is " + num);
  }

  public static void Main() {
    int[] list = {1, 2, 3, 4};
    Array.ForEach(list, new Action<int>(PrintNumber));
  }

One of the greater new C# enhancements is the ability to define methods anonymously. So we can actually get rid of the PrintNumber method, too. The following code is equivalent to the snippet above:

  public static void Main() {
    int[] list = {1, 2, 3, 4};
    Array.ForEach(
      list, 
      delegate(int num) { Console.WriteLine("The number is " + num); } 
    );
  }
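
For lists, the same idea can be sketched with the ForEach instance method of List<T>. The class name ListForEachSketch and the Sum helper are invented for this example; Sum also shows that an anonymous method can read and modify a local variable of the enclosing method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, named for this illustration only.
public static class ListForEachSketch {
  // Sums a list by running an anonymous method on each element.
  // The anonymous method captures the local variable 'total'.
  public static int Sum(List<int> list) {
    int total = 0;
    list.ForEach(delegate(int num) { total += num; });
    return total;
  }

  public static void Main() {
    List<int> list = new List<int>(new int[] {1, 2, 3, 4});
    // Instance method on List<T>, as opposed to the static Array.ForEach.
    list.ForEach(delegate(int num) { Console.WriteLine("The number is " + num); });
    Console.WriteLine("Sum: " + Sum(list));   // Sum: 10
  }
}
```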

What's missing here is a return value: the Action delegate returns nothing, so you can't easily use it to construct a new list or array. So, let's write some tool code of our own to help:

  public delegate T MapAction<T>(T item);

  public static T[] MapToArray<T>(T[] source, 
                                  MapAction<T> action) {

    T[] result = new T[source.Length];
    for (int i = 0; i < source.Length; ++i)
      result[i] = action(source[i]);
    return result;
  }

So, we now have a MapAction delegate, which is much like the framework's Action, but also returns an element of type T. Then we have the MapToArray method, which always returns an array of the same type and size as the source parameter - but the elements it contains have been passed through the MapAction handler, transforming them in whatever way you wish. As a result, we can write:

  public static void Main() {
    int[] list = {1, 2, 3, 4};
    int[] doubled = MapToArray(list, 
                               delegate(int num) { return num*2; });
    
    foreach (int i in doubled) Console.WriteLine(i);
  }

... and have it print out 2, 4, 6 and 8. That's it! Except that this isn't as flexible as Perl's map was; you still can't map the elements to a totally different type. Luckily, the changes required are pretty easy. Let's rewrite the MapAction delegate and the MapToArray method to work with different types:

  public delegate DstType MapAction<SrcType, DstType>(SrcType item);

  public static DstType[] MapToArray<SrcType, DstType>(
                            SrcType[] source, 
                            MapAction<SrcType, DstType> action) {

    DstType[] result = new DstType[source.Length];
    for (int i = 0; i < source.Length; ++i)
      result[i] = action(source[i]);
    return result;
  }

We now have a whole lot of two type parameters - which I've also named SrcType and DstType for clarity - in both the delegate and the MapToArray method. This allows us to create maps from one type to another with arbitrarily complex operations:

  // FileInfo lives in System.IO, so this needs: using System.IO;
  public static void Main() {
    string[] files = { "map.pl", "testi.cs" };
    long[] fileSizes = 
      MapToArray<string,long>(
        files, 
        delegate(string file) { return new FileInfo(file).Length; }
      );
    
    for (int i=0; i < files.Length; ++i) 
      Console.WriteLine(files[i] + ": " + fileSizes[i]);
  }

Yep, that example maps an array of filenames (strings) into an array of file sizes (longs). Note that this example exceeds C# 2.0's ability to infer the generic types: the compiler does not deduce DstType from the anonymous method's return value, so you have to specify SrcType and DstType manually at the MapToArray call. That's not entirely bad, though - it does add some clarity to the code.
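
For comparison, the released .NET Framework 2.0 also ships a built-in type-changing map for arrays, Array.ConvertAll, which takes a Converter<TInput, TOutput> delegate. The sketch below assumes the released framework rather than the Beta 1 this article was written against; the class name ConvertAllSketch and the Describe method are made up for the example.

```csharp
using System;

// Hypothetical helper class, named for this illustration only.
public static class ConvertAllSketch {
  // Maps int[] to string[] with the framework's own Array.ConvertAll.
  // The type arguments are spelled out, as with MapToArray<string,long>.
  public static string[] Describe(int[] numbers) {
    return Array.ConvertAll<int, string>(
      numbers,
      delegate(int num) { return "#" + num; }
    );
  }

  public static void Main() {
    foreach (string s in Describe(new int[] {1, 2, 3}))
      Console.WriteLine(s);   // #1, #2, #3 on separate lines
  }
}
```

The hand-written MapToArray remains useful as an illustration of how little code such a helper takes.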

The grep equivalent: FindAll

After having gone through all the trouble of implementing the map operation, implementing a grep equivalent is trivial. Both Array and List<T> have a FindAll method (static on Array, an instance method on List<T>), which works much like ForEach but takes a Predicate<T> delegate: essentially just a method that takes an item and returns true if it matches the criteria. So, to filter the even numbers from an int array, you just type:

  int[] list = {1, 2, 3, 4, 5, 6};
  int[] even = Array.FindAll(list, 
                             delegate(int num) { return num%2 == 0; });
  foreach (int i in even) Console.WriteLine(i);

... and yes, the code above prints 2, 4 and 6. That's it for grep!
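
The Perl names example from the Background section can be sketched the same way with a list instead of an array. The class name ListFindAllSketch and the StartingWith method are hypothetical names for this illustration; FindAll here is the instance method on List<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, named for this illustration only.
public static class ListFindAllSketch {
  // Keeps only the names that begin with the given letter.
  // The Length check guards against empty strings.
  public static List<string> StartingWith(List<string> names, char letter) {
    return names.FindAll(
      delegate(string name) { return name.Length > 0 && name[0] == letter; }
    );
  }

  public static void Main() {
    List<string> names = new List<string>(
      new string[] { "John", "Mike", "Jane", "Adam" });
    foreach (string name in StartingWith(names, 'J'))
      Console.WriteLine(name);   // John, then Jane
  }
}
```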

Concluding remarks

FindAll and ForEach are considerable improvements to the list/array handling mechanisms readily provided by the Framework. Combined with logic like the MapToArray method posted above, they make even complex array operations relatively simple. Although the terseness (or ugliness or elegance - whatever you prefer) of Perl code cannot be reached simply by coding new methods, much of the same power can be wielded.
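
As one sketch of such a combination, the code below chains a grep step (Array.FindAll) with a map step (the two-type MapToArray from this article, repeated so the example is self-contained). The class name GrepThenMapSketch and the LabelEvens method are invented for the illustration.

```csharp
using System;

public delegate DstType MapAction<SrcType, DstType>(SrcType item);

// Hypothetical helper class, named for this illustration only.
public static class GrepThenMapSketch {
  // The two-type MapToArray from the article, repeated here.
  public static DstType[] MapToArray<SrcType, DstType>(
      SrcType[] source, MapAction<SrcType, DstType> action) {
    DstType[] result = new DstType[source.Length];
    for (int i = 0; i < source.Length; ++i)
      result[i] = action(source[i]);
    return result;
  }

  // grep: keep the even numbers; map: turn each one into a label.
  public static string[] LabelEvens(int[] numbers) {
    int[] even = Array.FindAll(
      numbers, delegate(int num) { return num % 2 == 0; });
    return MapToArray<int, string>(
      even, delegate(int num) { return num + " is even"; });
  }

  public static void Main() {
    foreach (string s in LabelEvens(new int[] {1, 2, 3, 4, 5, 6}))
      Console.WriteLine(s);   // 2 is even, 4 is even, 6 is even
  }
}
```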

The map operations described above are far from perfect. They're restricted to arrays both on input and output. It would be fairly simple to make them swallow IEnumerable<T>s, so you could give anything as an input. If you want the map operation to produce a list, you could write a separate MapToList method. However, for most cases, the relatively simple operations described above will suffice. Also, FindAll exists on both List<T> and Arrays, so you don't have to hack around for that.
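
A minimal sketch of the IEnumerable<T>-based variant mentioned above might look like the following. MapToList and the class name EnumerableMapSketch are hypothetical names; the delegate is the two-type MapAction from this article, repeated so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public delegate DstType MapAction<SrcType, DstType>(SrcType item);

// Hypothetical helper class, named for this illustration only.
public static class EnumerableMapSketch {
  // Accepts any IEnumerable<SrcType> (arrays, lists, ...) and
  // returns the mapped values in a new List<DstType>.
  public static List<DstType> MapToList<SrcType, DstType>(
      IEnumerable<SrcType> source, MapAction<SrcType, DstType> action) {
    List<DstType> result = new List<DstType>();
    foreach (SrcType item in source)
      result.Add(action(item));
    return result;
  }

  public static void Main() {
    List<string> words = new List<string>(new string[] { "map", "grep" });
    List<int> lengths = MapToList<string, int>(
      words, delegate(string word) { return word.Length; });
    foreach (int len in lengths) Console.WriteLine(len);   // 3, then 4
  }
}
```

Because the input is only enumerated, the result list cannot be sized up front the way the array version sizes its result; it simply grows as items are added.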

History

2004-07-23: Initial version released.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.