LINQ: .NET Language-Integrated Query

Sumit Jain


26 Aug 2007 (CPOL)

A general-purpose query facility added to the .NET Framework: LINQ, lambda expressions, and extension methods

Part 1 of this series will discuss the following:

Introduction
Language Features Supporting LINQ
1. Extension Method
2. Lambda Expression
3. Local Variable Type Inference
4. Anonymous Type
5. Object Initialization

Introduction

So let's get started directly with the code:

int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var evenNumbers = from a in array
                  where a % 2 == 0
                  select a;
ObjectDumper.Write(evenNumbers);

The code above declares an array of integers and prints only the even numbers.

The output of the above code is as follows:

2
4
6
8
10

ObjectDumper is a utility class from Microsoft's LINQ samples (it is not part of the .NET Framework itself) that uses reflection to print objects. It accepts any object, discovers its type through reflection, and writes its values to an output stream, which by default is the console. For example, if a Customer class has a public field Name and I pass a Customer object to ObjectDumper, it prints Name=<customerName>. If I pass a collection, it iterates over all the items in the collection and prints each item's values. This class can be used with .NET 2.0 as well.
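To make the idea concrete, here is a minimal sketch of how a reflection-based dumper can work. It is not the real ObjectDumper source: MiniDumper, its Describe and Write methods, and the Customer class are hypothetical names chosen for this illustration, and the sketch only handles one level of public fields and properties.

```csharp
using System;
using System.Collections;
using System.Reflection;
using System.Text;

// Hypothetical, simplified stand-in for ObjectDumper (not its real source).
public static class MiniDumper
{
    // Builds "Name=value" pairs from an object's public instance fields and properties.
    public static string Describe(object o)
    {
        if (o == null) return "null";
        Type t = o.GetType();
        // Primitive values (int, double, ...) and strings print as themselves.
        if (t.IsPrimitive || o is string) return o.ToString();

        StringBuilder sb = new StringBuilder();
        foreach (FieldInfo f in t.GetFields(BindingFlags.Public | BindingFlags.Instance))
            AppendPair(sb, f.Name, f.GetValue(o));
        foreach (PropertyInfo p in t.GetProperties(BindingFlags.Public | BindingFlags.Instance))
            if (p.GetIndexParameters().Length == 0)   // skip indexers
                AppendPair(sb, p.Name, p.GetValue(o, null));
        return sb.ToString();
    }

    static void AppendPair(StringBuilder sb, string name, object value)
    {
        if (sb.Length > 0) sb.Append("    ");
        sb.Append(name).Append('=').Append(value);
    }

    // Collections are iterated; every other object is described directly.
    public static void Write(object o)
    {
        IEnumerable items = o as IEnumerable;
        if (items != null && !(o is string))
        {
            foreach (object item in items)
                Console.WriteLine(Describe(item));
        }
        else
        {
            Console.WriteLine(Describe(o));
        }
    }
}

// Hypothetical class with a public field, as in the description above.
public class Customer
{
    public string Name;
}

public static class Program
{
    public static void Main()
    {
        Customer c = new Customer();
        c.Name = "Alice";
        MiniDumper.Write(c);                     // Name=Alice
        MiniDumper.Write(new int[] { 2, 4, 6 }); // 2, 4 and 6, one per line
    }
}
```

The real ObjectDumper also walks nested objects up to a configurable depth; this sketch deliberately stops at the top level to keep the reflection calls visible.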

Now let's see what this code looks like to the C# compiler. The query syntax is what architects call syntactic sugar: if you look at the query, there is no method call, no object, and no "." operator.

When the C# compiler sees the query, it translates it into ordinary method calls, roughly as shown below:

var evenNumbers = array.Where(a => 0 == a % 2).Select(a => a);
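Before we look at how this works, here is a small sketch you can run to convince yourself that the two forms are equivalent. The class and method names (EvenNumbers, ByQuery, ByMethods) are hypothetical. One detail worth knowing: under the C# specification's rule for degenerate query expressions, a trailing select a after a where clause is dropped, so for this query the compiler actually emits just array.Where(...); the explicit .Select(a => a) is harmless and yields the same elements.

```csharp
using System;
using System.Linq;

public static class EvenNumbers
{
    // Query syntax, as written in the article.
    public static int[] ByQuery(int[] array)
    {
        var evenNumbers = from a in array
                          where a % 2 == 0
                          select a;
        return evenNumbers.ToArray();
    }

    // The method-call form the query is translated to.
    public static int[] ByMethods(int[] array)
    {
        var evenNumbers = array.Where(a => 0 == a % 2).Select(a => a);
        return evenNumbers.ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        int[] q = EvenNumbers.ByQuery(array);
        int[] m = EvenNumbers.ByMethods(array);
        Console.WriteLine(q.SequenceEqual(m));   // True
        Console.WriteLine(string.Join(", ", m.Select(n => n.ToString()).ToArray())); // 2, 4, 6, 8, 10
    }
}
```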

Now, as a developer, I am happy to see some objects and method calls. But I am still confused, because an int array does not have Where and Select methods. Has Microsoft added these new methods to the array type? And what is "a => 0 == a % 2"?

Extension Method

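Where and Select are extension methods. C# 3.0, which ships with .NET 3.5 (itself built on the .NET 2.0 runtime), lets developers add their own methods to existing classes and types without modifying or deriving from them. Strictly speaking, the type itself is not changed: an extension method is an ordinary static method that the compiler lets you call as if it were an instance method. There are rules for declaring one: it must be a static method inside a static, non-generic, non-nested class, and its first parameter, marked with the this modifier, names the type being extended. Let's take an example of what I mean: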

using System.Text;

public static class MyExtensions
{
    // "this string str" makes Reverse callable as if it were a string instance method
    public static string Reverse(this string str)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = str.Length - 1; i >= 0; i--)
            sb.Append(str[i]);
        return sb.ToString();
    }
}

Here we have added a Reverse method to the String class. Now let's see how we can use it:

string s = "Hello LINQ";
Console.WriteLine(s.Reverse()); //will print "QNIL olleH"

Things to note:

Reverse is a static method, yet it is called like an instance method.
Reverse accepts a string as an argument, whereas the call s.Reverse() passes no arguments; the string it is called on becomes that argument.
The method's first parameter is marked with the this modifier.

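When the compiler sees a call to a method that is not a member of the type, it looks for an applicable extension method among the static classes in scope (those in enclosing namespaces and in namespaces imported with using directives) and calls the best match. An instance method with a matching signature always takes precedence, so an extension method can never replace an existing member. If another namespace also provides a Reverse extension method, the candidate found in the nearest enclosing scope wins; if two equally good candidates are found at the same level, for example through two using directives, the compiler reports an ambiguity error. To avoid the ambiguity, call the method explicitly through its static class name. In fact, in our case, when the compiler sees the call s.Reverse(), it replaces it with the following: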

Console.WriteLine(MyExtensions.Reverse(s));
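The lookup rule can be seen in a small sketch, assuming a hypothetical Greeter class and GreeterExtensions class: an extension method whose signature matches an existing instance method is never picked by ordinary call syntax, but it can still be reached by calling it through its static class, which is exactly the form the compiler generates.

```csharp
using System;

// Hypothetical type with an instance method named Describe.
public class Greeter
{
    public string Describe() { return "instance method"; }
}

public static class GreeterExtensions
{
    // Same name and signature as the instance method: g.Describe() never
    // binds here, because an applicable instance method always wins.
    public static string Describe(this Greeter g) { return "extension method"; }

    // A name the type does not have is resolved to the extension normally.
    public static string Shout(this Greeter g) { return "HELLO"; }
}

public static class Program
{
    public static void Main()
    {
        Greeter g = new Greeter();
        Console.WriteLine(g.Describe());                  // instance method
        Console.WriteLine(g.Shout());                     // HELLO
        // The explicit static call reaches the extension method anyway.
        Console.WriteLine(GreeterExtensions.Describe(g)); // extension method
    }
}
```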

Lambda Expression

Coming back to the following code:

var even = array.Where(a => 0 == a % 2).Select(a => a);

Now we know that Where and Select are extension methods. But what about "a => 0 == a % 2"?

For a C# 2.0 developer, the code above is equivalent to the following, which uses anonymous methods:

var e = array.Where(delegate(int a) { return 0 == a % 2; })
             .Select(delegate(int a) { return a; });
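Any method with the right signature can be passed where Where expects a Func<int, bool>. The sketch below, with hypothetical names (PredicateForms, IsEven and the With... methods), passes a named method, a C# 2.0 anonymous method, and a C# 3.0 lambda, and all three produce the same even numbers.

```csharp
using System;
using System.Linq;

public static class PredicateForms
{
    // 1. A named method whose shape matches Func<int, bool>.
    static bool IsEven(int a) { return 0 == a % 2; }

    public static int[] WithNamedMethod(int[] array)
    {
        // The method group IsEven is converted to a Func<int, bool> delegate.
        return array.Where(IsEven).ToArray();
    }

    // 2. A C# 2.0 anonymous method.
    public static int[] WithAnonymousMethod(int[] array)
    {
        return array.Where(delegate(int a) { return 0 == a % 2; }).ToArray();
    }

    // 3. A C# 3.0 lambda expression.
    public static int[] WithLambda(int[] array)
    {
        return array.Where(a => 0 == a % 2).ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        Console.WriteLine(PredicateForms.WithNamedMethod(array).Length);     // 5
        Console.WriteLine(PredicateForms.WithAnonymousMethod(array).Length); // 5
        Console.WriteLine(PredicateForms.WithLambda(array).Length);          // 5
    }
}
```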

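A simplified version of the Where extension method looks as follows:

using System;
using System.Collections.Generic;
// Func<T, TResult> is predefined in System: public delegate TResult Func<T, TResult>(T arg);
public static class MyEnumerable
{
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
        Func<T, bool> predicate)
    {
        // Check arguments eagerly, before deferred iteration starts
        if (source == null || predicate == null)
            throw new ArgumentNullException();
        return WhereIterator(source, predicate);
    }

    static IEnumerable<T> WhereIterator<T>(IEnumerable<T> source,
        Func<T, bool> predicate)
    {
        foreach (T item in source)
            if (predicate(item))
                yield return item;
    }
}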

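This extension method applies to every type that implements IEnumerable<T>. It iterates over the items and returns only those that satisfy the given condition (the predicate). Because it uses yield return, the filtering is deferred: nothing is evaluated until the result is enumerated.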
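Because of yield return, defining a query does not run it, and the framework's Enumerable.Where behaves the same way. A minimal sketch, assuming a hypothetical DeferredDemo class, counts how many times the predicate runs and adds an element to the source after the query is defined; the new element still shows up, because the filtering only happens when the result is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static List<int> Run(out int callsBeforeEnumeration)
    {
        int calls = 0;
        List<int> numbers = new List<int> { 1, 2, 3, 4 };

        // Defining the query only stores the source and the predicate.
        IEnumerable<int> evens = numbers.Where(n => { calls++; return n % 2 == 0; });

        // Change the source after the query has been defined.
        numbers.Add(6);

        callsBeforeEnumeration = calls;   // still 0: the predicate has not run
        return evens.ToList();            // enumeration happens here, so 6 is included
    }
}

public static class Program
{
    public static void Main()
    {
        int before;
        List<int> result = DeferredDemo.Run(out before);
        Console.WriteLine(before);                                                   // 0
        Console.WriteLine(string.Join(", ", result.Select(n => n.ToString()).ToArray())); // 2, 4, 6
    }
}
```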

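Now let's look at "Where(a => 0 == a % 2).Select(a => a)" again. C# 3.0 introduces the => operator (read as "goes to"), and an expression such as a => 0 == a % 2 is called a lambda expression: the parameter list is on the left of the arrow and the body on the right. Let's see more examples of lambda expressions: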

// Func<T1, T2, TResult> is predefined in the System namespace as:
// public delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);

Func<int, int, int> f = (x, y) => x * y;
ObjectDumper.Write(f(5, 6));

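The output of the above code is 30.

What the compiler does is infer the types of x and y (both int) from the delegate type Func<int, int, int>, which makes the lambda type safe. It then compiles the lambda body into a method and creates a delegate that refers to it; calling f(5, 6) invokes that delegate. In object-oriented terms, f is simply a delegate object wrapping a method, just like an anonymous method in C# 2.0.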
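A quick way to see the type inference at work is to assign the same lambda text to two different delegate types. In this sketch (the class name LambdaInference is hypothetical), x and y become int in one case and double in the other, and each result is an ordinary delegate object.

```csharp
using System;

public static class LambdaInference
{
    // Identical lambda text; the parameter types come from each delegate type.
    public static readonly Func<int, int, int> MultiplyInts = (x, y) => x * y;
    public static readonly Func<double, double, double> MultiplyDoubles = (x, y) => x * y;

    // Type checking happens at compile time; this would not compile,
    // because x is an int and int has no Length member:
    // Func<int, int, int> bad = (x, y) => x.Length;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(LambdaInference.MultiplyInts(5, 6));           // 30
        Console.WriteLine(LambdaInference.MultiplyDoubles(2.5, 4.0));    // 10
        // A delegate can also be invoked explicitly.
        Console.WriteLine(LambdaInference.MultiplyInts.Invoke(5, 6));    // 30
    }
}
```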

Local Variable Type Inference

Again coming back to the main query:

var evenNumbers = from a in array
                  where a % 2 == 0
                  select a;

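If you have not noticed yet, the result of the query is assigned to a variable declared with var. This var is not the var of JScript, where a variable can hold a value of any type. In C#, var declares a strongly typed local variable whose type the compiler infers at compile time from the expression assigned to it. That is why you cannot do the following: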

var unknownType = null;

This results in a compilation error, because null has no type from which the compiler could infer the variable's type.

So why is var important? Most of the query operators Microsoft provides return IEnumerable<T>, and when a query projects its results into a new shape, T is often a type that has no name you can write. Without var, every time you wrote such a query you would also have to write a class to hold its results, which is not a good idea. var lets you hold the result locally instead. However, var can only be used for local variables: it cannot be the type of a method parameter, a field, or a return value. So if you want to use the result of a query outside the method, you have to define your own type, unless you are selecting whole items (for example, an IEnumerable<Customer>).
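As a sketch of the rule above (QueryResults, GetEvenNumbers, GetEvenPairs, and EvenNumber are hypothetical names): inside the method the query result can be held in a var, but the signature has to spell out a real type. Selecting whole items keeps that easy; a projection that needs to leave the method has to target a named class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical named type for a projection that must leave the method.
public class EvenNumber
{
    public int Value { get; set; }
    public int Half { get; set; }
}

public static class QueryResults
{
    // Whole items: var holds an IEnumerable<int>, which the signature can name.
    public static IEnumerable<int> GetEvenNumbers(int[] array)
    {
        var evenNumbers = from a in array
                          where a % 2 == 0
                          select a;
        return evenNumbers;
    }

    // Projection: select into a named class instead of an anonymous type,
    // so that the return type can be written down.
    public static IEnumerable<EvenNumber> GetEvenPairs(int[] array)
    {
        return from a in array
               where a % 2 == 0
               select new EvenNumber { Value = a, Half = a / 2 };
    }
}

public static class Program
{
    public static void Main()
    {
        int[] array = { 1, 2, 3, 4 };
        foreach (int n in QueryResults.GetEvenNumbers(array))
            Console.WriteLine(n);                            // 2, then 4
        foreach (EvenNumber e in QueryResults.GetEvenPairs(array))
            Console.WriteLine(e.Value + " / 2 = " + e.Half); // 2 / 2 = 1, then 4 / 2 = 2
    }
}
```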

Anonymous Type

Let's look at a different query this time:

var contacts = from c in customers
               where c.State == "WA"
               select new { c.Name, c.Phone };

If you look at the select clause, we are creating a new type altogether, and we are not even specifying its name. In this case, the compiler generates a new type with two public read-only properties, Name and Phone, whose names and types are inferred from c.Name and c.Phone in the source.

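Note that because the type has no name we can write, we cannot use it to declare fields, parameters, or return types; in practice it can only be used, through var, inside the method where it is created. (Within one assembly, the compiler does reuse a single anonymous type for all anonymous objects with the same property names, types, and order.) Each time a new object of this type is created, its properties are initialized with the corresponding values from the source element.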
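The following sketch runs the WA query against a small in-memory list. The Customer class and its sample data are hypothetical, standing in for whatever customers source the query uses. It also shows that the generated type has a readable ToString and that two anonymous objects with the same shape share one type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source type; the article does not show its definition.
public class Customer
{
    public string Name { get; set; }
    public string Phone { get; set; }
    public string State { get; set; }
}

public static class AnonymousTypeDemo
{
    // Hypothetical sample data standing in for the customers source.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "Alice", Phone = "555-0100", State = "WA" },
            new Customer { Name = "Bob",   Phone = "555-0199", State = "OR" }
        };
    }

    // Runs the WA query; each anonymous object is converted to a string with its
    // compiler-generated ToString so the results can leave the method.
    public static string[] WashingtonContacts(List<Customer> customers)
    {
        var contacts = from c in customers
                       where c.State == "WA"
                       select new { c.Name, c.Phone };
        return contacts.Select(x => x.ToString()).ToArray();
    }

    // Same property names, types and order => one shared generated type.
    public static bool SameShapeSameType()
    {
        var a = new { Name = "x", Phone = "y" };
        var b = new { Name = "p", Phone = "q" };
        // a.Name = "z";  // would not compile: anonymous-type properties are read-only
        return a.GetType() == b.GetType();
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string s in AnonymousTypeDemo.WashingtonContacts(AnonymousTypeDemo.SampleCustomers()))
            Console.WriteLine(s);                                 // { Name = Alice, Phone = 555-0100 }
        Console.WriteLine(AnonymousTypeDemo.SameShapeSameType()); // True
    }
}
```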

Object Initialization

Referring back to the same query:

var contacts = from c in customers
               where c.State == "WA"
               select new { c.Name, c.Phone };

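Notice that we never wrote a constructor call with arguments: the anonymous object is created from a list of member initializers in braces. This is the same syntax that C# 3.0 offers for ordinary classes, called an object initializer. (For anonymous types specifically, the released C# 3.0 compiler generates read-only properties plus a constructor that sets them, but what you write is the initializer list.) So how does this syntax set the members of a regular class?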

Let's take another example, which will make things clearer:

public class Point
{
    private int x, y;
    public int X { get { return x; } set { x = value; } }
    public int Y { get { return y; } set { y = value; } }
}
Point a = new Point { X = 0, Y = 1 };

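This code is equivalent to writing the following (strictly, the compiler assigns the properties on a hidden temporary object and only then stores it in a):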

Point a = new Point();
a.X = 0;
a.Y = 1;

I guess the code speaks for itself.
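If you want to try it, here is a runnable version of the Point example with a hypothetical InitializerDemo class. It checks that the initializer and the longhand form produce the same values, and shows that members left out of the initializer keep their default values.

```csharp
using System;

public class Point
{
    private int x, y;
    public int X { get { return x; } set { x = value; } }
    public int Y { get { return y; } set { y = value; } }
}

public static class InitializerDemo
{
    // Object initializer: default constructor, then the property setters.
    public static Point WithInitializer()
    {
        return new Point { X = 0, Y = 1 };
    }

    // The equivalent longhand form.
    public static Point Longhand()
    {
        Point a = new Point();
        a.X = 0;
        a.Y = 1;
        return a;
    }

    // Members you leave out keep their default values (Y stays 0).
    public static Point Partial()
    {
        return new Point { X = 5 };
    }
}

public static class Program
{
    public static void Main()
    {
        Point p = InitializerDemo.WithInitializer();
        Point q = InitializerDemo.Longhand();
        Console.WriteLine(p.X == q.X && p.Y == q.Y); // True
        Point r = InitializerDemo.Partial();
        Console.WriteLine(r.X + ", " + r.Y);         // 5, 0
    }
}
```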

Future Articles

Part 2 of this series will discuss LINQ to SQL
Part 3 of this series will discuss LINQ to XML

For more details, you can reach me at SumitkJain@hotmail.com.

History

27th August, 2007: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)