Exploring LINQ Query Operators and Differed Execution Plan

Zeeshan Jafar Hirani

4.12/5 (11 votes)

5 Mar 2008CPOL13 min read

170

An overview of LINQ: how LINQ leverages different execution, what support C# provides query over datasets and data tables (LINQ to dataset), etc.

Download source and demo - 1.51 MB

Introduction

In this article, I will cover the following topics:

How LINQ leverages differed execution and what benefits and impacts it can have to a developer.
Define what operators are differed and what operators force immediate execution of the query.
How to use LINQ with legacy collections that do not implement IEnumerable<T>; different query syntax is available in C# to query data.
What is an expression tree?
Partial methods under the covers, with an example.
How to return projections from a method.
What support C# provides to query over datasets and data tables (LINQ to dataset).
Implications of anonymous types.
Query operators such as SelectMany, Concat, OrderBy, AsEnumerable, defaultifempty and how to do left joins in LINQ to SQL.

Differed and Immediate Execution

By default, LINQ query expressions are not evaluated until you iterate over the contents. The benefit of this differed execution allows the same LINQ query to be executed multiple times, giving you the latest results. However, there would be times when differed execution behavior would not be acceptable due to performance reasons of the query being executed every time. To prevent this behavior, the .NET Framework defines a number of extension methods such as ToArray<T>(), ToDictionary<TSource, TKey>() and ToList<T>() to capture the results in a strongly typed container which would not reflect new changes made to the original collection. Here is an example:

public static void DifferedAndImmediateExecution()
    {
        int[] numbers = { 1,  2,  3,  4,  5,  6 };
        var lessthan4 = from n in numbers
                        where n < 4
                        select n;

        Console.WriteLine("Original array with items less than 4");
        foreach (int n in lessthan4)
        {
            Console.WriteLine("{0} < 4", n);
        }
        //differed execution
        numbers[2] = 7;//assigning new value
        Console.WriteLine(
            "Results based on differed execution after change number[2] = 7");
        foreach (int n in lessthan4)
        {
            Console.WriteLine("{0} < 4",  n);
        }

        //immediate execution
        numbers = new int[] { 1,  2,  3,  4,  5,  6 };
        var lessthan4immediate = numbers.Where(n => n < 4).Select(n => n).ToArray<int>();
        numbers[2] = 7;//assigning new value
        Console.WriteLine(
            "Results based on immediate execution after change number[2] = 7");
        foreach (int n in lessthan4immediate)
        {
            Console.WriteLine("{0} < 4",  n);
        }
    }

Figure 1

As you should notice, new changes are reflected on differed queries whereas new changes are not reflected on immediate execution because the results are cached in a strongly typed container. It may appear in the example that the query is happening when lessthan4 is initialized. However, the query does not run until you iterate over the collection. One of the downsides I have observed with differed execution is that bugs are harder to trace because enumerating the query could happen much further down in your code. This is where the exception would be thrown. However, you may forget that it is the original query that is the problem. So if differed execution is not the desired behavior you want, use the non-deferred extension methods like ToArray, ToList, ToDictionary or ToLookup to force the query to be executed. Let's see an example for that:

public static void DelayedExceptions()
    {
        string[] blob = {"Resharper", "is", "great"};
        IEnumerable<string> sentence = blob.Select(s => s.Substring(0,  5));
        foreach (string word in sentence)
        {
            Console.WriteLine(word);
        }
    }

You will notice that when you loop the second time is when you get an exception of rangeoutofboundexception instead of getting an exception when you declared the query. Here are the list of operators in order, showing which are differed and which are not.

Operator	Purpose	Deferred
`Distinct`	Set	Yes
`ElementAt`	Element
`ElementAtOrDefault`	Element	Yes
`Empty`	Generation	Yes
`Except`	Set	Yes
`First`	Element
`FirstORDefault`	Element
`GroupBy`	Grouping	Yes
`GroupJOin`	Join	Yes
`Intersect`	Set	Yes
`Join`	Join	Yes
`Last`	Element
`LastOrDefault`	Element
`LongCount`	Aggregate
`Max`	Aggregate
`Min`	Aggregate
`OfType`	Conversion	Yes
`OrderBy`	Ordering	Yes
`OrderByDescending`	Ordering	Yes
`Range`	Generation	Yes
`Reverse`	Ordering	Yes
`Repeat`	Generation	Yes
`Select`	Projection	Yes
`SelectMany`	Projection	Yes
`SequenceEqual`	Equality
`Single`	Element
`SingleOrDefault`	Element
`Skip`	Partitioning	Yes
`SkipWhile`	Partitioning	Yes
`Sum`	Aggregate
`Take`	Partitioning	Yes
`TakeWhile`	Partitioning	Yes
`ThenBy`	Ordering	Yes
`ThenByDescending`	Ordering	Yes
`ToDictionary`	Conversion
`ToArray`	Conversion
`ToList`	Conversion
`ToLookup`	Conversion
`Union`	Set	Yes
`Where`	Restriction	Yes

Support for Non-generic Collections.

Most of the extension methods work with IEnumerable<T>. However, there are legacy collections in .NET such as Collection and ArrayList that do not implement IEnumerable<T>. Thankfully, by using the generic Enumerable.OfType<T>() or Cast operator, it is possible to iterate over non-generic collections. OfType<T>() is one of the few extension methods that does not extend generic types and allows you to specify the type. One common use of OfType is to specify the right type when building LINQ queries. This is so that you can get design-time support for building queries and the compiler can check for errors. Another great use of OfType would be to filter the items from an array, excluding items that are not of the right type, such as ArrayList. This would allow you to add Books as well as a Car. In order to pull only Books and no Car, you can use OfType to filter the collection to the right type.

public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
}
public class Car
{
    public string Make { get; set; }
    public string Model { get; set; }
}
public static void OfTypeExtesionMethod()
    {
        ArrayList books = new ArrayList()
        {
            new Book{Title="ASP.NET 2.0 Website Programming: Problem -
                 Design - Solution",
                Author="Marco Bellinaso"},
            new Book{Title="Pro ASP.NET 3.5 in C# 2008,  Second Edition",
                Author="MacDonald and Mario Szpuszta"},
            new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
            new Book{Title="ASP.NET 3.5 Unleashed ", Author="Stephen Walther"},
            new Car{Make="Toyota", Model="Corolla"}
        };
        IEnumerable<Book> bystephen = from book in books.OfType<Book>()
                                      where book.Author == "Stephen Walther"
                                      select book;
        foreach (Book book in bystephen)
        {
            Console.WriteLine("Book {0} by {1}", book.Title, book.Author);
        }
    }

Figure 2

public static void CastOperator()
    {
        ArrayList list = new ArrayList();
        list.Add(1);
        list.Add(2);
        list.Add("ASP.NET 3.5 Unleashed");


        IEnumerable<int> numbers = list.Cast<int>();
        foreach (int number in numbers)
        {
            Console.WriteLine(number);
        }
    }

When you run the cast operator, you get an exception because the cast operator tries to cast every element in the collection to the specified type. If it does not cast, it will throw an exception. In contrast, the OfType operator will return objects that can be cast to the right type and skips the rest from the collection.

Query Syntax

The .NET Framework version 3.5 allows querying data using LINQ in a variety of different ways:

Extension methods and Lambda expressions
Query syntax
Generic delegate
Enumerable type
Expression tree
Anonymous methods

public static void QuerySyntax()
    {
        int[] numbers = { 1,  2,  3,  4,  5,  6 };
        //using query syntax
        var usingquerysntax = from n in numbers
                            where n < 5
                            select n;
        //Extension Methods and Lamda expressions
        var usingextensionmethods = numbers.Where(n => n < 5).Select(n => n);

        //using IEnumerable<T>

        Enumerable.Where(numbers,  n => n < 5).Select(n => n);

        //using generic delegate
        Func<int,  bool> where = n => n < 5;
        Func<int,  int> select = n => n;
        var usinggenericdelegate = numbers.Where(where).Select(select);

        //using expressions
        Expression<Func<int,  bool>> where1 = n => n < 5;
        Expression<Func<int,  int>> select1 = n => n;
        var usingexpression = numbers.Where(where).Select(select);

        //anonymous methods
        var usinganonymousmethods = numbers.Where(delegate(int n) { return n < 5; })
                                    .Select(delegate(int n) { return n; });
    }

The above code would give the same result. Key things to consider are that query syntax is simply a nicer way to execute extension methods. Wherever a delegate is required, lambdas, generic delegates or an anonymous method can be passed.

Expression Tree

An expression tree is an efficient data representation of a query's lambda expression. In terms of LINQ to SQL, the expression tree gets converted to SQL to ensure that filtering and ordering is performed on the server instead of bringing the data down and applying filtering and ordering on .NET objects. For lambda expressions, the compiler can generate IL code or an expression tree. If the operator accepts a method delegate, IL code is emitted. If an operator accepts an expression of a method delegate, the expression tree is returned. Let's look at the two different implementations of Select: one defined in System.Linq.Enumerable and the other defined in System.Linq.Queryable.

public static IEnumerable<TResult> Select<TSource,
    TResult>(this IEnumerable<TSource> source,  Func<TSource,  TResult> selector);
public static IQueryable<TResult> Select<TSource,
    TResult>(this IQueryable<TSource> source,
    Expression<Func<TSource,  TResult>> selector);

As you can see, LINQ to objects generate IL code because it takes in a generic delegate, whereas a LINQ to SQL implementation takes an expression tree that gets converted to SQL to be executed by SQL server.

Partial Methods

By defining a method partial, it allows you to prototype a method in one file and implement it in another file. They are equivalent to lightweight events in C#. Basically you can define a prototype in one partial class and define actual implementation in another partial class. If there is no implementation found, no IL code is emitted in the compiled assembly. Following are the important facts of partial methods:

Partial methods can only be defined within a partial class.
Partial methods must return a void.
Partial methods can be static or instance.
Partial methods can have parameters except out modifier.
Partial methods are always private.

public partial class PartialTest
{
    partial void Partial(int total);
    public void TestPartial()
    {
        int total = 0;
        Console.WriteLine("Total before calling Partial method {0}", total);
        Partial(++total);
        Console.WriteLine("Total after calling Partial method {0}",  total);
    }
}
PartialTest test = new PartialTest();
test.TestPartial();

Figure 3

As you can see, there was no implementation defined for partial methods. Therefore the value assigned to the total equalled 0, meaning the Partial method never got called.

Returning Projections From Methods

You may have noticed that the above code made use of an implicit type variable by using var, which infers the type. However, implicit type cannot be used to define parameters or return the values or the field of a class type. So, one way to return data would be to stop using implicit types and return a strongly typed object such as IEnumerable<Book> or IEnumerable<string>. However, there are cases when you want to return projections and, in those cases, you do not know the actual data type and the use of var keyword is mandatory. In those cases, you can return a standard CLR array using the ToArray<T>() extension method.

public static Array ReturnProjections()
    {
        Book[] books =
        {
            new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
            new Book{Title="ASP.NET 3.5 Unleashed ", Author="Stephen Walther"}
        };
        var projections = from book in books
                         select new { book.Title };
        return projections.ToArray();
    }

Notice that we are not specifying the underlying type in the return type for ToArray because we do not know the underlying type.

Array books = ReturnProjections();
        foreach (object o in books)
        {
            Console.WriteLine(o);
        }

LINQ Over Dataset

Out-of-the-box, datasets, datatables and dataviews do not have the necessary infrastructure to work in a LINQ query. This is where Microsoft introduced the new extension called System.Data.DataSetExtensions.dll, which includes DataTable extensions, DataRow extensions and TypedTableBasedExtesions. To make a data table LINQ aware, you simply need to call the AsEnumerable extension, which will return an EnumerableRowCollection consisting of DataRows. Another very useful extension method is DataRowExtensions.Field<T>(), which allows you to do safe casting in a LINQ query and helps prevent runtime exceptions. Once you apply the filter on a datatable using the LINQ query, the result is simply EnumerableRowCollection. In order to return back a table, we can use the new copytodatatable extension method on the result.

public static void DataSetExample(DataTable books)
    {
        var implicittype = from book in books.AsEnumerable()
                           where book.Field<string>("Author") == "Stephen Walther"
                           select book;
        var datatable = implicittype.CopyToDataTable();
    }

Anonymous Types

The C# 3.0 language includes the ability to dynamically create new unnamed classes. This type of class is known as an anonymous type. An anonymous type has no name and is generated by the compiler based on the initialization of the object being instantiated. Since the class has no type name, any variable assigned to an object of an anonymous type must have some way to declare it. This is the purpose of the new C# 3.0 var keyword. Anonymous types allow you to define class on-the-fly using object initialization syntax and assigning to var keyword. At compile time, the C# compiler will generate a unique class which is not accessible from your code. All anonymous classes derive from System.Object and therefore inherit all the properties and methods of the object class. Anonymous classes are provided with overridden versions of Equals(), GetHashCode and ToString(). The ToString implementation for an anonymous book class would look something like this:

public override string ToString()
    {
        StringBuilder builder = new StringBuilder();
        builder.Append("{ BookTitle = ");
        builder.Append(this.<BookTitle>i__Field);
        builder.Append(",  BookAuthor = ");
        builder.Append(this.<BookAuthor>i__Field);
        builder.Append(" }");
        return builder.ToString();
    }

GetHashCode() computes the hash by passing in the member variable of each anonymous type to EqualityComparer<T>.Therefore two anonymous types will return the same hashcode if they have the same set of properties and are assigned the same values. The Equals and == implementations for anonymous types are done differently. When you compare two anonymous classes using Equals, it makes use of a value-based comparison by testing the values for each property of the object with other anonymous types. However, with ==, it checks the reference to see if the two references are equal. Let's see an example of that:

public static void AnonymousTypeSemantics()
    {
        var book1 = new { BookTitle = "ASP.NET 3.5 Unleashed",
            BookAuthor = "Stephen Walther" };
        var book2 = new { BookTitle = "ASP.NET 3.5 Unleashed",
            BookAuthor = "Stephen Walther" };

        //check for ==
        Console.WriteLine("Checking if book1 == book2");
        if (book1 == book2)
        {
            Console.WriteLine("Same object");
        }
        else
        {
            Console.WriteLine("Not same object");
        }
        //check for equals
        Console.WriteLine("Checking if book1.Equals(book2)");
        if (book1.Equals(book2))
        {
            Console.WriteLine("Same object");
        }
        else
        {
            Console.WriteLine("Not same object");
        }
    }

The result is:

Figure 4

Standard Query Operators

The list of standard query operators is huge. I will cover a few of them that I found interesting and that require a bit of understanding.

Select Many and Concat Operators

The SelectMany operator is used to create one to many projections. The SelectMany operator will return zero or more inputs for every input element. The prototype is as follows:

public static IEnumerable<S> SelectMany<T,  S>(
this IEnumerable<T> source,
Func<T,  IEnumerable<S>> selector);

The operator takes an input sequence of T and a selector delegate that takes an element of type T. It returns an IEnumerable of S, an intermediate output sequence. Let's see an example:

public static void SelectManyOperator()
    {
        string[] items = { "this",  "is",  "a",  "test" };
        IEnumerable<char> characters = items.SelectMany(s => s.ToCharArray());
        foreach (char character in characters)
        {
            Console.WriteLine(character);
        }
    }

Figure 5

As you can see from the result, for every single input of word, the select many operator yields an array of characters. The select many operator concatenates each of those arrays of characters into a single sequence. The concat operator allows multiple sequences to be merged into a single sequence. I have found that both the concat and selectmany operators can be used interchangeably. In fact, the main difference I have found is that the selectmany operator is more flexible and allows you to concat more than two sequences (an array of sequences), as compared to concat, which only works on two sequences. Here is an example illustrating the merging of two sequences together, using both concat and selectmany:

public static void ConcatAndSelectManyOperator()
    {
        string[] books = {
                           "ASP.NET 2.0 Website Programming: Problem -
                            Design - Solution",
                           "Pro ASP.NET 3.5 in C# 2008,  Second Edition",
                           "ASP.NET 2.0 Unleashed",
                           "ASP.NET 3.5 Unleashed "
                         };
        IEnumerable<string> concatbooks = books.Take(2).Concat(books.Skip(2));
        IEnumerable<string> selectmanybooks = new[]{
                                                      books.Take(2),
                                                      books.Skip(2)
                                                  }.SelectMany(b => b);

    }

Using concat or selectmany would yield the same results, but the advantage of selectmany is that it allows you to merge more than two sequences to return a single sequence back.

OrderBy Operators

The order operator allows collections to be ordered using orderby, orderbydescending, thenby and thenbydescending. Here is what the prototype looks like:

public static IOrderedEnumerable<T> OrderBy<T,  K>(
this IEnumerable<T> source,
Func<T,  K> keySelector)
where
K : IComparable<K>

The operator takes an IEnumerable<T> and orders the collection based on the key selector that will return the key value for each element in the sequence. The only important point that is worth mentioning is that the sorting of orderby and orderbydescending is considered unstable, meaning that if two elements return the same key based on the key selector, the order of the output could be the maintained or could be different. You should never rely on the order of elements coming out of an orderby call other than the field specified in the orderby clause.

Another interesting point that I found was that orderby, orderbydescending and orderbyascending take as an input source IEnumerable<T>, as you can see from the prototype. However, the return type is IORderedEnumerable<T>. The problem with that is if you want to order by more than 1 column, you cannot pass IOrderedEnumerable<T> since orderby takes IEnumerable<T>. The solution to this is to use the ThenBy or ThenByDescending operators, which take IOrderedEnumerble<T>. Let's proceed to an example:

public static void OrderByOperators()
    {
        Book[] books =
        {
            new Book{Title="ASP.NET 2.0 Website Programming: Problem -
                Design - Solution",
                Author="Marco Bellinaso"},
            new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
            new Book{Title="Pro ASP.NET 3.5 in C# 2008,  Second Edition",
                Author="MacDonald and Mario Szpuszta"},
            new Book{Title="ASP.NET 3.5 Unleashed ", Author="Stephen Walther"},
        };
        IEnumerable<Book> orderedbooks =
              books.OrderBy(b => b.Author).ThenBy(b => b.Title);

    }

In the above example, I make use of both orderby and thenby. Since orderby only works on IEnumerable<T>, I make use of the thenby extension method as well. The second prototype for orderby takes in Icomparable and does not require the return type of the keyselector delegate to return a Collection that implements IComparable. Here is what the prototype looks like:

public static IOrderedEnumerable<T> OrderBy<T,  K>(
this IEnumerable<T> source,
Func<T,  K> keySelector,
IComparer<K> comparer);
interface IComparer<T> {
int Compare(T x,  T y);
}

The compare method will return a greater int if the second argument is greater than the first, equal to zero if both arguments are equal and less than zero if the first argument is less than the second. Let's see an example of that:

public class LengthComparer : IComparer<string>
{
    public int Compare(string title1,  string title2)
    {
        if (title1.Length < title2.Length) return -1;
        else if (title1.Length > title2.Length) return 1;
        else return 0;
    }
}

I have created a custom comparer which compares the book title based on the title's length. When the length of the first title is less than the second title's length, I return -1. If it's greater, I return 1. Otherwise, I return 0 for being equal. This is how I make use of the comparer in my LINQ query.

public static void OrderByUsingCustomComparer()
    {
        Book[] books =
        {
            new Book{Title="ASP.NET 2.0 Website Programming: Problem -
                 Design - Solution",
                Author="Marco Bellinaso"},
            new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
            new Book{Title="Pro ASP.NET 3.5 in C# 2008,  Second Edition",
                Author="MacDonald and Mario Szpuszta"},
            new Book{Title="ASP.NET 3.5 Unleashed ", Author="Stephen Walther"},
        };
        IEnumerable<Book> orderedbooks = books.OrderBy(b => b.Title,
            new LengthComparer());
    }

AsEnumerable Operator

I found the AsEnumerable operator to be really important in understanding where the query gets executed, meaning is it going to get converted to SQL and the query would be performed on SQL server or LINQ to objects would be used and query would be performed in memory. The ideal use I have found for AsEnumerable would be when I know that a certain functionality is not available in SQL server, I can perform part of the query using LINQ to SQL (Iqueryable) and the rest executed as LINQ to objects (IEnumerable<T>). Basically, AsEnumerable is a hint to perform this part of the execution using LINQ to objects. This is how the prototype looks:

public static IEnumerable<T> AsEnumerable<T>(
this IEnumerable<T> source);

The prototype operates on the source of IEnumerable<T> and also returns an IEnumerable<T>. This is because standard query operators operate on IEnumerable<T>, whereas LINQ to SQL operates on IQueryable<T>, which also happens to implement IEnumerable<T>. So when you execute an operator like on an IQueryable <T> (domain objects), it uses a LINQ to SQL implementation for the where clause. As a result, the query gets executed on SQL Server. But what if we knew in advance that a certain operator would fail on SQL Server since SQL Server has no implementation for it. It's good to use the AsEnumerable operator to tell the query engine to perform this part of the query in memory using LINQ to objects. Let's see an example:

public static void AsEnumerableExample()
    {
        NorthwindDataContext db = new NorthwindDataContext();
        var firstproduct = (from product in db.Products
                       where product.Category.CategoryName == "Beverages"
                       select product
                       ).ElementAt(0);
        Console.WriteLine(firstproduct.ProductName);
    }

When you run this query, it would throw an exception saying that elementat is not supported because SQL Server does not know how to execute elementAt. In this case, when I add as enumerable, the query would execute fine as follows:

public static void AsEnumerableExample()
    {
        NorthwindDataContext db = new NorthwindDataContext();
        var firstproduct = (from product in db.Products
                       where product.Category.CategoryName == "Beverages"
                       select product
                       ).AsEnumerable().ElementAt(0);
        Console.WriteLine(firstproduct.ProductName);
    }

DefaultIfEmpty

The DefaultIfEmpty operator returns a default element if the input sequence is empty. If the input sequence is empty, the DefaultIfEmpty operator returns a sequence with a single element of default (T) which, for reference types, is null. Furthermore, the operator also allows you to specify the default operator that will be returned.

public static void DefaultIfEmptyExample()
    {
        string[] fruits = { "Apple",  "pear",  "grapes",  "orange" };
        string banana = fruits.Where(f => f.Equals("Banana")).First();
        Console.WriteLine(banana);
    }

The above example throws an exception because the first operator requires that sequence not be empty. Therefore if we were to use defaultifEmpty, this is how it would look:

public static void DefaultIfEmptyExample1()
    {
        string[] fruits = { "Apple",  "pear",  "grapes",  "orange" };
        string banana =
            fruits.Where(f => f.Equals("Banana")).DefaultIfEmpty("Not Found").First();
        Console.WriteLine(banana);
    }

Another interesting use of DefaultIfEmpty is to perform a left outer join using GroupJoin. Here is an example that illustrates that:

public class Category
{
    public string CategoryName { get; set; }
}
public class Product
{
    public string ProductName { get; set; }
    public string CategoryName { get; set; }
}

public static void LeftOuterJoin()
    {
        Category[] categories = {
                                    new Category{CategoryName="Beverages"},
                                    new Category{CategoryName="Condiments"},
                                    new Category{CategoryName="Dairy Products"},
                                    new Category{CategoryName="Grains/Cereals"}
                                };
        Product[] products = {
                                 new Product{ProductName="Chai",
                                     CategoryName="Beverages"},
                                 new Product{ProductName="Northwoods Cranberry Sauce",
                                     CategoryName="Condiments"},
                                 new Product{ProductName="Butter",
                                     CategoryName="Dairy Products"},
                             };
        var prodcategory =
            categories.GroupJoin(
                                products,
                                c => c.CategoryName,
                                p => p.CategoryName,
                                (category,  prodcat) => prodcat.DefaultIfEmpty()
                                .Select(pc => new { category.CategoryName,
                                    ProductName = pc != null ? pc.ProductName : "No" })
                                ).SelectMany(s => s);

        foreach (var product in prodcategory)
        {
            Console.WriteLine("Category :{0},  Product = {1}",  product.CategoryName,
                product.ProductName);
        }
    }

In the example above, I am using left outer join to list all categories, regardless of whether they have any products or not.

History

30th January 2008 -- Original version posted
5th March 2008 -- Article updated

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)