Introduction
In this article, I will cover the following topics:
- How LINQ leverages differed execution and what benefits and impacts it can have to a developer.
- Define what operators are differed and what operators force immediate execution of the query.
- How to use LINQ with legacy collections that do not implement
IEnumerable<T>;
different query syntax is available in C# to query data. - What is an expression tree?
- Partial methods under the covers, with an example.
- How to return projections from a method.
- What support C# provides to query over datasets and data tables (LINQ to dataset).
- Implications of anonymous types.
- Query operators such as
SelectMany
, Concat
, OrderBy
, AsEnumerable
, defaultifempty
and how to do left join
s in LINQ to SQL.
Differed and Immediate Execution
By default, LINQ query expressions are not evaluated until you iterate over the contents. The benefit of this differed execution allows the same LINQ query to be executed multiple times, giving you the latest results. However, there would be times when differed execution behavior would not be acceptable due to performance reasons of the query being executed every time. To prevent this behavior, the .NET Framework defines a number of extension methods such as ToArray<T>()
, ToDictionary<TSource
, TKey>()
and ToList<T>()
to capture the results in a strongly typed container which would not reflect new changes made to the original collection. Here is an example:
public static void DifferedAndImmediateExecution()
{
int[] numbers = { 1, 2, 3, 4, 5, 6 };
var lessthan4 = from n in numbers
where n < 4
select n;
Console.WriteLine("Original array with items less than 4");
foreach (int n in lessthan4)
{
Console.WriteLine("{0} < 4", n);
}
numbers[2] = 7;
Console.WriteLine(
"Results based on differed execution after change number[2] = 7");
foreach (int n in lessthan4)
{
Console.WriteLine("{0} < 4", n);
}
numbers = new int[] { 1, 2, 3, 4, 5, 6 };
var lessthan4immediate = numbers.Where(n => n < 4).Select(n => n).ToArray<int>();
numbers[2] = 7;
Console.WriteLine(
"Results based on immediate execution after change number[2] = 7");
foreach (int n in lessthan4immediate)
{
Console.WriteLine("{0} < 4", n);
}
}
Figure 1
As you should notice, new changes are reflected on differed queries whereas new changes are not reflected on immediate execution because the results are cached in a strongly typed container. It may appear in the example that the query is happening when lessthan4
is initialized. However, the query does not run until you iterate over the collection. One of the downsides I have observed with differed execution is that bugs are harder to trace because enumerating the query could happen much further down in your code. This is where the exception would be thrown. However, you may forget that it is the original query that is the problem. So if differed execution is not the desired behavior you want, use the non-deferred extension methods like ToArray
, ToList
, ToDictionary
or ToLookup
to force the query to be executed. Let's see an example for that:
public static void DelayedExceptions()
{
string[] blob = {"Resharper", "is", "great"};
IEnumerable<string> sentence = blob.Select(s => s.Substring(0, 5));
foreach (string word in sentence)
{
Console.WriteLine(word);
}
}
You will notice that when you loop the second time is when you get an exception of rangeoutofboundexception
instead of getting an exception when you declared the query. Here are the list of operators in order, showing which are differed and which are not.
Operator | Purpose | Deferred |
Distinct | Set | Yes |
ElementAt | Element | |
ElementAtOrDefault | Element | Yes |
Empty | Generation | Yes |
Except | Set | Yes |
First | Element | |
FirstORDefault | Element | |
GroupBy | Grouping | Yes |
GroupJOin | Join | Yes |
Intersect | Set | Yes |
Join | Join | Yes |
Last | Element | |
LastOrDefault | Element | |
LongCount | Aggregate | |
Max | Aggregate | |
Min | Aggregate | |
OfType | Conversion | Yes |
OrderBy | Ordering | Yes |
OrderByDescending | Ordering | Yes |
Range | Generation | Yes |
Reverse | Ordering | Yes |
Repeat | Generation | Yes |
Select | Projection | Yes |
SelectMany | Projection | Yes |
SequenceEqual | Equality | |
Single | Element | |
SingleOrDefault | Element | |
Skip | Partitioning | Yes |
SkipWhile | Partitioning | Yes |
Sum | Aggregate | |
Take | Partitioning | Yes |
TakeWhile | Partitioning | Yes |
ThenBy | Ordering | Yes |
ThenByDescending | Ordering | Yes |
ToDictionary | Conversion | |
ToArray | Conversion | |
ToList | Conversion | |
ToLookup | Conversion | |
Union | Set | Yes |
Where | Restriction | Yes |
Support for Non-generic Collections.
Most of the extension methods work with IEnumerable<T>
. However, there are legacy collections in .NET such as Collection
and ArrayList
that do not implement IEnumerable<T>
. Thankfully, by using the generic Enumerable.OfType<T>()
or Cast operator, it is possible to iterate over non-generic collections. OfType<T>()
is one of the few extension methods that does not extend generic types and allows you to specify the type. One common use of OfType
is to specify the right type when building LINQ queries. This is so that you can get design-time support for building queries and the compiler can check for errors. Another great use of OfType
would be to filter the items from an array, excluding items that are not of the right type, such as ArrayList
. This would allow you to add Book
s as well as a Car
. In order to pull only Book
s and no Car
, you can use OfType
to filter the collection to the right type.
public class Book
{
public string Title { get; set; }
public string Author { get; set; }
}
public class Car
{
public string Make { get; set; }
public string Model { get; set; }
}
public static void OfTypeExtesionMethod()
{
ArrayList books = new ArrayList()
{
new Book{Title="ASP.NET 2.0 Website Programming: Problem -
Design - Solution",
Author="Marco Bellinaso"},
new Book{Title="Pro ASP.NET 3.5 in C# 2008, Second Edition",
Author="MacDonald and Mario Szpuszta"},
new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
new Book{Title="ASP.NET 3.5 Unleashed ", Author="Stephen Walther"},
new Car{Make="Toyota", Model="Corolla"}
};
IEnumerable<Book> bystephen = from book in books.OfType<Book>()
where book.Author == "Stephen Walther"
select book;
foreach (Book book in bystephen)
{
Console.WriteLine("Book {0} by {1}", book.Title, book.Author);
}
}
Figure 2
public static void CastOperator()
{
ArrayList list = new ArrayList();
list.Add(1);
list.Add(2);
list.Add("ASP.NET 3.5 Unleashed");
IEnumerable<int> numbers = list.Cast<int>();
foreach (int number in numbers)
{
Console.WriteLine(number);
}
}
When you run the cast operator, you get an exception because the cast operator tries to cast every element in the collection to the specified type. If it does not cast, it will throw an exception. In contrast, the OfType
operator will return objects that can be cast to the right type and skips the rest from the collection.
Query Syntax
The .NET Framework version 3.5 allows querying data using LINQ in a variety of different ways:
- Extension methods and Lambda expressions
- Query syntax
- Generic delegate
- Enumerable type
- Expression tree
- Anonymous methods
public static void QuerySyntax()
{
int[] numbers = { 1, 2, 3, 4, 5, 6 };
var usingquerysntax = from n in numbers
where n < 5
select n;
var usingextensionmethods = numbers.Where(n => n < 5).Select(n => n);
Enumerable.Where(numbers, n => n < 5).Select(n => n);
Func<int, bool> where = n => n < 5;
Func<int, int> select = n => n;
var usinggenericdelegate = numbers.Where(where).Select(select);
Expression<Func<int, bool>> where1 = n => n < 5;
Expression<Func<int, int>> select1 = n => n;
var usingexpression = numbers.Where(where).Select(select);
var usinganonymousmethods = numbers.Where(delegate(int n) { return n < 5; })
.Select(delegate(int n) { return n; });
}
The above code would give the same result. Key things to consider are that query syntax is simply a nicer way to execute extension methods. Wherever a delegate is required, lambdas, generic delegates or an anonymous method can be passed.
Expression Tree
An expression tree is an efficient data representation of a query's lambda expression. In terms of LINQ to SQL, the expression tree gets converted to SQL to ensure that filtering and ordering is performed on the server instead of bringing the data down and applying filtering and ordering on .NET objects. For lambda expressions, the compiler can generate IL code or an expression tree. If the operator accepts a method delegate, IL code is emitted. If an operator accepts an expression of a method delegate, the expression tree is returned. Let's look at the two different implementations of Select
: one defined in System.Linq.Enumerable
and the other defined in System.Linq.Queryable
.
public static IEnumerable<TResult> Select<TSource,
TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
public static IQueryable<TResult> Select<TSource,
TResult>(this IQueryable<TSource> source,
Expression<Func<TSource, TResult>> selector);
As you can see, LINQ to objects generate IL code because it takes in a generic delegate, whereas a LINQ to SQL implementation takes an expression tree that gets converted to SQL to be executed by SQL server.
Partial Methods
By defining a method partial
, it allows you to prototype a method in one file and implement it in another file. They are equivalent to lightweight events in C#. Basically you can define a prototype in one partial
class and define actual implementation in another partial
class. If there is no implementation found, no IL code is emitted in the compiled assembly. Following are the important facts of partial
methods:
Partial
methods can only be defined within a partial
class. Partial
methods must return a void. Partial
methods can be static
or instance
. Partial
methods can have parameters except out
modifier. Partial
methods are always private
.
public partial class PartialTest
{
partial void Partial(int total);
public void TestPartial()
{
int total = 0;
Console.WriteLine("Total before calling Partial method {0}", total);
Partial(++total);
Console.WriteLine("Total after calling Partial method {0}", total);
}
}
PartialTest test = new PartialTest();
test.TestPartial();
Figure 3
As you can see, there was no implementation defined for partial
methods. Therefore the value assigned to the total equalled 0
, meaning the Partial
method never got called.
Returning Projections From Methods
You may have noticed that the above code made use of an implicit type variable by using var
, which infers the type. However, implicit type cannot be used to define parameters or return the values or the field of a class type. So, one way to return data would be to stop using implicit types and return a strongly typed object such as IEnumerable<Book>
or IEnumerable<string>
. However, there are cases when you want to return projections and, in those cases, you do not know the actual data type and the use of var
keyword is mandatory. In those cases, you can return a standard CLR array using the ToArray<T>()
extension method.
public static Array ReturnProjections()
{
Book[] books =
{
new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
new Book{Title="ASP.NET 3.5 Unleashed ", Author="Stephen Walther"}
};
var projections = from book in books
select new { book.Title };
return projections.ToArray();
}
Notice that we are not specifying the underlying type in the return type for ToArray
because we do not know the underlying type.
Array books = ReturnProjections();
foreach (object o in books)
{
Console.WriteLine(o);
}
LINQ Over Dataset
Out-of-the-box, datasets, datatables and dataviews do not have the necessary infrastructure to work in a LINQ query. This is where Microsoft introduced the new extension called System.Data.DataSetExtensions.dll, which includes DataTable
extensions, DataRow
extensions and TypedTableBasedExtesions
. To make a data table LINQ aware, you simply need to call the AsEnumerable
extension, which will return an EnumerableRowCollection
consisting of DataRows
. Another very useful extension method is DataRowExtensions.Field<T>()
, which allows you to do safe casting in a LINQ query and helps prevent runtime exceptions. Once you apply the filter on a datatable using the LINQ query, the result is simply EnumerableRowCollection
. In order to return back a table, we can use the new copytodatatable
extension method on the result.
public static void DataSetExample(DataTable books)
{
var implicittype = from book in books.AsEnumerable()
where book.Field<string>("Author") == "Stephen Walther"
select book;
var datatable = implicittype.CopyToDataTable();
}
Anonymous Types
The C# 3.0 language includes the ability to dynamically create new unnamed classes. This type of class is known as an anonymous type. An anonymous type has no name and is generated by the compiler based on the initialization of the object being instantiated. Since the class has no type name, any variable assigned to an object of an anonymous type must have some way to declare it. This is the purpose of the new C# 3.0 var
keyword. Anonymous types allow you to define class on-the-fly using object initialization syntax and assigning to var
keyword. At compile time, the C# compiler will generate a unique class which is not accessible from your code. All anonymous classes derive from System.Object
and therefore inherit all the properties and methods of the object
class. Anonymous classes are provided with overridden versions of Equals()
, GetHashCode
and ToString()
. The ToString
implementation for an anonymous book class would look something like this:
public override string ToString()
{
StringBuilder builder = new StringBuilder();
builder.Append("{ BookTitle = ");
builder.Append(this.<BookTitle>i__Field);
builder.Append(", BookAuthor = ");
builder.Append(this.<BookAuthor>i__Field);
builder.Append(" }");
return builder.ToString();
}
GetHashCode()
computes the hash by passing in the member variable of each anonymous type to EqualityComparer<T>
.Therefore two anonymous types will return the same hashcode if they have the same set of properties and are assigned the same values. The Equals
and ==
implementations for anonymous types are done differently. When you compare two anonymous classes using Equals
, it makes use of a value-based comparison by testing the values for each property of the object with other anonymous types. However, with ==
, it checks the reference to see if the two references are equal. Let's see an example of that:
public static void AnonymousTypeSemantics()
{
var book1 = new { BookTitle = "ASP.NET 3.5 Unleashed",
BookAuthor = "Stephen Walther" };
var book2 = new { BookTitle = "ASP.NET 3.5 Unleashed",
BookAuthor = "Stephen Walther" };
Console.WriteLine("Checking if book1 == book2");
if (book1 == book2)
{
Console.WriteLine("Same object");
}
else
{
Console.WriteLine("Not same object");
}
Console.WriteLine("Checking if book1.Equals(book2)");
if (book1.Equals(book2))
{
Console.WriteLine("Same object");
}
else
{
Console.WriteLine("Not same object");
}
}
The result is:
Figure 4
Standard Query Operators
The list of standard query operators is huge. I will cover a few of them that I found interesting and that require a bit of understanding.
Select Many and Concat Operators
The SelectMany
operator is used to create one to many projections. The SelectMany
operator will return zero or more inputs for every input element. The prototype is as follows:
public static IEnumerable<S> SelectMany<T, S>(
this IEnumerable<T> source,
Func<T, IEnumerable<S>> selector);
The operator takes an input sequence of T
and a selector delegate that takes an element of type T
. It returns an IEnumerable
of S
, an intermediate output sequence. Let's see an example:
public static void SelectManyOperator()
{
string[] items = { "this", "is", "a", "test" };
IEnumerable<char> characters = items.SelectMany(s => s.ToCharArray());
foreach (char character in characters)
{
Console.WriteLine(character);
}
}
Figure 5
As you can see from the result, for every single input of word, the select many operator yields an array of characters. The select many operator concatenates each of those arrays of characters into a single sequence. The concat
operator allows multiple sequences to be merged into a single sequence. I have found that both the concat
and selectmany
operators can be used interchangeably. In fact, the main difference I have found is that the selectmany
operator is more flexible and allows you to concat
more than two sequences (an array of sequences), as compared to concat
, which only works on two sequences. Here is an example illustrating the merging of two sequences together, using both concat
and selectmany
:
public static void ConcatAndSelectManyOperator()
{
string[] books = {
"ASP.NET 2.0 Website Programming: Problem -
Design - Solution",
"Pro ASP.NET 3.5 in C# 2008, Second Edition",
"ASP.NET 2.0 Unleashed",
"ASP.NET 3.5 Unleashed "
};
IEnumerable<string> concatbooks = books.Take(2).Concat(books.Skip(2));
IEnumerable<string> selectmanybooks = new[]{
books.Take(2),
books.Skip(2)
}.SelectMany(b => b);
}
Using concat
or selectmany
would yield the same results, but the advantage of selectmany
is that it allows you to merge more than two sequences to return a single sequence back.
OrderBy Operators
The order
operator allows collections to be ordered using orderby
, orderbydescending
, thenby
and thenbydescending
. Here is what the prototype looks like:
public static IOrderedEnumerable<T> OrderBy<T, K>(
this IEnumerable<T> source,
Func<T, K> keySelector)
where
K : IComparable<K>
The operator takes an IEnumerable<T>
and orders the collection based on the key selector that will return the key value for each element in the sequence. The only important point that is worth mentioning is that the sorting of orderby
and orderbydescending
is considered unstable, meaning that if two elements return the same key based on the key selector, the order of the output could be the maintained or could be different. You should never rely on the order of elements coming out of an orderby
call other than the field specified in the orderby
clause.
Another interesting point that I found was that orderby
, orderbydescending
and orderbyascending
take as an input source IEnumerable<T>
, as you can see from the prototype. However, the return
type is IORderedEnumerable<T>
. The problem with that is if you want to order by more than 1 column, you cannot pass IOrderedEnumerable<T>
since orderby
takes IEnumerable<T>
. The solution to this is to use the ThenBy
or ThenByDescending
operators, which take IOrderedEnumerble<T>
. Let's proceed to an example:
public static void OrderByOperators()
{
Book[] books =
{
new Book{Title="ASP.NET 2.0 Website Programming: Problem -
Design - Solution",
Author="Marco Bellinaso"},
new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
new Book{Title="Pro ASP.NET 3.5 in C# 2008, Second Edition",
Author="MacDonald and Mario Szpuszta"},
new Book{Title="ASP.NET 3.5 Unleashed ", Author="Stephen Walther"},
};
IEnumerable<Book> orderedbooks =
books.OrderBy(b => b.Author).ThenBy(b => b.Title);
}
In the above example, I make use of both orderby
and thenby
. Since orderby
only works on IEnumerable<T>
, I make use of the thenby
extension method as well. The second prototype for orderby
takes in Icomparable
and does not require the return
type of the keyselector
delegate to return a Collection
that implements IComparable
. Here is what the prototype looks like:
public static IOrderedEnumerable<T> OrderBy<T, K>(
this IEnumerable<T> source,
Func<T, K> keySelector,
IComparer<K> comparer);
interface IComparer<T> {
int Compare(T x, T y);
}
The compare
method will return a greater int
if the second argument is greater than the first, equal to zero if both arguments are equal and less than zero if the first argument is less than the second. Let's see an example of that:
public class LengthComparer : IComparer<string>
{
public int Compare(string title1, string title2)
{
if (title1.Length < title2.Length) return -1;
else if (title1.Length > title2.Length) return 1;
else return 0;
}
}
I have created a custom comparer which compares the book
title based on the title
's length. When the length of the first title
is less than the second title
's length, I return -1. If it's greater, I return 1. Otherwise, I return 0 for being equal. This is how I make use of the comparer in my LINQ query.
public static void OrderByUsingCustomComparer()
{
Book[] books =
{
new Book{Title="ASP.NET 2.0 Website Programming: Problem -
Design - Solution",
Author="Marco Bellinaso"},
new Book{Title="ASP.NET 2.0 Unleashed", Author="Stephen Walther"},
new Book{Title="Pro ASP.NET 3.5 in C# 2008, Second Edition",
Author="MacDonald and Mario Szpuszta"},
new Book{Title="ASP.NET 3.5 Unleashed ", Author="Stephen Walther"},
};
IEnumerable<Book> orderedbooks = books.OrderBy(b => b.Title,
new LengthComparer());
}
AsEnumerable Operator
I found the AsEnumerable
operator to be really important in understanding where the query gets executed, meaning is it going to get converted to SQL and the query would be performed on SQL server or LINQ to objects would be used and query would be performed in memory. The ideal use I have found for AsEnumerable
would be when I know that a certain functionality is not available in SQL server, I can perform part of the query using LINQ to SQL (Iqueryable
) and the rest executed as LINQ to objects (IEnumerable<T>
). Basically, AsEnumerable
is a hint to perform this part of the execution using LINQ to objects. This is how the prototype looks:
public static IEnumerable<T> AsEnumerable<T>(
this IEnumerable<T> source);
The prototype operates on the source of IEnumerable<T>
and also returns an IEnumerable<T>
. This is because standard query operators operate on IEnumerable<T>
, whereas LINQ to SQL operates on IQueryable<T>
, which also happens to implement IEnumerable<T>
. So when you execute an operator like on an IQueryable <T>
(domain objects), it uses a LINQ to SQL implementation for the where
clause. As a result, the query gets executed on SQL Server. But what if we knew in advance that a certain operator would fail on SQL Server since SQL Server has no implementation for it. It's good to use the AsEnumerable
operator to tell the query engine to perform this part of the query in memory using LINQ to objects. Let's see an example:
public static void AsEnumerableExample()
{
NorthwindDataContext db = new NorthwindDataContext();
var firstproduct = (from product in db.Products
where product.Category.CategoryName == "Beverages"
select product
).ElementAt(0);
Console.WriteLine(firstproduct.ProductName);
}
When you run this query, it would throw an exception saying that elementat
is not supported because SQL Server does not know how to execute elementAt
. In this case, when I add as enumerable, the query would execute fine as follows:
public static void AsEnumerableExample()
{
NorthwindDataContext db = new NorthwindDataContext();
var firstproduct = (from product in db.Products
where product.Category.CategoryName == "Beverages"
select product
).AsEnumerable().ElementAt(0);
Console.WriteLine(firstproduct.ProductName);
}
DefaultIfEmpty
The DefaultIfEmpty
operator returns a default element if the input sequence is empty. If the input sequence is empty, the DefaultIfEmpty
operator returns a sequence with a single element of default (T) which, for reference types, is null
. Furthermore, the operator also allows you to specify the default operator that will be returned.
public static void DefaultIfEmptyExample()
{
string[] fruits = { "Apple", "pear", "grapes", "orange" };
string banana = fruits.Where(f => f.Equals("Banana")).First();
Console.WriteLine(banana);
}
The above example throws an exception because the first operator requires that sequence not be empty. Therefore if we were to use defaultifEmpty
, this is how it would look:
public static void DefaultIfEmptyExample1()
{
string[] fruits = { "Apple", "pear", "grapes", "orange" };
string banana =
fruits.Where(f => f.Equals("Banana")).DefaultIfEmpty("Not Found").First();
Console.WriteLine(banana);
}
Another interesting use of DefaultIfEmpty
is to perform a left outer join
using GroupJoin
. Here is an example that illustrates that:
public class Category
{
public string CategoryName { get; set; }
}
public class Product
{
public string ProductName { get; set; }
public string CategoryName { get; set; }
}
public static void LeftOuterJoin()
{
Category[] categories = {
new Category{CategoryName="Beverages"},
new Category{CategoryName="Condiments"},
new Category{CategoryName="Dairy Products"},
new Category{CategoryName="Grains/Cereals"}
};
Product[] products = {
new Product{ProductName="Chai",
CategoryName="Beverages"},
new Product{ProductName="Northwoods Cranberry Sauce",
CategoryName="Condiments"},
new Product{ProductName="Butter",
CategoryName="Dairy Products"},
};
var prodcategory =
categories.GroupJoin(
products,
c => c.CategoryName,
p => p.CategoryName,
(category, prodcat) => prodcat.DefaultIfEmpty()
.Select(pc => new { category.CategoryName,
ProductName = pc != null ? pc.ProductName : "No" })
).SelectMany(s => s);
foreach (var product in prodcategory)
{
Console.WriteLine("Category :{0}, Product = {1}", product.CategoryName,
product.ProductName);
}
}
In the example above, I am using left outer join
to list all categories, regardless of whether they have any products or not.
History
- 30th January 2008 -- Original version posted
- 5th March 2008 -- Article updated