Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

A Link to LINQ and a Look at the Binary Search Tree

0.00/5 (No votes)
19 Apr 2010 1  
A Comprehensive Look at LINQ and the Binary Search Tree Data Structure

Abstract

In terms of querying data, this article will introduce LINQ, a powerful feature for querying data. It will show how to filter an array or collection using LINQ's where clause, how to sort the query results using the orderby clause, and how to use the select clause to select specific properties of an object. Language Integrated Queries (LINQ) allow you to write query expressions that retrieve information from certain data sources. This article will focus on the use of LINQ to objects to query arrays and Lists, selecting elements that satisfy a set of conditions -- this is known as filtering. Note that in LINQ, the two main units of data are the element and the sequence. The latter section of this article will change the direction and introduce the binary search tree data structure. While that type of data structure has nothing to do with querying data in a database that stores data on hard disk, it exemplifies a data traversal process. If you are an advanced developer, that will not make much sense to mix these two topics -- it is merely an attempt to encourage the beginner to study data access. So having said that, a LINQ query begins with a from clause, which specifies a range variable (value) and the data source to query (values). The range variable represents each item in the data source (similar to a control variable in a foreach statement). Query expressions end with either a select or a group clause, in which you can think of it as traversing the input sequence:

using System;
using System.Collections.Generic;
using System.Linq;
class LinqDemo
{
static void Main()
{
   string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
   IEnumerable query =
   from n in names
   where n.Contains ("a") // Filter elements
   orderby n.Length // Sort elements
   select n.ToUpper(); // Translate each element (project)
     foreach (string name in query) 
     Console.WriteLine (name);
 }
}

OUTPUT

JAY
MARY
HARRY

The code example below demonstrates querying an array of integers using LINQ. Repetition statements that filter arrays focus on the process of getting the results -- iterating through the elements and checking whether they satisfy the desired criteria:

using System;
using System.Collections.Generic;
using System.Linq;
class LinqDemo
 {
  public static void Main(string[]  args)
   {
    
     int[]  values = { 2, 9, 5, 0, 3, 7, 1, 4, 8, 5 };
     Display(values, "Original array:");

     var filtered = 
      from value in values
      where value > 4
      select value;
     
     Display(filtered, "Array values greater than 4:");

      var sorted = 
       from value in values
       orderby value
       select value;

     Display(sorted, "Original array, sorted:");

      var sortFilteredResults =
        from value in filtered
        orderby value descending
        select value;

      Display( sortFilteredResults, "Values greater than 4, 
			descending order (separately):");

       var sortAndFilter =
          from value in values
          where value > 4
          orderby value descending
          select value;

      Display( sortAndFilter, "Values greater than 4, descending order (one query):");
      }
      public static void Display(
       IEnumerable results, string header)
       {
         Console.Write(  "{0}", header );
         foreach (var element in results)
         Console.Write(  "{0}", element);
         Console.WriteLine();
       }
  }

OUTPUT

Original array:2950371485
Array values greater than 4:95785
Original array, sorted:0123455789
Values greater than 4, descending order (separately):98755
Values greater than 4, descending order (one query):98755

Arrays and collections already implement the IEnumerable<t> interface--they define the methods its describes. You can call any method defined by IEnumerable<t> on an array or collection to iterate through its elements. This should make sense if you are familiar with generic collections and iterators: the foreach statement uses IEnumerable<t> methods to iterate over each element of a collection. A LINQ query returns an object that has types such as integers. It can also be used with most data types, including strings and user-defined classes.

Using LINQ to Query an Array of Employee Objects

To use LINQ to query an array of Employee objects, we first have to write a user-defined class. Here is an Employee class with FirstName, LastName, and MonthlySalary properties:

using System;
public class Employee
 {
   private decimal monthlySalaryValue;

   public string FirstName { get;   set;  }
   public string LastName  { get;   set;  }

   public Employee(string first, string last, decimal salary)
     {
        FirstName = first;
        LastName = last;
        MonthlySalary = salary;
     }

   public decimal MonthlySalary
     {
        get
         {
            return monthlySalaryValue;
         }
        set
         {
             if ( value >= 0M )
              {
                monthlySalaryValue = value;
              }
          }
      }

    public override string ToString()
     {
       return string.Format("{0, -10} {1, -10} {2, 10:C}",
       FirstName, LastName, MonthlySalary);
     }
  }

We compile this as a class library:

csc.exe /target:library Employee.cs. 
using System;
using System.Collections.Generic;
using System.Linq;
class LinqDemo
 {
  public static void Main(string[]  args)
   {
     Employee[]  employees = {
       new Employee("Jason", "Red", 5000M),
       new Employee("Ashley", "Green", 7600M),
       new Employee("Mathew", "Indigo", 3587.5M),
       new Employee("James", "Indigo", 4700.77M),
       new Employee("Luke", "Indigo", 6200M),
       new Employee("Jason", "Blue", 3200M),
       new Employee( "Wendy", "Brown", 4236.4M) };

       Display( employees, "Original array" );

       var between4K6K =
        from e in employees
        where e.MonthlySalary >= 4000M && e.MonthlySalary <= 6000M
        select e;

        Display( between4K6K, string.Format
	("Employees earning in the range {0:C}-{1:C} per month", 4000, 6000) );


        var nameSorted = 
         from e in employees
         orderby e.LastName, e.FirstName
         select e;
       
        Console.WriteLine("First employee when sorted by name:");

         if (nameSorted.Any() )
           Console.WriteLine( nameSorted.First().ToString() + "\n" );
         else
           Console.WriteLine( "not found\n" );

         var lastNames = 
            from e in employees
            select e.LastName;

         Display(lastNames.Distinct(), "Unique employees last names");

         var names = 
         from e in employees
         select new { e.FirstName, Last = e.LastName };

         Display(names, "Names only");
       }
  
     public static void Display<t>(
     IEnumerable<t> results, string header)
      {
          Console.WriteLine( "{0}:", header );

          foreach (T element in results)
          Console.WriteLine( element );
          Console.WriteLine();
      }
  }

We externally reference the Employee class by compiling the main file as follows:

csc.exe /reference:Employee.dll Program.cs 

OUTPUT

Original array:
Jason      Red         $5,000.00
Ashley     Green       $7,600.00
Mathew     Indigo      $3,587.50
James      Indigo      $4,700.77
Luke       Indigo      $6,200.00
Jason      Blue        $3,200.00
Wendy      Brown       $4,236.40

Employees earning in the range $4,000.00-$6,000.00 per month:
Jason      Red         $5,000.00
James      Indigo      $4,700.77
Wendy      Brown       $4,236.40

First employee when sorted by name:
Jason      Blue        $3,200.00

Unique employees last names:
Red
Green
Indigo
Blue
Brown

Names only:
{ FirstName = Jason, Last = Red }
{ FirstName = Ashley, Last = Green }
{ FirstName = Mathew, Last = Indigo }
{ FirstName = James, Last = Indigo }
{ FirstName = Luke, Last = Indigo }
{ FirstName = Jason, Last = Blue }
{ FirstName = Wendy, Last = Brown }

We see an orderby clause to sort the results according to several properties--specified in a comma-separated list. In this query, the employees are separated alphabetically by last name. The query result's Any method returns a true if there is at least one element; it returns a false if there are no elements. The query results First method returns the first element in the result. Note that we have not specified the class that defines methods First and Any. These methods are not declared in the IEnumerable<t> interface. They are extension methods, but they can be used as if they were methods of IEnumerable<t>. The select clause is used to select the range variable's LastName property rather than the range variable. This causes the results of the query to consist of only the last names (as strings), instead of complete Employee objects. The LINQ query selects the properties FirstName and LastName, the syntax ** new { e.FirstName, Last = e.LastName } ** creates a new object of an anonymous type (a type with no name), which the compiler generates for you based on the properties listed in the curly braces. In our case, the anonymous type consists of properties for the first and last names of the selected Employee object. This demonstrates how you can specify a new name for the selected property.

Querying a Generic Collection using LINQ

using System;
using System.Collections.Generic;
using System.Linq;
public class Program {
public static void Main() {
     List items = new List();
     items.Add("aQua");
     items.Add("RusT");
     items.Add("yElLow");
     items.Add("rEd");

    var startsWithR =
      from item in items
     let uppercasedString = item.ToUpper()
    where uppercasedString.StartsWith("R")
    orderby uppercasedString
    select  uppercasedString;

      foreach (var item in startsWithR)
      Console.Write( "{0} ", item);
      Console.WriteLine();

       items.Add("rUbY");
       items.Add("SaFfRon");

         foreach ( var item in startsWithR)
         Console.Write(  "{0} ", item);
         Console.WriteLine();
  }
}

OUTPUT

RED RUST
RED RUBY RUST

We use LINQ's let clause to create a new range variable. We use string method ToUpper to convert each item to uppercase to then store the result in the new range variable uppercasedString. Next we use the new range variable uppercasedString in the where, orderby, and select clauses. Note that query is created only once:

var startsWithR =
      from item in items
      let uppercasedString = item.ToUpper()
      where uppercasedString.StartsWith("R")
      orderby uppercasedString
      select  uppercasedString;

Iterating the results more than once gives us two different lists of colors. This demonstrates LINQ's deferred execution--the query executes only when you access the results--such as iterating over them or using the Count method--not when you define the query. This allows you to create a query once and execute it many times. Any changes to the data source are reflected in the results each time the query executes.

The Binary Search Data Structure: Not Really a Link to LINQ, but Powerful Just the Same

Data structures such as linked lists, stacks and queues are linear data structures. They store contiguously in memory. A tree is a non-linear, two-dimensional data structure with special properties. Binary trees, however, are not stored contiguously in memory (think of memory as an array of bytes). The BinaryTree class instance has a reference to the root Node class instance. The root Node class instance has references to its left and right child Node instances; these child instances have references to their child instances, and so on. The point is, the various Node instances that makeup a binary tree can be scattered throughout the CLR managed heap. They are not necessarily contiguous, as are the elements of an array. With a binary tree, each tree node has two links. The root node is the first node in a tree. Each link in the root node refers to a child. The left child is the first node in the left subtree and the rightchild is the first node in the right subtree. Assume we wanted to access a particular node in a binary tree. To accomplish this task, we need to search the binary tree's set of nodes, looking for the particular node. There's no direct access to a given node as with an array. Searching a binary tree takes linear time, as potentially all nodes need to be examined. That is, as the number of nodes in the binary tree increases, the number of steps to find an arbitrary node increases as well.

binary.gif

To illustrate this data structure, we will write an application that creates a binary search tree of integers and traverses it (or walks through all of its nodes) in three ways--using recursive inorder, predorder, and postorder traversals. Arguably, the simplest approach is to write a class library that defines those three algorithms and then write a test application to examine the results. The traversal here is the reusable library that our application will test:

using System;

 namespace BinaryTreeLibrary
  {
    class TreeNode
    {
     public TreeNode LeftNode {  get;  set; }

     public int Data {  get;  set; }
     public TreeNode RightNode  {  get;  set; }
     public TreeNode ( int nodeData )
      {
         Data = nodeData;
         LeftNode = RightNode = null;
      }

    public void Insert ( int insertValue )
     {
        if ( insertValue < Data )
         {
           if ( LeftNode == null )
           
            LeftNode = new TreeNode(insertValue);
            else
            LeftNode.Insert(insertValue);
         }

        else if (insertValue > Data )
        {

         if ( RightNode == null )
          RightNode = new TreeNode (insertValue);

       else
          RightNode.Insert(insertValue);
      }
    }
  }

 public class Tree
  {
    private TreeNode root;
    public Tree()
     {
       root = null;
     }
    public void insertNode( int insertValue)
     {
       if ( root == null )
        root = new TreeNode(insertValue);
       else
        root.Insert(insertValue);
     }

  public void PreorderTraversal()
    {
      PreorderHelper (root);
    }

private void PreorderHelper(TreeNode node )
    {
       if (node != null )
    {
    Console.Write( node.Data + " " );
    PreorderHelper(node.LeftNode);
    PreorderHelper(node.RightNode);

     }
  }

   public void InorderTraversal()
    {
       InorderHelper( root );
    }

   private void InorderHelper( TreeNode  node )
    {
      if ( node != null )
      {
       InorderHelper(node.LeftNode);
       Console.Write(node.Data + " " );
       InorderHelper(node.RightNode );
      }
   }

   public void PostorderTraversal()
     {
      PostorderHelper( root );
     }

    private void PostorderHelper(TreeNode  node)
      {
         if (node != null)
          {
            PostorderHelper(node.LeftNode);
            PostorderHelper(node.RightNode);
            Console.Write(node.Data + " " );
          }
        }
     }
  }

Algorithms are not as difficult as they are cracked up to be. They are just computational models. This file would either compile as a C# class library or by using the /target:library switch on the command line. If you compile it as a class library, you would simply add a C# console application solution and reference the DLL. Here is the application that tests this library out:

using System;
using BinaryTreeLibrary;

public class TreeTest
 {
   public static void Main(string[]  args)
    {
      Tree tree = new Tree();
      int insertValue;

      Console.WriteLine("Inserting Values:  ");
      Random random = new Random();
      for (int i  = 1; i <= 10; i++)
      {
      insertValue = random.Next(100);
      Console.Write(insertValue + " ");
      tree.insertNode(insertValue);
     }

    Console.WriteLine("\n\nPreorder traversal");
    tree.PreorderTraversal();

    Console.WriteLine("\n\nInorder traversal");
    tree.InorderTraversal();

    Console.WriteLine("\n\nPostorder traversal");
    tree.PostorderTraversal();
    Console.WriteLine();
     }
   }

Here is the output:

Inserting Values:
52 62 69 30 45 94 71 45 91 50

Preorder traversal
52 30 45 50 62 69 94 71 91

Inorder traversal
30 45 50 52 62 69 71 91 94

Postorder traversal
50 45 30 91 71 94 69 62 52

References

  • The MSDN Library
  • The Paul Deitel and Harvey Deitel series on C#

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here