The Deferred Execution and Its Problem

5 Feb 2018, CPOL, 2 min read
A discussion of the problem caused by deferred execution and how to solve it.

Introduction

In this article, we are going to see what kind of problem an instance of IEnumerable<T> can cause if it uses deferred execution.

The problem

The extension methods provided by the System.Linq.Enumerable class are great and elegant. Together with System.Linq.Queryable, this class is the core of LINQ. These methods let our code work with collections at a higher level of abstraction, which preserves flexibility. However, using them incorrectly can be disastrous for performance. The following code snippet is a perfect example. Can you guess how many times the foreach loop inside the method ListAllFileInfos() will be executed?

C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var files = ListAllFileInfos();

        if (!files.Any())
        {
            Console.WriteLine("No file is found in current directory.");
        }
        else
        {
            Console.WriteLine($"{files.Count()} files are found in current directory:");
            Console.WriteLine("-------");

            foreach(var file in files)
            {
                Console.WriteLine(file.Name);
                Console.WriteLine($"{file.Length} bytes");
                Console.WriteLine($"Modified at {file.LastWriteTime}");
                Console.WriteLine();
            }
        }
    }

    static IEnumerable<FileInfo> ListAllFileInfos()
    {
        foreach(var filename in Directory.EnumerateFiles(Environment.CurrentDirectory))
        {
            yield return new FileInfo(filename);
        }
    }
}

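The answer is 3 (assuming the directory contains at least one file): once each for Any(), Count() and the foreach in Main(). This is because ListAllFileInfos() returns an IEnumerable<T> instance that uses deferred execution. Nothing is produced when the method is called. Instead, the body of the method runs again each time the sequence is enumerated, that is, each time GetEnumerator() is invoked and items are pulled from the enumerator. That is exactly what Any(), Count() and foreach do behind the scenes. (Any() stops after the first item, but it still starts a new enumeration of the directory.) Code like this causes unnecessary I/O, because the result of ListAllFileInfos() only needs to be produced once.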
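To make the repeated execution visible without touching the file system, here is a minimal sketch. It assumes a hypothetical iterator ListNames() that stands in for ListAllFileInfos(), and a hypothetical static counter Runs that is incremented each time the iterator body starts. Neither name is part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerationCountDemo
{
    // Hypothetical counter: incremented each time the iterator body starts running.
    public static int Runs;

    // Hypothetical stand-in for ListAllFileInfos(): a deferred sequence built with yield return.
    public static IEnumerable<string> ListNames()
    {
        Runs++;
        yield return "a.txt";
        yield return "b.txt";
    }

    public static int CountRuns()
    {
        Runs = 0;
        var names = ListNames();        // Nothing runs yet; Runs is still 0.

        _ = names.Any();                // First run (stops after the first item).
        _ = names.Count();              // Second run (walks the whole sequence).
        foreach (var name in names) { } // Third run.

        return Runs;
    }

    static void Main()
    {
        Console.WriteLine(CountRuns()); // 3
    }
}
```

Each of the three consumers starts its own enumeration, so the iterator body is entered three times even though ListNames() is called only once.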

Let us see how we solve the problem.

The extension methods

The first thing we need to do is determine whether an IEnumerable<T> instance uses deferred execution. For that, we can simply test whether the instance implements an interface that has a Count property. The reasoning is that a sequence produced by deferred execution cannot know its total number of items until all items have been enumerated. Note that this is a heuristic rather than a guarantee, but it works well for the common collection types.

There are three basic collection interfaces that provide a Count property:

  • System.Collections.Generic.ICollection<T>
  • System.Collections.Generic.IReadOnlyCollection<T>
  • System.Collections.ICollection

All other collection interfaces that have a Count property (for example, IReadOnlyList<T>) derive from one of these three, so checking these interfaces is sufficient. Now we can write the following extension method:

C#
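using System.Collections;
using System.Collections.Generic;

public static partial class DeferredEnumerable
{
    // Returns true when source implements none of the interfaces that expose Count,
    // which we treat as a sign of deferred execution.
    public static bool Deferred<T>(this IEnumerable<T> source)
    {
        return !(source is ICollection<T>
              || source is IReadOnlyCollection<T>
              || source is ICollection);
    }
}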
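As a quick check of how this test behaves, the following sketch applies it to a few common sources. The Deferred() method is repeated so the snippet is self-contained. The class name DeferredCheckDemo and the iterator YieldNumbers() are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

static class DeferredCheckDemo
{
    // Same check as in the article, repeated so this sketch compiles on its own.
    public static bool Deferred<T>(this IEnumerable<T> source)
    {
        return !(source is ICollection<T>
              || source is IReadOnlyCollection<T>
              || source is ICollection);
    }

    // Hypothetical iterator method used only for this demonstration.
    public static IEnumerable<int> YieldNumbers()
    {
        yield return 1;
        yield return 2;
    }

    static void Main()
    {
        var list = new List<int> { 1, 2 };
        var queue = new Queue<int>(list);

        Console.WriteLine(new[] { 1, 2 }.Deferred());           // False: arrays implement ICollection<T>
        Console.WriteLine(list.Deferred());                     // False: List<T> implements ICollection<T>
        Console.WriteLine(queue.Deferred());                    // False: Queue<T> implements IReadOnlyCollection<T> and ICollection
        Console.WriteLine(YieldNumbers().Deferred());           // True: compiler-generated iterator
        Console.WriteLine(list.Where(n => n > 1).Deferred());   // True: LINQ filter, count unknown in advance
    }
}
```

Queue<T> is a useful case: it does not implement ICollection<T>, so checking that interface alone would misclassify it as deferred.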

The next step is to call ToArray() or ToList() only when the check above says the sequence is deferred. Both methods enumerate the deferred sequence once and buffer its items in an array or a List<T> instance.

C#
using System;
using System.Collections.Generic;
using System.Linq;

public static partial class DeferredEnumerable
{
    // Buffers a deferred sequence into a list; returns any other source unchanged.
    public static IEnumerable<T> ExecuteIfDeferred<T>(this IEnumerable<T> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));

        return source.Deferred() ? source.ToList() : source;
        // You may replace ToList() with ToArray().
    }
}

ToList() is called only when it is needed. If source is not produced by deferred execution, we do nothing and return it as it is.
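Both behaviours can be observed in a small sketch. It assumes a hypothetical counter Runs and a hypothetical iterator YieldNumbers(), and it inlines the deferred check so that the snippet is self-contained. It is an illustration, not a replacement for the methods above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

static class ExecuteIfDeferredDemo
{
    // Hypothetical counter: incremented each time the iterator body starts running.
    public static int Runs;

    // Same logic as the article's method, with the deferred check inlined.
    public static IEnumerable<T> ExecuteIfDeferred<T>(this IEnumerable<T> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));

        bool deferred = !(source is ICollection<T>
                       || source is IReadOnlyCollection<T>
                       || source is ICollection);
        return deferred ? source.ToList() : source;
    }

    // Hypothetical iterator method used only for this demonstration.
    public static IEnumerable<int> YieldNumbers()
    {
        Runs++;
        yield return 1;
        yield return 2;
    }

    public static int RunsAfterBuffering()
    {
        Runs = 0;
        var numbers = YieldNumbers().ExecuteIfDeferred(); // The iterator runs once, here.

        _ = numbers.Any();               // Served from the buffered list.
        _ = numbers.Count();             // Served from the buffered list.
        foreach (var n in numbers) { }   // Served from the buffered list.

        return Runs;
    }

    static void Main()
    {
        var list = new List<int> { 1, 2 };
        Console.WriteLine(ReferenceEquals(list, list.ExecuteIfDeferred())); // True: returned unchanged
        Console.WriteLine(RunsAfterBuffering());                            // 1
    }
}
```

An existing List<T> comes back as the same instance, so no copy is made, while the iterator is executed exactly once no matter how many times the result is consumed afterwards.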

Now we can change the original example to the following snippet:

C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var files = ListAllFileInfos().ExecuteIfDeferred(); // Here is the change!

        if (!files.Any())
        {
            Console.WriteLine("No file is found in current directory.");
        }
        else
        {
            Console.WriteLine($"{files.Count()} files are found in current directory:");
            Console.WriteLine("-------");

            foreach(var file in files)
            {
                Console.WriteLine(file.Name);
                Console.WriteLine($"{file.Length} bytes");
                Console.WriteLine($"Modified at {file.LastWriteTime}");
                Console.WriteLine();
            }
        }
    }

    static IEnumerable<FileInfo> ListAllFileInfos()
    {
        foreach(var filename in Directory.EnumerateFiles(Environment.CurrentDirectory))
        {
            yield return new FileInfo(filename);
        }
    }
}

After we add the call to ExecuteIfDeferred(), the foreach loop inside ListAllFileInfos() runs only once.

The change brings other benefits as well:

  • Decoupling: no matter how ListAllFileInfos() is implemented (it may come from a third-party library or from co-workers in another department), our code works normally.
  • Minimal change: only one call is added, which is great for refactoring, especially in lengthy legacy code, and the code still preserves its flexibility.

Points of Interest

The given example is about files. However, if you are working with database access (such as Entity Framework), you may encounter the same issue, where it causes redundant queries. You can apply the same solution to your code.

History

  • 2018-02-04 Initial post
  • 2018-02-05 Add example download link
  • 2018-02-06 Correct example code

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)