Introduction
In this article, we look at the kind of problem an IEnumerable<T> instance can cause when it is produced by deferred execution.
The problem
The extension methods provided by the System.Linq.Enumerable class are great and elegant. Together with System.Linq.Queryable, this class is the core of LINQ. These methods let our code work with collections at a higher level of abstraction, which keeps it flexible. However, using them incorrectly can cause serious performance problems. The following code snippet is a perfect example. Can you guess how many times the foreach loop inside the method ListAllFileInfos() will be executed?
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var files = ListAllFileInfos();
        if (!files.Any())
        {
            Console.WriteLine("No file is found in current directory.");
        }
        else
        {
            Console.WriteLine($"{files.Count()} files are found in current directory:");
            Console.WriteLine("-------");
            foreach (var file in files)
            {
                Console.WriteLine(file.Name);
                Console.WriteLine($"{file.Length} bytes");
                Console.WriteLine($"Modified at {file.LastWriteTime}");
                Console.WriteLine();
            }
        }
    }

    // Iterator method: the body runs only when the result is enumerated.
    static IEnumerable<FileInfo> ListAllFileInfos()
    {
        foreach (var filename in Directory.EnumerateFiles(Environment.CurrentDirectory))
        {
            yield return new FileInfo(filename);
        }
    }
}
The answer is 3: once each for Any(), Count() and foreach. This is because ListAllFileInfos() produces its IEnumerable<T> instance with deferred execution. The method body does not run when the method is called. It runs each time a consumer calls GetEnumerator() and starts iterating, which is exactly what Any(), Count() and foreach do behind the scenes. Code like this causes unnecessary I/O, because the result of ListAllFileInfos() only needs to be produced once.
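The repeated execution can be observed without touching the file system. The following is a minimal sketch, not part of the original example: ListNames() is a hypothetical stand-in for ListAllFileInfos(), and the Starts counter is an assumption added only to record how many times the iterator body begins running.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCounter
{
    // Hypothetical counter: how many times the iterator body has started.
    public static int Starts;

    // Stand-in for ListAllFileInfos() that needs no file system access.
    public static IEnumerable<string> ListNames()
    {
        Starts++; // executed each time a consumer begins enumerating
        yield return "a.txt";
        yield return "b.txt";
    }

    public static int CountStarts()
    {
        Starts = 0;
        var names = ListNames();     // nothing runs yet: execution is deferred

        names.Any();                 // 1st enumeration (stops after the first item)
        names.Count();               // 2nd enumeration (walks every item)
        foreach (var name in names)  // 3rd enumeration
        {
            _ = name;
        }

        return Starts;
    }
}
```

Calling EnumerationCounter.CountStarts() returns 3, matching the three consumers in the original Main() method.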
Let us see how we solve the problem.
The extension methods
The first thing we need to do is determine whether an IEnumerable<T> instance is produced by deferred execution. For that, we can simply test whether the instance implements an interface that has a Count property. The reasoning is that a sequence produced by deferred execution cannot know its total number of items before all of them have been enumerated.
There are three basic collection interfaces that provide a Count property:
System.Collections.Generic.ICollection<T>
System.Collections.Generic.IReadOnlyCollection<T>
System.Collections.ICollection
All other collection interfaces that have a Count property (for example, IReadOnlyList<T>) derive from these three, so checking them is sufficient. Now we can write the following extension method:
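using System.Collections;
using System.Collections.Generic;

public static partial class DeferredEnumerable
{
    // True when the source exposes no Count-bearing collection interface.
    public static bool Deferred<T>(this IEnumerable<T> source)
    {
        return !(source is ICollection<T>
            || source is IReadOnlyCollection<T>
            || source is ICollection);
    }
}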
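To make the check concrete, here is a small sketch that applies Deferred() to a few common sources. The Deferred() method is copied from the article so the block is self-contained. The DeferredDemo class, its Numbers() iterator and its Check() helper are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class DeferredEnumerable
{
    // Copied from the article: no Count-bearing interface means "deferred".
    public static bool Deferred<T>(this IEnumerable<T> source)
    {
        return !(source is ICollection<T>
            || source is IReadOnlyCollection<T>
            || source is ICollection);
    }
}

public static class DeferredDemo
{
    // Hypothetical iterator method used as a deferred source.
    static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    public static bool[] Check()
    {
        var list = new List<int> { 1, 2 };
        return new[]
        {
            list.Deferred(),                    // false: List<T> implements ICollection<T>
            new[] { 1, 2 }.Deferred(),          // false: arrays implement ICollection<T>
            new HashSet<int>(list).Deferred(),  // false: HashSet<T> implements ICollection<T>
            Numbers().Deferred(),               // true: compiler-generated iterator
            list.Where(n => n > 1).Deferred(),  // true: filtered count is unknown up front
        };
    }
}
```

Note that the check is a heuristic based on interfaces only. It reports what the object claims to be, not how its items are actually produced.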
Next, based on that result, we decide whether we need to call ToArray() or ToList(). Both methods buffer the items of a deferred sequence into an array or a List<T> instance.
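using System;
using System.Collections.Generic;
using System.Linq;

public static partial class DeferredEnumerable
{
    // Buffers the sequence only when it is deferred; otherwise returns it as-is.
    public static IEnumerable<T> ExecuteIfDeferred<T>(this IEnumerable<T> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        return source.Deferred() ? source.ToList() : source;
    }
}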
ToList() is called only when it is needed. If source is already materialized, that is, not produced by deferred execution, we simply return it unchanged.
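Both behaviours can be checked with a short sketch. The two extension methods are copied from the article so the block is self-contained. The ExecuteIfDeferredDemo class, its Starts counter and its Numbers() iterator are hypothetical additions used only to observe what happens.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class DeferredEnumerable
{
    public static bool Deferred<T>(this IEnumerable<T> source)
    {
        return !(source is ICollection<T>
            || source is IReadOnlyCollection<T>
            || source is ICollection);
    }

    public static IEnumerable<T> ExecuteIfDeferred<T>(this IEnumerable<T> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        return source.Deferred() ? source.ToList() : source;
    }
}

public static class ExecuteIfDeferredDemo
{
    // Hypothetical counter: how many times the iterator body has started.
    public static int Starts;

    static IEnumerable<int> Numbers()
    {
        Starts++;
        yield return 1;
        yield return 2;
    }

    // A deferred source is enumerated once, inside ExecuteIfDeferred().
    public static int BufferedStarts()
    {
        Starts = 0;
        var numbers = Numbers().ExecuteIfDeferred(); // buffered into a List<int> here
        numbers.Any();
        numbers.Count();
        foreach (var n in numbers) { _ = n; }
        return Starts;
    }

    // An already materialized source is passed through without copying.
    public static bool ReturnsSameInstance()
    {
        var list = new List<int> { 1, 2 };
        return ReferenceEquals(list, list.ExecuteIfDeferred());
    }
}
```

Here BufferedStarts() returns 1 and ReturnsSameInstance() returns true, so a deferred source is buffered exactly once while an existing List<T> is not copied.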
Now we can change the original example to the following snippet:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // The only change: buffer the result once if it is deferred.
        var files = ListAllFileInfos().ExecuteIfDeferred();
        if (!files.Any())
        {
            Console.WriteLine("No file is found in current directory.");
        }
        else
        {
            Console.WriteLine($"{files.Count()} files are found in current directory:");
            Console.WriteLine("-------");
            foreach (var file in files)
            {
                Console.WriteLine(file.Name);
                Console.WriteLine($"{file.Length} bytes");
                Console.WriteLine($"Modified at {file.LastWriteTime}");
                Console.WriteLine();
            }
        }
    }

    static IEnumerable<FileInfo> ListAllFileInfos()
    {
        foreach (var filename in Directory.EnumerateFiles(Environment.CurrentDirectory))
        {
            yield return new FileInfo(filename);
        }
    }
}
After we add the ExecuteIfDeferred() call, the foreach loop inside ListAllFileInfos() runs only once.
Other benefits of this change are:
- Decoupling: no matter how ListAllFileInfos() is implemented (maybe by a third-party library or by co-workers in another department), our code works normally.
- Minimal change: this is great for refactoring, especially of lengthy legacy code, and the code still preserves its flexibility.
Points of Interest
The given example is about files. However, if you are working with database access (such as Entity Framework), you may encounter the same issue, where it causes redundant queries. You can apply the same solution to your code.
History
- 2018-02-04 Initial post
- 2018-02-05 Add example download link
- 2018-02-06 Correct example code