While working on a project that needed to read the contents of a Windows directory, I used the methods of the .NET System.IO.Directory class. However, these methods have a big downside. This post describes the problem and offers an alternative built on the Windows API, which not only avoids the problem but also appears to be faster than the original .NET methods.
Introduction
Recently, I was working on a project that needed to read the contents of a Windows directory, so I used the EnumerateDirectories, EnumerateFiles and EnumerateFileSystemEntries methods of the .NET System.IO.Directory class. Unfortunately, these methods have a big downside: if they run into a file system entry that the current user is denied access to, they stop immediately with an UnauthorizedAccessException. Instead of handling the error and continuing, they yield only what they found up to that point and never complete the job.
It is impossible to handle this from outside the methods, because even if you catch the error, the returned IEnumerable contains only partial results.
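To make the partial-result problem concrete, here is a minimal sketch. It does not touch the real file system: FakeEnumeration is a hypothetical stand-in (not part of .NET or of this project) for an enumeration that hits a protected entry part-way through, and CollectUntilDenied is an illustrative helper that catches the error from the outside.

```csharp
using System;
using System.Collections.Generic;

static class PartialResults
{
    // Hypothetical stand-in for a lazy directory enumeration that fails
    // as soon as it reaches an entry the user may not read.
    public static IEnumerable<string> FakeEnumeration()
    {
        string[] entries = { @"C:\data\a", @"C:\data\b", @"C:\data\locked", @"C:\data\z" };
        foreach (string entry in entries)
        {
            if (entry.EndsWith("locked", StringComparison.Ordinal))
                throw new UnauthorizedAccessException("Access to the path is denied.");
            yield return entry;
        }
    }

    // Catching the exception from the outside keeps what was gathered so far,
    // but the enumeration cannot be resumed past the failing entry.
    public static List<string> CollectUntilDenied(IEnumerable<string> source)
    {
        var found = new List<string>();
        try
        {
            foreach (string entry in source) found.Add(entry);
        }
        catch (UnauthorizedAccessException)
        {
            // Swallowed on purpose: the caller gets the partial list.
        }
        return found;
    }
}
```

Calling PartialResults.CollectUntilDenied(PartialResults.FakeEnumeration()) returns only the two entries that precede the locked one. The entry after it is never reached, which is the behavior described above.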
I searched everywhere for a solution to this problem, but I could not find a workaround that doesn't use the aforementioned methods. So I decided to play around with the Windows API and create alternative methods. The result was not only better (in that the methods do not break on "Access denied"), but it also appears to be faster than the original .NET methods.
Using the Code
The project is a Class Library, so it is not executable. Building it compiles the methods into a DLL file, which you can reference from another project and use like this:
using System.IO;
using System.Linq; // needed for ToList
DirectoryAlternative.EnumerateDirectories(path, "*", SearchOption.AllDirectories).ToList<string>();
I used the same namespace as the original methods (System.IO) and named the class DirectoryAlternative, so the usage would be as similar as possible to the original class.
The methods themselves have the same names and parameters, and from the outside they look exactly the same as the original ones.
Here is an example of how the methods are used:
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
string path = "V:\\MUSIC";
List<string> en = new List<string>();
sw.Start();
try { en = Directory.EnumerateDirectories
(path, "*", SearchOption.AllDirectories).ToList<string>(); } catch { }
sw.Stop();
Console.WriteLine("Directory.EnumerateDirectories : {0} ms / {1} entries",
sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
en = DirectoryAlternative.EnumerateDirectories(path, "*",
SearchOption.AllDirectories).ToList<string>();
sw.Stop();
Console.WriteLine("DirectoryAlternative.EnumerateDirectories : {0} ms / {1} entries",
sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
try { en = Directory.EnumerateFiles(path, "*",
SearchOption.AllDirectories).ToList<string>(); } catch { }
sw.Stop();
Console.WriteLine("Directory.EnumerateFiles : {0} ms / {1} entries",
sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
en = DirectoryAlternative.EnumerateFiles
(path, "*", SearchOption.AllDirectories).ToList<string>();
sw.Stop();
Console.WriteLine("DirectoryAlternative.EnumerateFiles : {0} ms / {1} entries",
sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
try { en = Directory.EnumerateFileSystemEntries
(path, "*", SearchOption.AllDirectories).ToList<string>(); } catch { }
sw.Stop();
Console.WriteLine("Directory.EnumerateFileSystemEntries : {0} ms / {1} entries",
sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
sw.Reset();
en = new List<string>();
sw.Start();
en = DirectoryAlternative.EnumerateFileSystemEntries
(path, "*", SearchOption.AllDirectories).ToList<string>();
sw.Stop();
Console.WriteLine("DirectoryAlternative.EnumerateFileSystemEntries : {0} ms / {1} entries",
sw.ElapsedMilliseconds.ToString("N0"), en.Count.ToString("N0"));
Console.ReadKey();
The above snippet directly compares the performance of the original methods with that of the DirectoryAlternative methods. I used a very large directory with 70,000+ file system entries. In that run, the DirectoryAlternative methods were around 50% faster.
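The repeated Stopwatch blocks above could be folded into one helper. The following is a sketch rather than code from the project: Bench and Measure are hypothetical names, and unlike the snippet above it catches only UnauthorizedAccessException instead of every exception.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class Bench
{
    // Times one full enumeration, prints a result line and returns the entry count.
    // Example call (assumes a valid path and "using System.IO;"):
    //   Bench.Measure("Directory.EnumerateFiles",
    //       () => Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories));
    public static int Measure(string label, Func<IEnumerable<string>> enumerate)
    {
        Stopwatch sw = Stopwatch.StartNew();
        int count = 0;
        try
        {
            // Count() forces the lazy enumeration to run to the end.
            count = enumerate().Count();
        }
        catch (UnauthorizedAccessException)
        {
            // The count stays 0, just as the list stays empty in the snippet above.
        }
        sw.Stop();
        Console.WriteLine("{0} : {1:N0} ms / {2:N0} entries",
            label, sw.ElapsedMilliseconds, count);
        return count;
    }
}
```

Passing the enumeration as a delegate keeps the timed region limited to the enumeration itself, so both implementations are measured the same way.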
How the Code Works
The code uses several Win API functions to move around the file system (I believe these same functions are used in the original .NET methods):
// Managed mirror of the native WIN32_FIND_DATA structure.
// CharSet.Unicode matches the CharSet used by the imported functions below.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct WIN32_FIND_DATA
{
    public uint dwFileAttributes;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwReserved0;
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)] // MAX_PATH
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string cAlternateFileName;
}

// Closes a search handle opened by FindFirstFile.
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
private static extern bool FindClose(IntPtr hFindFile);

// Starts a search and returns the first matching entry in lpFindFileData.
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
private static extern IntPtr FindFirstFile(
    string lpFileName, out WIN32_FIND_DATA lpFindFileData);

// Continues a search; returns false when there are no more entries.
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
private static extern bool FindNextFile(
    IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);
In short:
FindFirstFile searches for the first file system entry that matches the provided pattern (lpFileName) and returns a search HANDLE (IntPtr) that is used to continue the search.
FindNextFile searches for the next file system entry that matches the specified pattern. We use it to go through all the files and directories.
FindClose closes the search HANDLE.
All the file information is gathered inside the WIN32_FIND_DATA struct and returned as an out parameter.
For more information on these functions, see their Windows API documentation.
The main method is Enumerate. All other methods are wrappers around it.
private static void Enumerate(string path, string searchPattern,
    SearchOption searchOption, ref List<string> retValue, EntryType entryType)
{
    WIN32_FIND_DATA findData;
    if (path.Last<char>() != '\\') path += "\\";
    AdjustSearchPattern(ref path, ref searchPattern);
    searchPattern = searchPattern.Replace("*.*", "*");
    // Turn the wildcard pattern into a regular expression that is matched
    // against each entry's full path: "*" becomes ".*" and "?" becomes ".".
    Text.RegularExpressions.Regex rx = new Text.RegularExpressions.Regex(
        "^" +
        Text.RegularExpressions.Regex.Escape(path) +
        Text.RegularExpressions.Regex.Escape(searchPattern)
            .Replace("\\*", ".*")
            .Replace("\\?", ".")
        + "$",
        Text.RegularExpressions.RegexOptions.IgnoreCase);
    // Always ask Windows for every entry ("*"); rx decides what is returned.
    IntPtr hFile = FindFirstFile(path + "*", out findData);
    List<string> subDirs = new List<string>();
    // INVALID_HANDLE_VALUE (-1) means the search could not start (for example,
    // "Access denied"), so this directory is skipped. Comparing IntPtr values
    // avoids ToInt32(), which can throw OverflowException in a 64-bit process.
    if (hFile != new IntPtr(-1))
    {
        do
        {
            if (findData.cFileName == "." || findData.cFileName == "..") continue;
            if ((findData.dwFileAttributes &
                (uint)FileAttributes.Directory) == (uint)FileAttributes.Directory)
            {
                // Remember every subdirectory for recursion, even if rx filters it out.
                subDirs.Add(path + findData.cFileName);
                if ((entryType == EntryType.Directories ||
                    entryType == EntryType.All) && rx.IsMatch(path + findData.cFileName))
                    retValue.Add(path + findData.cFileName);
            }
            else
            {
                if ((entryType == EntryType.Files ||
                    entryType == EntryType.All) && rx.IsMatch(path + findData.cFileName))
                    retValue.Add(path + findData.cFileName);
            }
        } while (FindNextFile(hFile, out findData));
        if (searchOption == SearchOption.AllDirectories)
            foreach (string subdir in subDirs)
                Enumerate(subdir, searchPattern, searchOption, ref retValue, entryType);
        // Close the search handle only when it is valid.
        FindClose(hFile);
    }
}
The method takes all the parameters of the original Enumerate methods (path, searchPattern, searchOption) plus a by-reference argument retValue and entryType, which is an enum:
private enum EntryType { All = 0, Directories = 1, Files = 2 };
This enum selects whether only directories, only files or both should be returned.
The Enumerate method calls FindFirstFile and then iterates through all the other file system entries by calling FindNextFile. If entryType = Files, it adds only the files to the retValue list. For Directories, it adds only directories, and for All it adds both.
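The selection logic can be isolated into a small predicate. This sketch declares its own copy of the EntryType enum (the original is private to DirectoryAlternative) and a hypothetical helper named ShouldInclude. It only illustrates the attribute test and is not the project's code.

```csharp
using System.IO;

// Copy of the article's enum, repeated here so the sketch is self-contained.
enum EntryType { All = 0, Directories = 1, Files = 2 }

static class EntryFilter
{
    // Decides whether an entry with the given attributes belongs in the result.
    public static bool ShouldInclude(FileAttributes attributes, EntryType entryType)
    {
        // An entry is a directory when the Directory flag is set,
        // whatever other flags (Hidden, ReadOnly, ...) accompany it.
        bool isDirectory =
            (attributes & FileAttributes.Directory) == FileAttributes.Directory;

        switch (entryType)
        {
            case EntryType.Directories: return isDirectory;
            case EntryType.Files: return !isDirectory;
            default: return true; // EntryType.All
        }
    }
}
```

Testing the flag with a bitwise AND matters because dwFileAttributes is a combination of flags, so a hidden directory, for instance, is not equal to FileAttributes.Directory alone.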
The method always searches for all the file system entries (search pattern "*"), and a Regex (regular expression) object takes care of filtering which files and/or folders should actually be returned. This was an upgrade to the first version of the method, which passed path + searchPattern to the FindFirstFile API function. Unfortunately, that worked only for *.* searches and not for specific files (e.g. *.jpg) with searchOption = AllDirectories: no subfolders matched the search pattern, so only the files from the top directory were returned.
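The wildcard-to-Regex step can be tried on its own. The sketch below follows the same construction as Enumerate, but WildcardFilter and Build are hypothetical names introduced for illustration, and the AdjustSearchPattern step is left out.

```csharp
using System;
using System.Text.RegularExpressions;

static class WildcardFilter
{
    // Builds a case-insensitive Regex that matches full paths inside
    // "directory" against a wildcard pattern such as "*.jpg".
    public static Regex Build(string directory, string searchPattern)
    {
        if (!directory.EndsWith("\\", StringComparison.Ordinal)) directory += "\\";

        // Escape everything first so that ".", "\" and similar characters are
        // taken literally, then turn the escaped wildcards back into Regex
        // syntax: "\*" -> ".*" (any run of characters), "\?" -> "." (one character).
        string expression = "^"
            + Regex.Escape(directory)
            + Regex.Escape(searchPattern).Replace("\\*", ".*").Replace("\\?", ".")
            + "$";

        return new Regex(expression, RegexOptions.IgnoreCase);
    }
}
```

For example, WildcardFilter.Build(@"V:\MUSIC", "*.jpg") matches V:\MUSIC\cover.JPG but not V:\MUSIC\notes.txt. Note that in this translation "?" stands for exactly one character, so "*.jp?g" matches cover.jpeg but not cover.jpg.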
The Enumerate method is called recursively if searchOption = AllDirectories. The results from all the (recursive) calls are gathered in one variable, retValue. I used a by-ref argument to pass the results from each recursive call because returning and concatenating lists proved to be very slow. However, if anyone prefers Enumerate to return a List, the List.AddRange method works equally fast.
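The two ways of gathering results can be compared side by side on a tiny in-memory tree. Everything in this sketch is hypothetical (the Accumulate class, its Tree dictionary and both method names); it only shows the shape of the two approaches and makes no claim about their relative speed.

```csharp
using System.Collections.Generic;

static class Accumulate
{
    // Hypothetical stand-in for a directory tree: parent -> children.
    public static readonly Dictionary<string, string[]> Tree =
        new Dictionary<string, string[]>
        {
            { "root", new[] { "root\\a", "root\\b" } },
            { "root\\a", new[] { "root\\a\\x" } },
        };

    // Approach 1: every recursive call appends to one shared list.
    public static void Walk(string node, List<string> results)
    {
        string[] children;
        if (!Tree.TryGetValue(node, out children)) return;
        foreach (string child in children)
        {
            results.Add(child);
            Walk(child, results);
        }
    }

    // Approach 2: every call returns its own list, merged with AddRange.
    public static List<string> WalkReturning(string node)
    {
        var results = new List<string>();
        string[] children;
        if (!Tree.TryGetValue(node, out children)) return results;
        foreach (string child in children)
        {
            results.Add(child);
            results.AddRange(WalkReturning(child));
        }
        return results;
    }
}
```

Both approaches produce the same entries in the same order. Because List<string> is a reference type, the shared list in the first approach can be passed without the ref keyword; ref is only required if the method needs to replace the list instance itself.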
In the end, each call's search HANDLE is closed by calling FindClose.
Testing the Code
I created a small console app for testing purposes, which compares the alternative methods with the standard .NET System.IO.Directory methods:
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
string path = @"V:\MUSIC";
List<string> searchPatterns = new List<string>();
searchPatterns.Add("*.*");
searchPatterns.Add("*.mp3");
searchPatterns.Add("*.jpg");
searchPatterns.Add("Iron*");
searchPatterns.Add("Iron Maiden\\*.mp?");
searchPatterns.Add("IRON MAIDEN");
searchPatterns.Add("Iron Maiden\\*.jp?g");
List<SearchOption> searchOptions = new List<SearchOption>();
searchOptions.Add(SearchOption.AllDirectories);
searchOptions.Add(SearchOption.TopDirectoryOnly);
List<Func<string, string, SearchOption, IEnumerable<string>>> funcs =
new List<Func<string, string, SearchOption, IEnumerable<string>>>();
funcs.Add(DirectoryAlternative.EnumerateFiles);
funcs.Add(Directory.EnumerateFiles);
funcs.Add(DirectoryAlternative.EnumerateDirectories);
funcs.Add(Directory.EnumerateDirectories);
funcs.Add(DirectoryAlternative.EnumerateFileSystemEntries);
funcs.Add(Directory.EnumerateFileSystemEntries);
IEnumerable<string> list;
int cnt;
System.Reflection.MethodInfo mi;
Console.WriteLine("METHOD MODULE SEARCHPATTERN SEARCHOPTION TIME COUNT");
Console.WriteLine("=====================================================================================================================");
foreach (string searchPattern in searchPatterns)
{
foreach (SearchOption searchOption in searchOptions)
{
foreach (Func<string, string, SearchOption, IEnumerable<string>>
func in funcs)
{
sw.Restart();
list = func(path, searchPattern, searchOption);
cnt = list.Count();
sw.Stop();
mi = System.Reflection.RuntimeReflectionExtensions.GetMethodInfo(func);
Console.WriteLine(Wrap(mi.Name, 19) + " "
+ Wrap(mi.Module.Name, 29) + " "
+ Wrap(searchPattern, 19) + " "
+ Wrap(searchOption == SearchOption.TopDirectoryOnly ?
"root" : "all", 19) + " "
+ Wrap(sw.ElapsedMilliseconds.ToString("N0") + "ms", 19) + " "
+ cnt.ToString());
Console.ReadKey();
}
}
}
Console.WriteLine();
Console.WriteLine("THE END!!!");
Console.ReadKey();
}
static string Wrap(string str, int len)
{
if (str.Length > len)
return "..." + str.Substring(str.Length - len + 3, len - 3);
else
return str.PadRight(len);
}