DirectoryInfo.GetFiles Returns More Files Than Expected

3 Feb 2011
How to get exactly what you need using DirectoryInfo.GetFiles, with an exact extension match lookup

Introduction

I must admit I had not noticed this behavior of the GetFiles() method until now. It does not show up often, but it can happen, and when it does it is dangerous.

As both this post and the MSDN library itself state, when you call GetFiles() with a search pattern that contains the asterisk wildcard and an extension exactly three characters long (like *.xml or *.jpg), the method returns every file whose extension starts with the one you provided. A search for *.jpg will therefore return files with extensions such as .jpg, .jpg2, .jpgfileformat, etc.

This is rather odd behavior (and not too elegant, I should say), introduced to support the 8.3 file name format. As stated in the blog post mentioned above:

“A file with the name “alongfilename.longextension” has an equivalent 8.3 filename of “along~1.lon”. If we filter the extensions “.lon”, then the above 8.3 filename will be a match.”

That is why the GetFiles() method behaves this way. The official MSDN explanation:

When using the asterisk wildcard character in a searchPattern (for example, "*.txt"), the matching behavior varies depending on the length of the specified file extension. A searchPattern with a file extension of exactly three characters returns files with an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern. A searchPattern with a file extension of one, two, or more than three characters returns only files with extensions of exactly that length that match the file extension specified in the searchPattern. When using the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files in a directory, "file1.txt" and "file1.txtother", a search pattern of "file?.txt" returns only the first file, while a search pattern of "file*.txt" returns both files.
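
To make the quoted rule concrete, here is a minimal sketch that models it as a pure function for patterns of the form "*.ext". The class and method names are hypothetical, the case-insensitive comparison is an assumption based on Windows file name handling, and this is only an illustration of the documented rule, not the framework's actual matcher.

```csharp
using System;
using System.IO;

public static class LegacyPatternModel
{
    // Hypothetical helper: models only the extension rule quoted from MSDN.
    // Both arguments' extensions include the leading dot, e.g. ".txt".
    public static bool ExtensionMatches(string fileName, string patternExtension)
    {
        string fileExtension = Path.GetExtension(fileName);

        // A dot plus exactly three characters triggers the prefix match.
        bool isThreeCharacterExtension = patternExtension.Length == 4;

        return isThreeCharacterExtension
            ? fileExtension.StartsWith(patternExtension, StringComparison.OrdinalIgnoreCase)
            : string.Equals(fileExtension, patternExtension, StringComparison.OrdinalIgnoreCase);
    }
}
```

Under this model, ExtensionMatches("file1.txtother", ".txt") is true because ".txt" is three characters long, while ExtensionMatches("file1.txtother", ".tx") is false because a two-character extension has to match exactly.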

In my case, I had a bug in my software because I temporarily renamed an XML file to xxx.XML2222, just to hide it from the application. The program was still reading it, which caused it to misbehave.

A Workaround for this Issue

If you want to prevent this behavior, you will need to check the returned array of FileInfo objects manually and remove those that do not match your pattern. An elegant way to do so is to write an extension method for the DirectoryInfo class, like the following one:

C#
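using System;
using System.Collections.Generic;
using System.IO;

// Extension methods must live in a static class; the class name is arbitrary.
public static class DirectoryInfoExtensions
{
    /// <summary>
    /// Returns the files that match the search wildcard,
    /// but with an exact match for the extension.
    /// </summary>
    /// <param name="dinfo">Directory to search</param>
    /// <param name="pSearchWildcard">Search wildcard,
    /// in the format: *.xml or file?.dat</param>
    /// <returns>Array of FileInfo objects</returns>
    public static FileInfo[] GetFilesByExactMatchExtension(
           this DirectoryInfo dinfo, string pSearchWildcard)
    {
        FileInfo[] files = dinfo.GetFiles(pSearchWildcard);
        if (files.Length == 0)
            return files;

        // Extension of the pattern, including the leading dot (e.g. ".xml").
        string extensionSearch = Path.GetExtension(pSearchWildcard);
        List<FileInfo> filtered = new List<FileInfo>();
        foreach (FileInfo finfo in files)
        {
            // Keep only files whose extension equals the requested one, ignoring case.
            if (string.Equals(finfo.Extension, extensionSearch,
                              StringComparison.OrdinalIgnoreCase))
                filtered.Add(finfo);
        }
        return filtered.ToArray();
    }
}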

With this in place, every DirectoryInfo instance exposes the new GetFilesByExactMatchExtension() method alongside the regular GetFiles() method, and it has the desired behavior.

Note: As with any other extension method, to use it in a class you will need to add a “using” directive for the namespace in which the extension method is declared.
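
As a usage illustration, the sketch below applies the same filter with LINQ instead of an explicit loop and runs it against a scratch directory. The names GetFilesWithExactExtension and ExactExtensionDemo are hypothetical, and it assumes .NET 4 or later, where DirectoryInfo.EnumerateFiles is available. It is a variant of the workaround above, not a replacement for it.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DirectoryInfoExactExtension
{
    // Hypothetical name: LINQ variant of the exact-extension filter.
    public static FileInfo[] GetFilesWithExactExtension(
        this DirectoryInfo directory, string searchPattern)
    {
        // Extension of the pattern, including the leading dot (e.g. ".xml").
        string wanted = Path.GetExtension(searchPattern);

        return directory.EnumerateFiles(searchPattern)
                        .Where(f => string.Equals(f.Extension, wanted,
                                                  StringComparison.OrdinalIgnoreCase))
                        .ToArray();
    }
}

public static class ExactExtensionDemo
{
    // Creates a scratch directory with three files and returns the
    // names that survive the exact-extension filter, sorted by name.
    public static string[] Run()
    {
        DirectoryInfo dir = Directory.CreateDirectory(
            Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()));
        try
        {
            foreach (string name in new[] { "a.xml", "b.xml", "c.xml2222" })
                File.WriteAllText(Path.Combine(dir.FullName, name), "<x/>");

            return dir.GetFilesWithExactExtension("*.xml")
                      .Select(f => f.Name)
                      .OrderBy(n => n, StringComparer.Ordinal)
                      .ToArray();
        }
        finally
        {
            dir.Delete(true); // remove the scratch directory and its files
        }
    }
}
```

Calling ExactExtensionDemo.Run() should yield only a.xml and b.xml: whether or not the underlying GetFiles/EnumerateFiles call also returns c.xml2222, the filter discards it.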

Hope it helps!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)