Introduction
I must admit that I had not noticed this behavior of the GetFiles() method until now. It does not come up often, but it can happen, and it’s dangerous.
As this post and the MSDN library itself state, when you call the GetFiles() method with a search pattern that contains the asterisk wildcard and an extension exactly three characters long (like *.xml or *.jpg), the method returns every file whose extension starts with the one you provided. That means a search for *.jpg will return files with extensions such as .jpg, .jpg2, .jpgfileformat, etc.
This is quite a weird behavior (and not too elegant, I should say), introduced to support the 8.3 file name format. As stated in the above mentioned blog:
“A file with the name “alongfilename.longextension” has an equivalent 8.3 filename of “along~1.lon”. If we filter the extensions “.lon”, then the above 8.3 filename will be a match.”
That’s the reason the GetFiles() method behaves that way. The official MSDN explanation:
When using the asterisk wildcard character in a searchPattern (for example, "*.txt"), the matching behavior varies depending on the length of the specified file extension. A searchPattern with a file extension of exactly three characters returns files with an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern. A searchPattern with a file extension of one, two, or more than three characters returns only files with extensions of exactly that length that match the file extension specified in the searchPattern. When using the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files in a directory, "file1.txt" and "file1.txtother", a search pattern of "file?.txt" returns only the first file, while a search pattern of "file*.txt" returns both files.
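To make the quoted rule concrete, here is a minimal sketch that models it for patterns of the form "*.ext". It is only an illustration of the documented rule, not the framework's actual implementation; the class and method names (SearchPatternModel, MatchesDocumentedRule) are hypothetical, and both extensions are passed without the leading dot.

```csharp
using System;

public static class SearchPatternModel
{
    // Hypothetical helper: models only the extension rule from the MSDN quote
    // for "*.ext" patterns. Both arguments are given without the leading dot.
    public static bool MatchesDocumentedRule(string patternExt, string fileExt)
    {
        if (patternExt.Length == 3)
        {
            // Exactly three characters: any extension that starts with them matches.
            return fileExt.StartsWith(patternExt, StringComparison.OrdinalIgnoreCase);
        }

        // One, two, or more than three characters: the extension must match exactly.
        return string.Equals(patternExt, fileExt, StringComparison.OrdinalIgnoreCase);
    }
}
```

Under this model, MatchesDocumentedRule("txt", "txtother") is true, which mirrors the "file1.txtother" case in the quote, while MatchesDocumentedRule("html", "html5") is false because the pattern extension is longer than three characters.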
In my case, I had a bug in my software because I temporarily renamed an XML file to xxx.XML2222, just to hide it from the application. The program was still reading it, which made it misbehave.
A Workaround for this Issue
If you want to prevent this behavior, you need to check the returned array of FileInfo objects manually and remove those that do not match your pattern. An elegant way to do so is to write an extension method for the DirectoryInfo class, like the following one:
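using System;
using System.Collections.Generic;
using System.IO;

// Extension methods must live in a static class.
public static class DirectoryInfoExtensions
{
    // Returns only the files whose extension equals the pattern's extension.
    public static FileInfo[] GetFilesByExactMatchExtension(
        this DirectoryInfo dinfo, string pSearchWildcard)
    {
        FileInfo[] files = dinfo.GetFiles(pSearchWildcard);
        if (files.Length == 0)
            return files;

        // Extension of the pattern, including the leading dot (e.g. ".xml").
        string extensionSearch = Path.GetExtension(pSearchWildcard);

        List<FileInfo> filtered = new List<FileInfo>();
        foreach (FileInfo finfo in files)
        {
            // Skip entries such as "xxx.XML2222" that GetFiles() let through.
            if (string.Equals(finfo.Extension, extensionSearch,
                    StringComparison.OrdinalIgnoreCase))
                filtered.Add(finfo);
        }
        return filtered.ToArray();
    }
}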
This way, next to the regular GetFiles() method of the DirectoryInfo class, you will now find the brand new GetFilesByExactMatchExtension(), which has the desired behavior.
Note: To use this method from a class, just like any other extension method, you need to add a “using” directive for the extension method’s namespace.
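As a rough illustration of that note, the sketch below puts a compact LINQ variant of the same filter in one namespace and calls it from another. The namespace names (FileTools.Extensions, FileTools.App) and the Demo.ListXmlFiles helper are made up for this example; adjust them to your own project.

```csharp
using System;
using System.IO;
using System.Linq;

namespace FileTools.Extensions // hypothetical namespace holding the extension method
{
    public static class DirectoryInfoExtensions
    {
        // Compact LINQ variant of the exact-extension filter (an alternative sketch).
        public static FileInfo[] GetFilesByExactMatchExtension(
            this DirectoryInfo dinfo, string pSearchWildcard)
        {
            string ext = Path.GetExtension(pSearchWildcard);
            return dinfo.GetFiles(pSearchWildcard)
                .Where(f => string.Equals(f.Extension, ext,
                    StringComparison.OrdinalIgnoreCase))
                .ToArray();
        }
    }
}

namespace FileTools.App // hypothetical consuming code
{
    using FileTools.Extensions; // brings the extension method into scope

    public static class Demo
    {
        // Returns the names of the files in 'folder' whose extension is exactly ".xml".
        public static string[] ListXmlFiles(string folder)
        {
            var dir = new DirectoryInfo(folder);
            return dir.GetFilesByExactMatchExtension("*.xml")
                .Select(f => f.Name)
                .ToArray();
        }
    }
}
```

Without the using directive inside FileTools.App, the call to GetFilesByExactMatchExtension() would not compile, because the compiler only looks for extension methods in namespaces that are in scope.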
Hope it helps!