|
Explanation in MSDN Magazine, September 2009:
"To address the second issue, DirectoryInfo now makes use of data that the operating system already provides from the file system during enumeration. The underlying Win32 functions that DirectoryInfo calls to get the contents of the file system during enumeration actually include data about each file, such as the length and creation time. We now use this data when initializing the FileInfo and DirectoryInfo instances returned from both the older array-based and new IEnumerable<T>-based methods on DirectoryInfo. This means that in the preceding code, there are no additional underlying calls to the file system to retrieve the length of the file when file.Length is called, since this data has already been initialized."
|
|
|
|
|
have a 5
|
|
|
|
|
I did similar coding several months ago using the Windows API FindFirstFile/FindNextFile and Directory.GetFiles, and compared the results of the two methods on a large number of files over a network. On the first run, the two methods did not differ much. But when I ran the program again, FindFirstFile/FindNextFile was faster. You can simulate this using the author's code: click "FastDirectoryEnumerator.EnumerateFiles" several times and compare the results; or exit, run the program again, click "FastDirectoryEnumerator.EnumerateFiles", and then compare the results.
|
|
|
|
|
Just to be clear, the new v4 EnumerateFiles method is not in any way faster than using GetFiles. It uses the exact same process of calling FindFirstFile/FindNextFile as GetFiles. Therefore you have a roundtrip for each file. What makes it better is the perceived speed.
With GetFiles you have to wait for the framework to enumerate all the files before you get any results back. For large numbers of files this can be really slow. EnumerateFiles uses an iterator, so each time you request the next file it makes the roundtrip to fetch the next file. Therefore the cost of each iteration is (theoretically) consistent regardless of the number of files. Of course the overhead of the iterator means that it will actually take longer overall but (like threading) you won't have the hefty delay.
This actually has some implications for how you code. Before, if you tried to enumerate a directory of files and one of the files had security that prevented you from reading it, you'd get an exception and lose all the files. Now you'll get an exception during the iteration. Another place where things behave differently is in the results. If you use GetFiles then you'll get the list of files available while the method runs. Now you'll potentially (read: depending upon the FindNextFile implementation) get files that were added after the initial call but before the iterator reaches them.
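To make the difference concrete, here is a rough sketch, assuming .NET 4 where Directory.EnumerateFiles is available. The class and method names (EnumerationDemo, FirstEager, FirstLazy, CollectUntilDenied) are made up for illustration and are not part of the article's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class EnumerationDemo
{
    // Eager: GetFiles reads the whole directory before returning anything.
    public static string FirstEager(string dir)
    {
        string[] all = Directory.GetFiles(dir);
        return all.Length > 0 ? all[0] : null;
    }

    // Lazy: the iterator is abandoned as soon as the first entry is produced.
    public static string FirstLazy(string dir)
    {
        foreach (string file in Directory.EnumerateFiles(dir))
        {
            return file;
        }
        return null;
    }

    // With the iterator, an access failure surfaces in the middle of the loop,
    // so whatever was collected before it is still available to the caller.
    public static List<string> CollectUntilDenied(string root)
    {
        var found = new List<string>();
        try
        {
            foreach (string file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
            {
                found.Add(file);
            }
        }
        catch (UnauthorizedAccessException)
        {
            // Enumeration stops here; 'found' keeps the entries seen so far.
        }
        return found;
    }
}
```

Doing the same with GetFiles means the exception is thrown from the call itself, before any array is returned, so there is nothing partial to keep.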
|
|
|
|
|
I gave this code a try and got the exception below.
Any ideas how I can work around it?
Thanks,
Quan
======= Unit Tests =======
[Test]
public void TestGetAllFilesFromCDrive()
{
    foreach (FileData file in FastDirectoryEnumerator.GetFiles(@"C:\", "*", SearchOption.AllDirectories))
    {
        Console.WriteLine("Name: {0}, Size: {1}", file.Name, file.Size);
    }
}
======= Exception =======
FileEnumerationTests.TestGetAllDirectoryFilesFromCDrive : Failed
System.IO.PathTooLongException: The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.
at System.IO.Path.SafeSetStackPointerValue(Char* buffer, Int32 index, Char value)
at System.IO.Path.NormalizePathFast(String path, Boolean fullCheck)
at System.IO.Path.NormalizePath(String path, Boolean fullCheck)
at System.IO.Path.GetFullPathInternal(String path)
at System.Security.Util.StringExpressionSet.CanonicalizePath(String path, Boolean needFullPath)
at System.Security.Util.StringExpressionSet.CreateListFromExpressions(String[] str, Boolean needFullPath)
at System.Security.Permissions.FileIOPermission.AddPathList(FileIOPermissionAccess access, AccessControlActions control, String[] pathListOrig, Boolean checkForDuplicates, Boolean needFullPath, Boolean copyPathList)
at System.Security.Permissions.FileIOPermission..ctor(FileIOPermissionAccess access, String path)
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 473
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 514
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 494
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 494
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 514
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 494
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 494
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 514
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 494
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 494
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 514
at CodeProject.FastDirectoryEnumerator.FileEnumerator.MoveNext() in FastDirectoryEnumerator.cs: line 528
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at CodeProject.FastDirectoryEnumerator.GetFiles(String path, String searchPattern, SearchOption searchOption) in FastDirectoryEnumerator.cs: line 251
at testFastDirectoryEnumeration.FileEnumerationTests.TestGetAllDirectoryFilesFromCDrive() in FileEnumerationTests.cs: line 39
|
|
|
|
|
Somewhere on your drive is a path + file name that is greater than 260 characters.
You can try prepending '\\?\' to your path string like this to enable very long file names (up to 32k):
[Test]
public void TestGetAllFilesFromCDrive()
{
    foreach (FileData file in FastDirectoryEnumerator.GetFiles(@"\\?\C:\", "*", SearchOption.AllDirectories))
    {
        Console.WriteLine("Name: {0}, Size: {1}", file.Name, file.Size);
    }
}
|
|
|
|
|
I get this exception if I do that:
System.ArgumentException: Illegal characters in path.
at System.Security.Permissions.FileIOPermission.HasIllegalCharacters(String[] str)
at System.Security.Permissions.FileIOPermission.AddPathList(FileIOPermissionAccess access, AccessControlActions control, String[] pathListOrig, Boolean checkForDuplicates, Boolean needFullPath, Boolean copyPathList)
at System.Security.Permissions.FileIOPermission..ctor(FileIOPermissionAccess access, String[] pathList, Boolean checkForDuplicates, Boolean needFullPath)
at System.IO.Path.GetFullPath(String path)
at CodeProject.FastDirectoryEnumerator.EnumerateFiles(String path, String searchPattern, SearchOption searchOption) in FastDirectoryEnumerator.cs: line 229
at CodeProject.FastDirectoryEnumerator.GetFiles(String path, String searchPattern, SearchOption searchOption) in FastDirectoryEnumerator.cs: line 250
at testFastDirectoryEnumeration.FileEnumerationTests.TestGetAllFilesFromCDrive2() in FileEnumerationTests.cs: line 49
|
|
|
|
|
|
Here is my workaround for this issue.
Modify the MoveNext() function (line 472) as follows:
if (m_hndFindFile == null)
{
    if (m_path.Length <= 260)
    {
        new FileIOPermission(FileIOPermissionAccess.PathDiscovery, m_path).Demand();
    }

    string fixPath = @"\\?\" + m_path;
    string searchPath = Path.Combine(fixPath, m_filter);
    m_hndFindFile = FindFirstFile(searchPath, m_win_find_data);
    retval = !m_hndFindFile.IsInvalid;
}
Thanks,
Quan
|
|
|
|
|
That works, but at the obvious cost that callers will be able to use this class to bypass path discovery security for any path that is longer than 260 characters. If this DLL is called from a location that should not have that permission (the web, for example, or a network share pre-3.5 SP1), then this could lead to an information leak or security vulnerability. Whether that is important in your application is up to you.
|
|
|
|
|
In the method GetFiles, SearchOption.TopDirectoryOnly is being used instead of searchOption.
public static FileData[] GetFiles(string path, string searchPattern, SearchOption searchOption)
{
    IEnumerable<FileData> e = FastDirectoryEnumerator.EnumerateFiles(path, searchPattern, SearchOption.TopDirectoryOnly);
    List<FileData> list = new List<FileData>(e);
    FileData[] retval = new FileData[list.Count];
    list.CopyTo(retval);
    return retval;
}
Even when I change the code to use searchOption and pass in SearchOption.AllDirectories, it doesn't work as expected. I don't get the same values returned as Directory.GetFiles. I get only the files in the top directory (if any).
Any ideas?
modified on Sunday, August 23, 2009 9:51 AM
|
|
|
|
|
Same here. The real performance savings of this routine would be with the AllDirectories option, but it doesn't work. Sure runs fast though!
I tried the same fix Corey did, with the same results... no change. I'm running WinXP Pro SP3.
Update: OK, it looks like the other line that needs changing is the test for FileAttributes.Directory in MoveNext, so change this:
if ((FileAttributes)m_win_find_data.dwFileAttributes == FileAttributes.Directory)
to this:
if (((FileAttributes)m_win_find_data.dwFileAttributes & FileAttributes.Directory) == FileAttributes.Directory)
and all is well. Still runs like lightning!
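For anyone wondering why the equality test misses directories: FileAttributes is a flags enum, so a directory that carries any other attribute (read-only, hidden, and so on) no longer equals FileAttributes.Directory exactly. A minimal sketch of the two checks follows; the class and method names are invented for illustration and do not appear in the article's code.

```csharp
using System;
using System.IO;

static class AttributeCheck
{
    // Only true when Directory is the one and only attribute set.
    public static bool IsDirectoryByEquality(FileAttributes attrs)
    {
        return attrs == FileAttributes.Directory;
    }

    // True whenever the Directory bit is set, whatever else is set with it.
    public static bool IsDirectoryByMask(FileAttributes attrs)
    {
        return (attrs & FileAttributes.Directory) == FileAttributes.Directory;
    }
}
```

For a value such as FileAttributes.Directory | FileAttributes.ReadOnly, the equality check returns false while the mask check returns true, which matches the symptom of sub-directories being skipped.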
modified on Sunday, August 23, 2009 10:59 PM
|
|
|
|
|
Good catch. I've submitted new code with this fix, and it should be available shortly. Thanks for the help.
Update: New version is now posted.
modified on Thursday, August 27, 2009 8:44 AM
|
|
|
|
|
Recursion doesn't seem to be working at all now...
|
|
|
|
|
How so? I tried it on my machine and it works fine for me. Are you getting an exception or just not seeing all the files? Note that the overload EnumerateFiles with no parameters only enumerates the current directory. You need to explicitly call it with a SearchOption if you want to enumerate sub-directories.
|
|
|
|
|
I've done a small amount of debugging. It seems the problem occurs if you specify a search pattern. Child folders aren't searched unless they have the same extension as the search pattern.
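The usual way around this kind of problem is to apply the pattern to files only and to walk every sub-directory regardless of its name. The sketch below shows the idea with the plain Directory methods rather than the article's FindFirstFile code, so it is only an illustration of the approach, not the author's fix; PatternRecursion and FindFiles are names I made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class PatternRecursion
{
    public static IEnumerable<string> FindFiles(string root, string pattern)
    {
        // The pattern filters files in the current directory only.
        foreach (string file in Directory.GetFiles(root, pattern))
        {
            yield return file;
        }

        // Sub-directories are listed without the pattern, so a folder named
        // "child" is still searched when the pattern is "*.txt".
        foreach (string sub in Directory.GetDirectories(root))
        {
            foreach (string file in FindFiles(sub, pattern))
            {
                yield return file;
            }
        }
    }
}
```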
|
|
|
|
|
Any chance you've fixed this?
|
|
|
|
|
Thanks for pointing this out. I've submitted a fix for this issue. It may be a few days before it appears however.
EDIT: Update is now available.
modified on Thursday, September 10, 2009 10:26 AM
|
|
|
|
|
The article looks clean and the code is well documented - got my 5.
I was going to recommend implementing additional features to take advantage of predicates/actions much like a List does. Though after scanning your code, it might not be as simple to implement or really buy you much.
But either way, good stuff - thanks for sharing.
|
|
|
|
|
@wilsone8
Did you also look at FSO (FileSystemObject) enumeration?
I don't expect it to best your best times, but I was curious if you had tested other, out-of-the-box methods.
|
|
|
|
|
Good one. I never thought I needed this until I saw it.
I never really looked at the performance differences of using DirectoryInfo.GetFiles and GetDirectories; I always thought they were very small, not this big.
Good article.
|
|
|
|
|
Well done. Thinking outside the box, or the Framework. I like it.
Luc Pattyn
The quality and detail of your question reflects on the effectiveness of the help you are likely to get.
Show formatted code inside PRE tags, and give clear symptoms when describing a problem.
|
|
|
|
|
Luc Pattyn wrote: well done. Thinking outside the box, or the Framework. I like it.
If you close your eyes on some of the technical side of handling files in .NET, then you have a point.
>> ..but it suffers from some very poor performance characteristics:
>> 1. GetFiles must allocate a potentially very large array.
True, that is why they are including the new method, Directory.EnumerateFiles.
>> 2. GetFiles must wait for the entire directory's entries to be returned before returning.
The same as 1, so same problem.
>> 3. For each file, a potentially expensive query is sent to the file system. No attempt is made to perform any sort of batch query.
If that means file size and file dates, then true. Other than that there is nothing special here.
This article boasts of speed, but ignores one essential part of .NET file systems: security. There is not a single attempt to demand/check file security, which is done extensively in Directory.GetFiles.
>> Sadly, it will still only return file names...
Sadly, this is misinformation and lack of understanding...
1. Directory.EnumerateFiles() is designed to return only the file names, which is what most applications require, and it will be the replacement for Directory.GetFiles(). This essentially avoids creating useless objects: FileInfo or FileData (if you prefer that).
2. DirectoryInfo.EnumerateFiles() is designed for those wanting more information on the files. It is the replacement for DirectoryInfo.GetFiles() and, unlike DirectoryInfo.GetFiles(), it does not use Directory.GetFiles internally.
Basically, this article, like much of the code out there, including the MSDN version, is about reducing the amount of memory required when dealing with large files, and about knowing the environment you are using it in.
I have used the MSDN version, changed to return just the file name instead of a FileInfo, when creating a tool for Sandcastle.
Best regards,
Paul.
Jesus Christ is LOVE! Please tell somebody.
modified on Friday, August 14, 2009 6:33 AM
|
|
|
|
|
Paul Selormey wrote: If that means; file size, file dates, then true. Other than that there is nothing special here.
Getting additional attributes for each file is the whole point of this code! If all you need is file names, then of course this is not interesting. I'm pointing out that if you need file attributes and not just names, then the .Net Framework's built in methods are not the most efficient way to go.
Paul Selormey wrote: This article boast of speed, but ignores one essential part of .NET file systems, security. There is no single attempt to demand/check file security, which is done extensively in the Directory.GetFiles.
This is true in v1 of this article. I've added the same security checks that the .Net framework does and the performance doesn't change at all.
Paul Selormey wrote: 2. DirectoryInfo.EnumerateFiles(), is designed for those wanting more information on the files, and it is the replacement of the DirectoryInfo.GetFiles, and unlike the DirectoryInfo.GetFiles(), this does not use the Directory.GetFiles internally.
If it works anything like the current DirectoryInfo.GetFileSystemInfos, then like I said above it will be no faster than creating a bunch of FileInfo objects. It's not returning an enumeration that makes my method faster. In fact, just to prove this I've added a GetFiles method to the FastDirectoryEnumerator that returns an array of FileData objects. The two methods are always within 5% of each other.
The problem with DirectoryInfo.GetFileSystemInfos (or any of the methods that return a FileInfo object) is that internally the FileInfo object only stores a file name on construction. All the other information that is returned by FindFirstFile/FindNextFile is thrown away and then re-queried when you request the first attribute of a file. By keeping that data around, my code is significantly faster, especially in the face of network latencies. Maybe in .Net 4.0 this is changed (I don't have Beta 1 of 4.0 installed; I don't play around with Beta software), but in the meantime I think this code is useful for some people.
modified on Friday, August 14, 2009 2:53 PM
|
|
|
|
|
wilsone8 wrote: Getting additional attributes for each file is the whole point of this code!...
First of all, I do think this article is useful for many purposes, and as I have said, I have used similar code, which appeared in MSDN Magazine.
wilsone8 wrote: This is true in v1 of this article. I've added the same security checks that the .Net framework does and the performance doesn't change at all.
Let's see how it performs; I hope you will post the update soon. Without a security check/demand, and with no word of caution, users could run into problems.
wilsone8 wrote: I don't have Beta 1 of 4.0 installed; I don't play around with Beta software
Then it was not right to make statements about it, especially without distinguishing between the two methods provided by .NET and what they are supposed to do.
I have .NET 4.0 beta 1 installed on a VPC, and play with it, and with Reflector, it was easy to see how it works.
Even now, FileInfo/FileSystemInfo internally uses the Win32 data; what was lacking was the right iterator, and .NET 4.0 provides at least three of them, through a factory. Unlike in .NET 2.x, FileSystemInfo now has an internal method, InitializeFrom(Win32Native.WIN32_FIND_DATA findData), so yes, there is a difference.
wilsone8 wrote: ...but in the mean time I think this code is useful for some people.
It is useful, and I will use it when I have the need, just avoid the extra unverified information.
NB: If you are only looking for files, use DirectoryInfo.GetFiles instead of DirectoryInfo.GetFileSystemInfos to avoid the extra ((files[i].Attributes & FileAttributes.Directory) == 0) checks.
Best regards,
Paul.
|
|
|
|
|