Introduction
This algorithm came about when I was in need of a quick way of finding files in a list that had the same extension, and then remove them from the list of extensions to be compared/parsed by a for-looped system, and then only the extensions that were supported to be returned via a ref
(this is %
in Managed C++) variable back into the main processing line of my program (a media player).
Background
The code and ideas I am sharing in this article are to do with how this sort of complex comparison can be done quickly, and without too-many variables / for-loops. It was needed to be speedy as it was being used in a wrapper in an media player application I was writing, and it happened every time new file(s) were opened. The application specs I wrote specified that it should not hang at this stage unless there was an extremely good reason for it to, and parsing the file extensions is not, in this case, a good reason for the application to hang.
Using the code
There is really only one method to this algorithm - a function called ScanFileExts
.
The definition of the function can be found in the file "ExtParseAlgorithm.h" (in the accompanying source zip file). It looks like:
bool ScanFileExts(array<String^>^ Files, array<String^>^ ComparisonExts,
array<String^>^% retExts);
This accepts two input arguments: Files
- the files (with extensions) to parse, and ComparisonExts
- which speaks for itself, the extensions to use in the parsing process.
The third argument - retExts
- is used to return the extensions that are supported and in the original file list.
The actual definition of the implementation is found in the top of the next file, and looks the same as the above, but without the ";", and it has a comment describing the return behavior.
In the other file with the name "ExtParseAlgorithm", which is a C++ source file, resides the actual implementation of the function.
The first section (stage) in this code is:
if (Files->Length == 0) return false;
array<String^>^ ret = gcnew array<String^>(0);
array<String^>^ LoopFileExts = gcnew array<String^>(Files->Length);
for (int i = 0; i < Files->Length; i++)
{
String^ Ext = "";
for (int j = (Files[i]->Length - 1); j > 0; j--)
{
if (Files[i][j] == '.')
{
Ext = Files[i]->Substring(j);
j = 0;
}
}
LoopFileExts[i] = Ext;
}
array<String^>^ internFiles = (array<String^>^)LoopFileExts->Clone();
This tests if the file list you have provided is empty, if so, returns false
for an error, because this function should not have empty parameters. Then, it initializes two internal variables: ret
and LoopFileExts
.
The code then does an initial collection of extensions from the files in that list; it puts them into another variable - internFiles
. For behavioral aspects with this code that might need to be addressed, see "Points of interest".
The next stage of code does the first pass of parsing / comparing, the code goes as follows:
array< array<bool^>^ >^ Matches = gcnew array< array<bool^>^ >
(Files->Length);
for (int i = 0; i < Files->Length; i++)
{
array<bool^>^ Matched;
Matched = gcnew array<bool^>(Files->Length);
for (int j = 0; j < Files->Length; j++)
{
if (j != i)
{
if (LoopFileExts[i] == LoopFileExts[j]) Matched[j] = true;
else Matched[j] = false;
}
}
Matched[i] = false;
Matches[i] = (array<bool^>^)Matched->Clone();
}
This code starts by initializing an array of boolean (bool
as declaring type) arrays to the number of files in the file array. The next step is to start stepping through this array and filling it; this is achieved by making a temporary variable called Matched
and setting each element to true
if the extension of the current file in the list (indicated by i
) is the same as that of the file at j
. j
is the variable created by running another loop through the files, and then comparing the number in j
against i
, if they are different, then saying if they are the same or not. The current file indicated by i
is set to false
in the array; this means that when the file array is cleaned, it is not deleted if it the first occurrence of the extension.
After this, we need to shorten the file list; this is achieved with the following:
for (int i = 0; ; i++)
{
array<String^>^ FileList = gcnew array<String^>(0);
int CurrentFileListLen = internFiles->Length;
for (int j = 0; j < CurrentFileListLen; j++)
{
if (Matches[i][j]->ToString() == bool::FalseString)
{
FileList->Resize(FileList, (FileList->Length + 1));
FileList[(FileList->Length - 1)] = LoopFileExts[j];
}
}
internFiles->Resize(internFiles, 0);
internFiles->Resize(internFiles, FileList->Length);
internFiles = (array<String^>^)FileList->Clone();
break;
}
Using the array previously created, it loops through, again using that double for
-loop setup, removing extraneous entries in the file list.
Right, enough chasing the file list and cleaning it up. Now, to do the second pass of parsing that I mentioned earlier this code did, the code goes as follows:
for (int i = 0; i < internFiles->Length; i++)
{
for (int j = 0; j < ComparisonExts->Length; j++)
{
if (ComparisonExts[j][(ComparisonExts[j]->Length - 1)] == '\0')
{
ComparisonExts[j] = ComparisonExts[j]->Remove((ComparisonExts[j]->Length - 1));
}
if (ComparisonExts[j] == internFiles[i])
{
ret->Resize(ret, (ret->Length + 1));
ret[(ret->Length - 1)] = (String^)ComparisonExts[j]->Clone();
}
}
}
There, now what does this do? Well, first off, using that double for
-loop system that I keep using (Some must be wondering if this is my favourite for-loop setup. Well, it is not, I prefer to use single for-loops if any, and it is actually the one I hate the most. The reason it is used here is because it does the job in hand better than other systems, like do-loops, and while-loops, oh, and my favourite setup of for-loops, singular for-loops), we start the comparison / parse process.
Yet another array is created here, or should I say, the return array we created earlier and initialized to null
is resized and the newly created index filled with an extension if the extension matches the one in the file list.
The next bit of code I am not going to bother with, as it only adds some verbosity to the application, and can be removed without the algorithm braking, but I will bother with the bit after it, and that is:
retExts = ret;
return true;
This code is simply putting the newly created and cleaned file extension list into that variable in the header that got mentioned as the only return variable: retExts
.
The only reason this function is non-void
in the return type is because I needed some way of determining whether or not the function succeeded in doing the parse. It will return true
for succeeded, at this point.
When you wish to call this algorithm, you will need to have an un-initialized variable to recieve the data, and two variables with the type array<String^>^
filled with the appropriate data; an example of this is included in the source, and is compiled in the demo. The source file for this is "Main.cpp", it keeps in line with the style and spec I set with the media player code.
Points of interest
Within the initial collection, it counts backward till it hits the "." of the start of an extension. If you wish to use this on files without an extension, it will throw an error by not putting anything into the Ext
temporary variable; instead, it will be null
which causes .NET to throw the null variable exception when the clone happens. (Please do not do this as it is a tough to find error if you use it as I do - inside a pre-compiled library (.dll file) - that is being called into, and you cannot use a debug version because it does not even work one bit).
History
- [19 July 2007] - Initial article.
- [23 January 2009] - Updated the downloads to something that can be read by normal Windows users.