CEnum is a fast and simple class that lets you enumerate (i.e. make a list of) all files in a directory.
It supports:
- the use of wild card characters * and ? (i.e. it's a file globbing class too)
- separate enumeration of files and subdirectories
- separate include list, exclude list and ignore case option for both files and directories
- recursive search (i.e. it can enumerate content of subdirectories too)
- enumeration using either file's full path or just file name
- written in STL, but it optionally supports MFC collection classes
- UNICODE aware (in Microsoft world UNICODE stands for UTF-16 little-endian)
- works as a wrapper around ::FindFirstFile and ::FindNextFile
CEnum is simplicity itself:
CEnum enumerator;
enumerator.bRecursive = true;
enumerator.bFullPath = true;
enumerator.EnumerateAll(_T("C:\\"));
list<string> * pAllFilesOnCDrive = enumerator.GetFiles();
list<string>::iterator iter = pAllFilesOnCDrive->begin();
for (; iter != pAllFilesOnCDrive->end(); ++iter)
{
    cout << iter->c_str() << endl;
}
What if you hate STL with a passion, and want to use MFC containers?
No worries, CEnum is your friend too:
- At the top of enum.cpp uncomment this line:
//#define MFC
- Recompile
CEnum enumerator;
enumerator.sIncPatternFiles = _T("*.mp3");
enumerator.bNoCaseFiles = true;
enumerator.bRecursive = true;
enumerator.bFullPath = true;
enumerator.EnumerateAll(_T("C:"));
// Files as a CStringArray
CStringArray * pCStringArray = enumerator.GetFilesAsCStringArray();
for (int i = 0; i < pCStringArray->GetSize(); ++i)
{
    cout << (LPCTSTR) pCStringArray->GetAt(i);
}
// Files as a CStringList
CStringList * pCStringList = enumerator.GetFilesAsCStringList();
POSITION pos = pCStringList->GetHeadPosition();
while (pos != NULL)
{
    cout << (LPCTSTR) pCStringList->GetNext(pos);
}
Note that, for your convenience, the path to root folder may or may not end with backslash!
This is already enough knowledge for basic use of CEnum; for details, read on.
A few things inspired me to write this class:
- The fact that I work with files all the time, and I always copy/paste the same code over and over again (and somehow I always have a bug in it :-))
- The fact that I couldn't find a similar class just by using Google (don't laugh, I hate browsing through huge indexes of popular source code pages)
- The work of Jack Handy and Alessandro Felice Cantatore
- C# Directory class and its method GetFiles()
CEnum was designed with the following in mind:
- must work out of the box (i.e. just make an object, call one method and you have the list of all files in that directory)
- must be designed using STL, not MFC
- must support globbing (wild card search)
- must support UNICODE
- must be able to ignore case when comparing strings
- must take care of memory management, so that user's calling function doesn't have to allocate or free any STL objects on its own
Here is the class:
User can set a number of options through public member variables (for detailed description of each option, see section Class members).
typedef basic_string<TCHAR> _stl_string;
class CEnum
{
public:
_stl_string sExcPatternDirs;
_stl_string sExcPatternFiles;
_stl_string sIncPatternDirs;
_stl_string sIncPatternFiles;
bool bRecursive;
bool bFullPath;
bool bNoCaseDirs;
bool bNoCaseFiles;
};
Default constructor will initialize variables with most common values:
public:
CEnum()
{
plstDirs = new list<_stl_string >;
plstFiles = new list<_stl_string >;
sExcPatternDirs = _T("");
sExcPatternFiles = _T("");
sIncPatternDirs = _T("");
sIncPatternFiles = _T("");
bRecursive = false;
bFullPath = false;
bNoCaseDirs = false;
bNoCaseFiles = false;
}
If you wish, you can set user options directly in overloaded constructor:
CEnum
(
_stl_string sPath,
_stl_string sExcludePatternDirs = _T(""),
_stl_string sExcludePatternFiles = _T(""),
_stl_string sIncludePatternDirs = _T(""),
_stl_string sIncludePatternFiles = _T(""),
bool bRecursiveSearch = false,
bool bUseFullPath = false,
bool bIgnoreCaseDirs = false,
bool bIgnoreCaseFiles = false
)
{
plstDirs = new list<_stl_string >;
plstFiles = new list<_stl_string >;
sExcPatternDirs = sExcludePatternDirs;
sExcPatternFiles = sExcludePatternFiles;
sIncPatternDirs = sIncludePatternDirs;
sIncPatternFiles = sIncludePatternFiles;
bRecursive = bRecursiveSearch;
bFullPath = bUseFullPath;
bNoCaseDirs = bIgnoreCaseDirs;
bNoCaseFiles = bIgnoreCaseFiles;
EnumerateAll(sPath);
}
Variety of lists CEnum can give you:
Note: By default, MFC collection classes are excluded from the build.
list<_stl_string > * GetDirs();
list<_stl_string > * GetFiles();
#ifdef MFC
CStringArray * GetDirsAsCStringArray();
CStringArray * GetFilesAsCStringArray();
CStringList * GetDirsAsCStringList();
CStringList * GetFilesAsCStringList();
#endif
There are two constructors for greater flexibility when using CEnum:
CEnum enumerator;
enumerator.sExcPatternDirs = _T("Post Black album Metallica");
enumerator.sIncPatternDirs = _T("Iron Maiden");
enumerator.sExcPatternFiles = _T("*.wma");
enumerator.sIncPatternFiles = _T("*.mp3;*.ogg");
enumerator.bRecursive = true;
enumerator.bFullPath = true;
enumerator.bNoCaseDirs = true;
enumerator.bNoCaseFiles = true;
enumerator.EnumerateAll(_T("D:\\Music"));
list<string> * pQualityMetal = enumerator.GetFiles();
... or the same thing with fewer lines:
CEnum enumerator(
_T("D:\\Music"),
_T("Post Black album Metallica"),
_T("*.wma"),
_T("Iron Maiden"),
_T("*.mp3;*.ogg"),
true,
true,
true,
true
);
list<string> * pQualityMetal = enumerator.GetFiles();
... or the really simple way (if default values work for you):
CEnum enumerator( _T("D:\\Music") );
list<string> * pQualityMetal = enumerator.GetFiles();
Let me quote the very first book I read about C++, Jesse Liberty's "Teach yourself C++ in 21 days": "If you are writing a function that needs to create memory and then pass it back to the calling function, consider changing your interface. Have the calling function allocate the memory and then pass it into your function by reference. This moves all memory management out of your program and back to the function that is prepared to delete it."
Why did I not follow this great advice? Well, firstly, this was designed to be similar to the C# class Directory (and in C# the garbage collector takes care of memory management, thus the caller does not care about memory issues). And secondly, I wanted a class that would be easy to use. Remember, CEnum's job is simply to create a list of file names.
So, here is how things work in CEnum:
- Constructor allocates two lists
- After enumeration process ends, pointers to two lists are returned to calling function
- Pointer(s) to list(s) are deleted in destructor of CEnum
This means that your enumeration list(s) will live only as long as the CEnum object that created them! You cannot hand the pointer to code that will still use it after that CEnum object has been destroyed. If you would like to do that, you either need to create a copy of the list, or comment out some (or all) delete statements in the destructor.
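The lifetime rule can be illustrated with a small stand-in. ListOwnerSketch below is hypothetical, not CEnum itself: it only mimics the ownership pattern described above (list allocated in the constructor, pointer handed out, list deleted in the destructor), fills the list with hard-coded names, and uses plain std::string instead of _stl_string. Copying the list while the owner is still alive gives you a list you can keep.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in that mimics the ownership pattern described above.
class ListOwnerSketch
{
public:
    ListOwnerSketch() : plstFiles(new std::list<std::string>)
    {
        // Hard-coded names stand in for a real enumeration.
        plstFiles->push_back("a.mp3");
        plstFiles->push_back("b.mp3");
    }
    ~ListOwnerSketch() { delete plstFiles; } // the list dies with its owner

    std::list<std::string> * GetFiles() { return plstFiles; }

private:
    // Copying the owner would delete the list twice, so it is forbidden.
    ListOwnerSketch(const ListOwnerSketch &);
    ListOwnerSketch & operator=(const ListOwnerSketch &);

    std::list<std::string> * plstFiles;
};

// Returns a copy of the owner's list that outlives the owner.
std::list<std::string> CopyFilesSketch()
{
    ListOwnerSketch owner;
    std::list<std::string> copy(*owner.GetFiles()); // copies every string
    return copy; // 'owner' and its list are destroyed here, 'copy' is not
}

void DemoCopy()
{
    std::list<std::string> files = CopyFilesSketch();
    assert(files.size() == 2);
    assert(files.front() == "a.mp3");
}
```

The copy costs one extra pass over the list, which is usually negligible next to the disk access done during enumeration.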
Is this a bad design? It just might be, but simplicity of use was my primary concern when I designed this class. I wanted to be able to enumerate a directory in a single line (see example above), do something with those files (e.g. display them on the screen) and then forget about the CEnum object and all the lists it allocated. If you want to do it 'properly', CEnum can be easily adapted to use lists allocated by the client application; all that is needed is to add one more constructor and to comment out a few lines in the destructor.
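A rough sketch of such an adaptation might look like this. CallerListSketch, its bOwnsList flag and its stand-in EnumerateAll() are all hypothetical names invented for illustration; they are not part of the shipped class, and the stand-in appends fixed names instead of scanning a disk.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical sketch, not the shipped CEnum: the list is either owned by the
// object (default constructor) or supplied by the caller (second constructor).
class CallerListSketch
{
public:
    CallerListSketch()
        : plstFiles(new std::list<std::string>), bOwnsList(true) {}

    // Extra constructor: the caller allocates the list and keeps ownership.
    explicit CallerListSketch(std::list<std::string> & lstCallerFiles)
        : plstFiles(&lstCallerFiles), bOwnsList(false) {}

    ~CallerListSketch()
    {
        if (bOwnsList) delete plstFiles; // never delete the caller's list
    }

    // Stand-in for EnumerateAll(): appends fixed names instead of scanning.
    void EnumerateAll()
    {
        plstFiles->push_back("one.txt");
        plstFiles->push_back("two.txt");
    }

    std::list<std::string> * GetFiles() { return plstFiles; }

private:
    CallerListSketch(const CallerListSketch &);
    CallerListSketch & operator=(const CallerListSketch &);

    std::list<std::string> * plstFiles;
    bool bOwnsList;
};

void DemoCallerList()
{
    std::list<std::string> lstMine; // allocated and owned by the caller
    {
        CallerListSketch enumerator(lstMine);
        enumerator.EnumerateAll();
    } // enumerator is gone, lstMine is still valid
    assert(lstMine.size() == 2);
    assert(lstMine.back() == "two.txt");
}
```

With this shape the caller decides how long the list lives, which is what the quoted advice recommends, at the price of one more line at the call site.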
This is the list of CEnum's public member variables through which user can set desired options:
bRecursive
Description: if true, subdirectories will be enumerated too
Default value: false
bFullPath
Description: if true, files will be enumerated using file's full path, otherwise list will contain file names only
Default value: false
bNoCaseDirs
Description: if true, case will be ignored when searching directory (and only directory) names
Default value: false
bNoCaseFiles
Description: if true, case will be ignored when searching file (and only file) names
Default value: false
sIncPatternFiles
Description: matching pattern for files you wish to include in your search. Wild cards * and ? are supported. If you have more than one search pattern, separate them with semicolon.
Default value: empty string
Examples:
- "*.mp3;*.mp4"
- "*.mp?" (matches everything the first example matches, plus any other extension made of "mp" and one more character)
- "*.mp3;iron maid*;latest*"
Note that in case of Include patterns, empty string means "enumerate all", i.e. everything is included!
sExcPatternFiles
Description: matching pattern for files you wish to exclude from your search. Wild cards * and ? are supported. If you have more than one search pattern, separate them with semicolon.
Default value: empty string
Examples:
- "*.mp3;*.mp4"
- "*.mp?" (matches everything the first example matches, plus any other extension made of "mp" and one more character)
- "*.mp3;iron maid*;latest*"
Note that in case of Exclude patterns, empty string means "exclude none", i.e. nothing is excluded!
Also, in case of conflict, Exclude pattern has precedence over Include pattern.
sIncPatternDirs
same as sIncPatternFiles above, just for directories.
sExcPatternDirs
same as sExcPatternFiles above, just for directories.
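The include/exclude rules above can be summed up in a few lines of portable C++. This is only a sketch under stated assumptions: SplitPatterns, MatchSketch, MatchesAny and IsAccepted are hypothetical helpers, not CEnum's own functions; plain std::string replaces _stl_string; and MatchSketch is a small recursive, case-sensitive matcher swapped in for CompareStrings to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits "a;b;c" into {"a","b","c"}; empty tokens are skipped.
std::vector<std::string> SplitPatterns(const std::string & sPatterns)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (start <= sPatterns.size())
    {
        std::string::size_type end = sPatterns.find(';', start);
        if (end == std::string::npos) end = sPatterns.size();
        if (end > start) tokens.push_back(sPatterns.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}

// Minimal recursive wildcard matcher (case sensitive), a stand-in only.
bool MatchSketch(const char * pattern, const char * name)
{
    if (*pattern == '\0') return *name == '\0';
    if (*pattern == '*')
        return MatchSketch(pattern + 1, name) ||
               (*name != '\0' && MatchSketch(pattern, name + 1));
    if (*name == '\0') return false;
    if (*pattern == '?' || *pattern == *name)
        return MatchSketch(pattern + 1, name + 1);
    return false;
}

// True if the name matches at least one of the semicolon-separated patterns.
bool MatchesAny(const std::string & sPatterns, const std::string & sName)
{
    const std::vector<std::string> tokens = SplitPatterns(sPatterns);
    for (std::size_t i = 0; i < tokens.size(); ++i)
        if (MatchSketch(tokens[i].c_str(), sName.c_str())) return true;
    return false;
}

// Empty include pattern: everything is included.
// Empty exclude pattern: nothing is excluded.
// On conflict the exclude pattern wins.
bool IsAccepted(const std::string & sInc, const std::string & sExc,
                const std::string & sName)
{
    if (!sExc.empty() && MatchesAny(sExc, sName)) return false;
    return sInc.empty() || MatchesAny(sInc, sName);
}

void DemoFilter()
{
    assert(IsAccepted("", "", "song.mp3"));                  // no filters at all
    assert(IsAccepted("*.mp3;*.ogg", "*.wma", "song.ogg"));
    assert(!IsAccepted("*.mp3;*.ogg", "*.wma", "song.wma"));
    assert(!IsAccepted("*.mp?", "*.mp4", "clip.mp4"));       // exclude wins
    assert(IsAccepted("*.mp?", "*.mp4", "song.mp3"));
}
```

Checking the exclude list first makes the precedence rule explicit: a name that matches both lists never reaches the include test.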
CEnum can give you a variety of lists:
CEnum enumerator(_T("C:\\"));
list<string> * pSTLlistFiles = enumerator.GetFiles();
list<string> * pSTLlistDirs = enumerator.GetDirs();
// The MFC variants below are available only when MFC is defined
CStringList * pMFClistFiles = enumerator.GetFilesAsCStringList();
CStringList * pMFClistDirs = enumerator.GetDirsAsCStringList();
CStringArray * pMFCArrayFiles = enumerator.GetFilesAsCStringArray();
CStringArray * pMFCArrayDirs = enumerator.GetDirsAsCStringArray();
Since CEnum was written using STL, the only two lists created during execution are the two lists returned by the GetDirs() and GetFiles() functions. All four MFC containers (two CStringArrays and two CStringLists) are created only when you call the conversion functions (GetFilesAs... and GetDirsAs...). In fact, all MFC related stuff is hidden behind preprocessor directives and is by default inactive (i.e. it does not compile). If you need this functionality, just uncomment the //#define MFC line and recompile.
This is another thing that is needed often but not found easily. Two good examples are the work of Jack Handy here on CodeProject and the work of Alessandro Felice Cantatore. Both are great examples, but each has its shortcomings. Jack's function is simple and fast, but it doesn't let you ignore case, as STL's tolower and UNICODE don't match well, and Alessandro's function was designed for IBM OS/2 (not to mention his holier than thou attitude, and don't even get me started on some of the restrictions he made. What has AI got to do with string comparing?)
Anyway, here is what I came up with:
_tsetlocale(LC_ALL, _T(""));
bool CompareStrings(LPCTSTR sPattern, LPCTSTR sFileName, bool bNoCase)
{
TCHAR temp1[2] = _T("");
TCHAR temp2[2] = _T("");
LPCTSTR pStar = 0;
LPCTSTR pName = 0;
while(*sFileName)
{
switch (*sPattern)
{
case '?':
++sFileName; ++sPattern;
continue;
case '*':
if (!*++sPattern) return 1;
pStar = sPattern;
pName = sFileName + 1;
continue;
default:
if(bNoCase)
{
*temp1 = *sFileName;
*temp2 = *sPattern;
if (!_tcsicmp(temp1, temp2)) {
++sFileName;
++sPattern;
continue;
}
}
else if (*sFileName == *sPattern) {
++sFileName;
++sPattern;
continue;
}
if(!pStar) return 0;
sPattern = pStar;
sFileName = pName++;
continue;
}
}
while (*sPattern == '*') ++sPattern;
return (!*sPattern);
}
This should be easy to follow:
- If ? is found in pattern string, chars match, and function moves to the next char in both pattern string and search string.
- If * is found in pattern string, function moves to the next char in pattern string only. Exits if there are no more chars in pattern string, or saves the record of current position (one char after '*') and the record of next char in search string.
- When two chars need to be compared regardless of case, the run-time library function _tcsicmp is used:
- if strings (temp strings that contain just one character) are equal, function moves to the next char in both pattern string and search string.
- if strings differ, function will return false if there was no '*' character in pattern string up to this point. Otherwise it will go back to position of last '*' character and advance by one char in search string.
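To see these steps at work on concrete inputs, here is a plain-char port of the function with a few checks. It is a sketch under two assumptions that are not in the original: LPCTSTR is replaced by const char * so it compiles outside Windows, and std::tolower is swapped in for _tcsicmp, so case folding is only meaningful for single-byte characters. CompareStringsSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>

// Plain-char port of the matcher described above (see assumptions in the text).
bool CompareStringsSketch(const char * sPattern, const char * sFileName, bool bNoCase)
{
    const char * pStar = 0; // pattern position right after the last '*'
    const char * pName = 0; // where to resume in the name after a mismatch
    while (*sFileName)
    {
        switch (*sPattern)
        {
        case '?': // matches any single character
            ++sFileName; ++sPattern;
            continue;
        case '*':
            if (!*++sPattern) return true; // trailing '*' matches the rest
            pStar = sPattern;
            pName = sFileName + 1;
            continue;
        default:
        {
            const unsigned char a = static_cast<unsigned char>(*sFileName);
            const unsigned char b = static_cast<unsigned char>(*sPattern);
            const bool bEqual = bNoCase ? std::tolower(a) == std::tolower(b) : a == b;
            if (bEqual) { ++sFileName; ++sPattern; continue; }
            if (!pStar) return false; // mismatch and no '*' to fall back on
            sPattern = pStar;         // go back to just after the last '*'
            sFileName = pName++;      // and retry one char further in the name
            continue;
        }
        }
    }
    while (*sPattern == '*') ++sPattern; // leftover stars match the empty string
    return !*sPattern;
}

void DemoCompare()
{
    assert(CompareStringsSketch("*.mp3", "a.mp.mp3", false)); // needs backtracking
    assert(CompareStringsSketch("iron maid*", "Iron Maiden.ogg", true));
    assert(!CompareStringsSketch("iron maid*", "Iron Maiden.ogg", false));
    assert(CompareStringsSketch("?.txt", "a.txt", false));
    assert(!CompareStringsSketch("?.txt", "ab.txt", false));
}
```

The first check is the interesting one: ".mp" matches the pattern up to the point where '.' meets '3', the function falls back to the position after '*', and the match succeeds on the second ".mp3".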
Regarding the use of _tcsicmp:
There was no special reason why I have chosen to use this function, other than the fact that it was the only run-time library routine that passed all my tests for UNICODE string comparing regardless of case.
Note: My tests were limited to European-character sets (stuff like accented, Central European and Nordic characters). For anything beyond that, you will have to test for yourself.
If you would like to use Wildcard compare function in some other project, and you don't need to ignore case, then you can greatly speed things up by using something like this:
bool CompareStrings(LPCTSTR sPattern, LPCTSTR sFileName)
{
LPCTSTR pStar = 0;
LPCTSTR pName = 0;
while(*sFileName)
{
switch (*sPattern)
{
case '?':
++sFileName; ++sPattern;
continue;
case '*':
if (!*++sPattern) return 1;
pStar = sPattern;
pName = sFileName + 1;
continue;
default:
if (*sFileName == *sPattern) { ++sFileName; ++sPattern; continue; }
if(!pStar) return 0;
sPattern = pStar;
sFileName = pName++;
continue;
}
}
while (*sPattern == '*') ++sPattern;
return (!*sPattern);
}
This version is 3-4 times faster because it compares chars directly.
If you find that CEnum lacks some useful feature, check these projects out. They may be closer to what you need.
Enumeration and globbing:
Wildcard match:
CEnum used in demo project has one minor difference. Because I added Wildcard testing functionality in the demo project, CompareStrings function is declared as public and static. Otherwise it is a private non-static method of CEnum.
- Testing enumeration
This part is very simple. In the OnEnumeration() function, in a mere 20 lines of code, a directory is enumerated and its content is added to a CListCtrl.
- Testing wildcard comparing functionality
Demo uses test.txt test file in which you can add your own test cases. Format of test.txt is very simple. First string is wildcard string, second string is search string and the last string specifies if first two strings match. Comment lines start with '#' character.
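A line of such a file could be read along these lines. This is a sketch only: the description above does not say how the three strings are separated or how the expected result is spelled, so the code assumes whitespace-separated fields and a last field of "true" or "false". TestCaseSketch and ParseTestLine are hypothetical names, and the real test.txt may use a different notation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct TestCaseSketch
{
    std::string sPattern; // wildcard string
    std::string sText;    // search string
    bool bExpected;       // whether the two are expected to match
};

// Parses one line; returns false for comments, blank lines and malformed lines.
// Assumption: fields are separated by whitespace and the last field is
// "true" or "false".
bool ParseTestLine(const std::string & sLine, TestCaseSketch & out)
{
    std::istringstream in(sLine);
    std::string sExpected;
    if (!(in >> out.sPattern)) return false;            // blank line
    if (out.sPattern[0] == '#') return false;           // comment line
    if (!(in >> out.sText >> sExpected)) return false;  // fewer than 3 fields
    if (sExpected != "true" && sExpected != "false") return false;
    out.bExpected = (sExpected == "true");
    return true;
}

void DemoParse()
{
    TestCaseSketch tc;
    assert(!ParseTestLine("# a comment", tc));
    assert(!ParseTestLine("", tc));
    assert(ParseTestLine("*.mp? song.mp3 true", tc));
    assert(tc.sPattern == "*.mp?" && tc.sText == "song.mp3" && tc.bExpected);
}
```

Whitespace separation keeps the parser tiny, but it cannot express patterns that contain spaces (such as "iron maid*"); a tab or quote based format would be needed for those.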
CEnum is file-centric, meaning it was designed to search for files more so than to search for directories.
For example:
- If you are searching only for files or only for directories, you will enumerate either just the way you intended.
- But, if you apply filters (exclude or include) for both files and directories, then you will enumerate only those files that reside in directories that match the filter applied for directories.
Depending on how you look at things, the latter case might be seen as a limitation, because you can't perform independent search for files and directories. You can't get a list of all files in all subdirectories, and at the same time get a list of directories that match certain search criteria. In this case, the only thing you can do is to run CEnum twice, once for files and once for directories.
Another thing that user needs to be aware of is the sorting functionality. After enumerating each directory, CEnum calls STL list's sort() method. This method sorts files in alphabetic order, which may be different sorting order than in your file browser (e.g. Windows Explorer).
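The difference is easy to reproduce with a plain std::list. In the sketch below (std::string instead of _stl_string), sort() without arguments compares character codes, so upper case names come before lower case ones. LessNoCase is a hypothetical comparator for single-byte strings; it only illustrates passing your own ordering to sort() and is not claimed to reproduce Windows Explorer's ordering.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <list>
#include <string>

// Case-insensitive "less than" for single-byte strings (an assumption made
// for this example, not part of CEnum).
bool LessNoCase(const std::string & a, const std::string & b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i)
    {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return ca < cb;
    }
    return a.size() < b.size(); // shorter string first when one is a prefix
}

void DemoSort()
{
    std::list<std::string> files;
    files.push_back("b.txt");
    files.push_back("A.txt");
    files.push_back("a.txt");
    files.push_back("B.txt");

    files.sort(); // default operator<: compares character codes
    assert(files.front() == "A.txt");
    assert(files.back() == "b.txt"); // order is A.txt, B.txt, a.txt, b.txt

    files.sort(LessNoCase); // list::sort is stable, so A.txt stays before a.txt
    std::list<std::string>::iterator it = files.begin();
    ++it;
    assert(*it == "a.txt"); // order is A.txt, a.txt, B.txt, b.txt
}
```

If the order matters to you, sorting a copy of the returned list with your own comparator leaves CEnum itself untouched.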
Finally, be aware that CEnum was designed for globbing (wildcard search) only. It can't search for files based on size, date, file content or file attributes.
Possible compile error in Visual Studio is:
- fatal error C1010: unexpected end of file while looking for precompiled header. Did you forget to add '#include "stdafx.h"' to your source?
Solution is not to use precompiled headers:
- In the Solution Explorer pane of the project, right-click the project name, and then click Properties.
- In the left pane, click the C/C++ folder.
- Click the Precompiled Headers node.
- In the right pane, click Create/Use Precompiled Header, and then click Not Using Precompiled Headers.
- Version 1.0 published 05/2008 - Initial public release
- Version 1.1 published 11/2008
List of changes in version 1.1
Article:
- Added Destructor chapter
- Updated Wildcard search algorithm chapter
- Added Doxygen documentation
- Changed demo application so files could be opened by double-clicking
Source code:
- Removed most of dynamic allocations of STL objects
- Added Doxygen Qt style comments to CEnum.cpp
- Changed Wildcard compare algorithm in CompareStrings function
- Rewrote function Tokenize
- Updated class' Destructor
- Verified that class compiles cleanly (no error or warnings) at warning level 4 in Visual Studio 2005
There is no limitation on use of this class. You can use it (as whole or just parts of it) with or without author's permission in any sort of project regardless of license issues, in both commercial and open source projects. Though, a thank you e-mail would be nice. :-)
That's it folks, a simple class that lets you enumerate files without understanding how the ::FindFirstFile and ::FindNextFile API works, or even how CEnum works internally. I hope CEnum will speed things up for you when working with files as much as it did for me.
I'd love to get some feedback from you, be it constructive criticism, request for additional features or bug reports. Don't hesitate to post your comments or send them in e-mail. I am especially interested if you know of a similar class or project. In case you do, just let me know where I can find it.