Introduction
One of my favorite features of Perforce is its recursive wildcard syntax.
I use it every day. Instead of navigating trees in a GUI or directories at
the Command Prompt looking for a particular file to operate on, I merely use the
Perforce '...' recursive syntax to find the file and check it out:
p4 edit ...MyFile.cpp
As further specialized functionality, the '...' syntax can accept extensions, too.
p4 edit ....h
This checks out all *.h files across every directory under the current one.
I wanted similar functionality in my own applications, so I began
searching the Internet for existing code implementing this behavior. I
came across DJGPP's glob() function (also in OpenBSD). Coupled with the POSIX
fnmatch() function, it was a great match and worked almost exactly how I wanted. The
only catch was the license. The GPL is not conducive to many environments, and
although I release source code to most of my products, I can't afford to
have all the code I write fall under the GPL.
So, it was back to searching. Next, I found Matthias Wandel's MyGlob
code, from his Exif Jpeg camera setting parser and thumbnail remover application
(see the Credits for URL). Its behavior was mostly what I
wanted, and it didn't fall under some license where you have to sell your soul.
In fact, it falls under no license and is freely usable.
After modifying a version of his code to my liking, I emailed him, told him
what I had done, and received his permission to post my modified version here.
In any case, Matthias wrote the original implementation. I slapped on a bunch of new features,
and I'll discuss the product as a whole below.
The newest versions of this code may be found at
http://workspacewhiz.com/ in the Misc. Code section.
Globbing?
Frankly, I was surprised, too. My Internet search would have gone more
quickly had I known a "glob" was exactly what I was looking for. I'd
always thought a glob was that gooey stuff my roommates in college made for
dinner, but I guess I was sorely mistaken. :)
A file glob, in this case, is zero or more file names matched via a pattern, possibly with
wildcards embedded within.
Patterns and Wildcards
Without a path specified, matching (er, globbing) of files starts in the
current directory.
Wildcard | Description |
? | Matches any single character of the file name or directory name. |
* | Matches zero or more characters of the file name or directory name. |
/ at end of pattern | Any pattern with a closing slash starts a directory search instead of the default file search. |
** | Recursive wildcard. I originally wanted a Perforce-style '...' recursive syntax, but Matthias brought up an important point: individuals using 4DOS (http://www.jpsoft.com/) are used to '...' meaning '..\..\'. After some thought, I believe Matthias's original '**' syntax for recursion is a far better solution. |
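To make the ? and * rules concrete, here is a minimal sketch of a matcher for a single file or directory name. SimpleWildMatch() is a hypothetical name used only for illustration. It is not the library's WildMatch() routine, and it assumes that case-insensitive comparison is the default.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not the library's WildMatch(). Returns true when
// `name` matches `pattern`, where '?' matches exactly one character and
// '*' matches zero or more characters. Path separators get no special
// treatment; the sketch is meant for one file or directory name.
bool SimpleWildMatch(const std::string& pattern, const std::string& name,
                     bool caseSensitive = false)
{
    std::size_t p = 0, n = 0;
    std::size_t starPos = std::string::npos;  // position of the last '*' seen
    std::size_t starName = 0;                 // name position to retry from

    auto same = [caseSensitive](char a, char b) {
        if (caseSensitive)
            return a == b;
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            // Remember the star and first try matching it against nothing.
            starPos = p++;
            starName = n;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || same(pattern[p], name[n]))) {
            ++p;
            ++n;
        } else if (starPos != std::string::npos) {
            // Mismatch: let the last '*' absorb one more character.
            p = starPos + 1;
            n = ++starName;
        } else {
            return false;
        }
    }
    // Only trailing '*' characters may remain in the pattern.
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}
```

The backtracking only ever returns to the most recent *, which is enough for patterns made of ? and * and keeps the routine free of recursion.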
Some examples follow:
Example Pattern | Description |
File.txt | Matches a file or directory called File.txt. |
File*.txt | Matches any file or directory starting with File and ending with a .txt extension. |
File?.txt | Matches any file or directory starting with File, followed by exactly one more character, and ending with a .txt extension. |
F??e*.txt | Matches a file or directory starting with F, followed by any two characters, followed by e, then any number of characters up to the extension .txt. |
File* | Matches a file or directory starting with File, with or without an extension. |
* | Matches all files (non-recursive). |
*/ | Matches all directories (non-recursive). |
A*/ | Matches any directory starting with A (non-recursive). |
**/* | Matches all files (recursive). |
** | Shortened form of the above. Matches all files (recursive). Internally, expands to **/*. |
**/ | Matches all directories (recursive). |
**{filename chars} | Matches {filename chars} recursively. Internally, expands to **/*{filename chars}. |
{dirname chars}** | Expands to {dirname chars}*/**. |
{dirname chars}**{filename chars} | Expands to {dirname chars}*/**/*{filename chars}. |
**.h | Matches all *.h files recursively. Expands to **/*.h. |
**resource.h | Matches all *resource.h files recursively. Expands to **/*resource.h. |
BK** | Matches all files in any directory starting with BK, recursively. Expands to BK*/**. |
BK**.h | Matches all *.h files in any directory starting with BK, recursively. Expands to BK*/**/*.h. |
c:/Src/**/*.h | Matches all *.h files recursively, starting at c:/Src/. |
c:/Src/**/*Grid/ | Recursively matches all directories under c:/Src/ that end with Grid. |
c:/Src/**/*Grid*/ | Recursively matches all directories under c:/Src/ that contain Grid. |
c:/Src/**/*Grid*/**/ABC/**/Readme.txt | Recursively matches all directories under c:/Src/ that contain Grid. From the found directory, recursively matches directories until ABC/ is found. From there, the file Readme.txt is searched for recursively. |
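The shorthand expansions for ** can be sketched as a small string rewrite. ExpandRecursiveShorthand() is a hypothetical helper written for this illustration. It assumes it receives one pattern component with no slash in it, and it is not taken from the library source.

```cpp
#include <string>

// Hypothetical helper: rewrites one pattern component (text with no '/'
// in it) that contains "**" into its long form.
std::string ExpandRecursiveShorthand(const std::string& component)
{
    const std::string::size_type pos = component.find("**");
    if (pos == std::string::npos)
        return component;  // nothing recursive in this component

    const std::string dirChars = component.substr(0, pos);
    const std::string fileChars = component.substr(pos + 2);

    if (dirChars.empty() && fileChars.empty())
        return "**/*";  // a bare "**" means all files, recursively

    std::string expanded;
    if (!dirChars.empty())
        expanded += dirChars + "*/";  // directories starting with dirChars
    expanded += "**";
    if (!fileChars.empty())
        expanded += "/*" + fileChars;  // files ending with fileChars
    return expanded;
}
```

Under these assumptions, BK**.h becomes BK*/**/*.h, and a component without ** passes through unchanged.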
Finally, a couple of flags are available. Flags are appended at the end of
the pattern line. Each flag begins with an @ character. Spaces
should not be inserted between flags unless they are intended as part of the
string literal.
Flags and Other Expansions | Description |
@-pattern | Adds pattern to the ignore list. Any file matching a pattern in the ignore list is discounted from the search. |
@=pattern | Adds pattern to the exclusive file list. Any file not matching a pattern in the exclusive file list is automatically removed from the search. |
More than two periods | Goes up parent directories. Similar to 4DOS, each period beyond two goes up one additional parent directory. So, a 4 period path expands to ../../../. |
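The period rule is simple enough to show as a short sketch. ExpandParentDots() is a hypothetical helper and not part of the library. It assumes the component consists of nothing but periods.

```cpp
#include <string>

// Hypothetical helper: turns a component made only of three or more
// periods into the equivalent chain of "../" steps. Anything else,
// including "." and "..", is returned unchanged.
std::string ExpandParentDots(const std::string& component)
{
    if (component.size() < 3 ||
        component.find_first_not_of('.') != std::string::npos)
        return component;

    std::string expanded;
    // Two periods mean one level up; each extra period adds one more.
    for (std::string::size_type i = 1; i < component.size(); ++i)
        expanded += "../";
    return expanded;
}
```

With this sketch, three periods become ../../ and four periods become ../../../.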
A few examples of patterns with flags:
Example Pattern | Description |
Src/**/@-SCCS/@-BitKeeper/ | Recursively lists all directories under Src/, but directories called SCCS/ and BitKeeper/ are filtered out. |
Src/**@=*.lua@=README | Recursively lists all files under Src/ which match *.lua or README. All other files are ignored. |
Src/**/@-SCCS/@-BitKeeper/@=*.lua@=README | Recursively lists all files under Src/ which match *.lua or README. The versions of those files that may exist in SCCS/ or BitKeeper/ are ignored. |
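One way to picture how a pattern line with flags breaks apart is the sketch below. ParsedPattern and SplitPatternFlags() are hypothetical names invented for this example. The library's real parsing may differ, and silently dropping a flag that starts with neither - nor = is an assumption of the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct ParsedPattern {
    std::string pattern;                 // text before the first flag
    std::vector<std::string> ignore;     // collected from @-pattern flags
    std::vector<std::string> exclusive;  // collected from @=pattern flags
};

// Splits a pattern line at every '@'. No whitespace is trimmed, because
// spaces count as part of the flag text.
ParsedPattern SplitPatternFlags(const std::string& line)
{
    ParsedPattern result;
    std::string::size_type start = line.find('@');
    result.pattern = line.substr(0, start);

    while (start != std::string::npos) {
        const std::string::size_type next = line.find('@', start + 1);
        // The flag text runs from just after '@' to the next '@' or the end.
        const std::string flag = line.substr(
            start + 1,
            next == std::string::npos ? std::string::npos : next - start - 1);
        if (flag.size() > 1 && flag[0] == '-')
            result.ignore.push_back(flag.substr(1));
        else if (flag.size() > 1 && flag[0] == '=')
            result.exclusive.push_back(flag.substr(1));
        start = next;
    }
    return result;
}
```

For the line Src/**@-SCCS/@-BitKeeper/@=*.lua@=README, this sketch yields the pattern Src/**, the ignore entries SCCS/ and BitKeeper/, and the exclusive entries *.lua and README.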
Matching Files
The class FileGlobBase is the base class for all glob operations. It is
not possible to instantiate FileGlobBase directly. It declares a single abstract
function, FoundMatch(), which must be overridden. Any time a match is
found, FoundMatch() is called with the matched name.
Should we want to print the names to stdout as they are received, we would
create a derived class like this:
class FileGlobPrintStdout : public FileGlobBase
{
    // Called once for every matched name.
    virtual void FoundMatch( const char* name )
    {
        printf( "%s\n", name );
    }
};
Next, we instantiate the object:
FileGlobPrintStdout fileGlob;
To begin the matching process, the function FileGlobBase::MatchPattern() is
called with the requested pattern.
fileGlob.MatchPattern( "**" );
By the time MatchPattern() exits, all files existing in the current directory
and below will have been passed to FileGlobPrintStdout::FoundMatch() and printed
to stdout.
Ignoring Files and Directories
Several source control systems add extra directories within the working copy.
CVS, for example, adds a directory called CVS/ to every directory in the working
copy. BitKeeper adds a directory called BitKeeper/ to the root of the
working copy and directories called SCCS/ to every directory contained in the
working copy under source control. Left alone, these directories and their
contents show up in every recursive match.
The file globbing class provides an easy solution to this problem.
FileGlobBase::AddIgnorePattern() may be called with a pattern (wildcarded or
not), and any file or directory matching the pattern is simply ignored.
This functionality directly corresponds to a file pattern's @- flag, described above.
Directory ignore patterns are specified as Dir/. The closing slash
must be present. Dir/ and Dir are two different patterns, the first
referring to directories and the second to files.
To remove all the CVS directories from the recursive list, we call:
fileGlob.AddIgnorePattern( "CVS/" );
fileGlob.MatchPattern( "**" );
This approach works equally well with files. If the desired file list
should contain MP3 files and no WAV files, we add the ignore pattern *.wav.
fileGlob.AddIgnorePattern( "*.wav" );
fileGlob.MatchPattern( "**" );
Obviously, matching the pattern *.wav while ignoring the pattern
*.wav will result in no files being listed.
Forcing Only Certain Files
FileGlobBase implements a function called AddExclusivePattern(). Providing
exclusive patterns ensures your application only receives files through
FoundMatch() that match at least one of the registered exclusive patterns.
This functionality directly corresponds to the file pattern flag @=, described above.
fileGlob.AddExclusivePattern( "*.lua" );
fileGlob.AddExclusivePattern( "*.c" );
fileGlob.MatchPattern( "**" );
When recursively matching the ** pattern, only files matching *.lua
or *.c are considered.
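How the ignore and exclusive lists combine can be sketched as a single accept/reject decision per name. GlobFilter below is a hypothetical type, not a class from the archive. It uses the POSIX fnmatch() function as a stand-in matcher, so it needs a POSIX system, and unlike the library it compares case-sensitively. It also assumes that directories are passed with a trailing slash and are never subject to the exclusive list.

```cpp
#include <fnmatch.h>
#include <string>
#include <vector>

// Hypothetical filter for illustration only.
struct GlobFilter {
    std::vector<std::string> ignore;     // patterns such as "CVS/" or "*.wav"
    std::vector<std::string> exclusive;  // patterns such as "*.lua"

    // `name` is a bare file name, or a directory name ending in '/'.
    bool Accepts(const std::string& name) const
    {
        // Anything matching an ignore pattern is dropped first. With
        // flags set to 0, fnmatch() treats '/' as an ordinary character,
        // so "CVS/" matches the directory "CVS/" but not a file "CVS".
        for (const std::string& pattern : ignore)
            if (fnmatch(pattern.c_str(), name.c_str(), 0) == 0)
                return false;

        // Assumption: directories bypass the exclusive list so that a
        // recursive search can still descend into them.
        const bool isDirectory = !name.empty() && name.back() == '/';
        if (isDirectory || exclusive.empty())
            return true;

        // A file must match at least one exclusive pattern.
        for (const std::string& pattern : exclusive)
            if (fnmatch(pattern.c_str(), name.c_str(), 0) == 0)
                return true;
        return false;
    }
};
```

Checking the ignore list first means an ignored name stays ignored even if it also matches an exclusive pattern.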
Class Details
The classes included in the archive are documented as per Doxygen
conventions. A fair amount of documentation exists in the source and
header files.
Class: FileGlobBase
FileGlobBase is the base class of all file glob access. It is not
possible to instantiate a FileGlobBase class. A derived class,
implementing FoundMatch(), must be used.
Class: FileGlobList
FileGlobList is derived from FileGlobBase and mixes in a
std::list< std::string > container. FileGlobList provides an implementation for
FoundMatch() and stores the matched file list in the std::list<> container.
All basic STL operations for std::list<> may be used directly on FileGlobList.
For each file sent to FoundMatch(), the container is searched using a
case-insensitive comparison. If the file is already in the container, it is
ignored. Otherwise, it is appended at the end. In this manner,
MatchPattern() may be called multiple times to accumulate a large list
of unique files. It should be noted that subsequent calls to MatchPattern()
may insert files out of sorted order into the container.
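The accumulate-unique behavior described above can be sketched as follows. UniqueNameList is a hypothetical stand-in written for this illustration. It does not derive from FileGlobBase and is not the FileGlobList source.

```cpp
#include <algorithm>
#include <cctype>
#include <list>
#include <string>

// Hypothetical stand-in for illustration; not the library's FileGlobList.
class UniqueNameList : public std::list<std::string>
{
public:
    // Appends `name` unless an equal name, ignoring case, is already stored.
    void FoundMatch(const char* name)
    {
        const std::string candidate(name);
        for (const std::string& existing : *this)
            if (EqualNoCase(existing, candidate))
                return;  // duplicate: keep the first spelling seen
        push_back(candidate);
    }

private:
    static bool EqualNoCase(const std::string& a, const std::string& b)
    {
        return a.size() == b.size() &&
               std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
                   return std::tolower(static_cast<unsigned char>(x)) ==
                          std::tolower(static_cast<unsigned char>(y));
               });
    }
};
```

Each insertion walks the whole list, so building a list of n names costs on the order of n squared comparisons. New names always land at the end, which is why a second batch of matches can appear out of sorted order.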
Example: Glob Application
A sample application showing off the globbing capabilities is included.
The sample Visual Studio .NET solution, Glob.sln, builds the executable
Glob.exe. Glob.exe is a
simplistic command-line interface to the file glob code.
Glob.exe may be run without arguments (or with -?) to see its usage.
Glob.exe's output may be piped into any application that reads file names from
standard input. To edit all *.cpp and *.h files in a BitKeeper working copy, for
example, the user would run:
glob -i SCCS/ -i BitKeeper/ **.cpp **.h | bk edit -
The -i command line option is used to specify ignore patterns.
Here, it keeps the contents of the SCCS/ and BitKeeper/ directories from being considered.
Finally, exclusive patterns may be specified via the command-line flag -e.
These are similar to the @= flag entries described above.
glob -e *.cpp -e *.h **
Wish List
- It would be fantastic to find a free implementation of the POSIX
fnmatch() function and replace the WildMatch() routine that is used now.
fnmatch() provides greater "regular expression" style matching capabilities.
Known Bugs
There are a lot of combinations for matching. I've tried a great deal
of combinations, and those combinations seem to work. If you run into an
issue, let me know. It would not be surprising for cases to pop up that
don't work as desired. Don't hesitate to contact me at
jjensen@workspacewhiz.com with
issues.
Credits
- Perforce (http://www.perforce.com/), for giving me the idea in the first
place. For those who don't know, Perforce provides a free two user
license for their source control software. I currently use Perforce at
home for all my source control needs. It's not perfect, but it gets the
job done better than most.
- Matthias Wandel, for the MyGlob() code, which is the basis of the algorithm
driving the file globber. The original C implementation exists in myglob.c from
his Exif Jpeg camera setting parser and thumbnail remover application
at http://www.sentex.net/~mwandel/jhead/.
- Jack Handy, for his article at
http://www.codeproject.com/string/wildcmp.asp. wildcmp() was expanded into
WildMatch(), allowing case sensitive and insensitive comparisons.