Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / programming / algorithm

Iterative Implementation of Recursively Enumerating Files and Sub Folders

4.64/5 (19 votes)
30 Dec 2010CPOL3 min read 61.1K   3.6K  
Yet another implementation to enumerate files

QOF_SnapShot.gif

The demo application, QuickOpenFiles.

Introduction

Microsoft provided a sample code for Listing Files in a Directory, but that example code cannot be used to list files in the sub-directory, so how can we do that?

Enumerating/searching files and sub folders is in fact a rather basic skill, people have posted many articles on that, you can see the other examples here. However, many of those implementations enumerate files in a recursive way. As a beginner in programming, I just wonder how to do that non-recursively.

Recursive Implementation

In many cases, recursive implementation can be really elegant, if you already know how to do this recursively, then you may just simply skip this part.

To search files in Windows, we can use the APIs FindFirstFile/FindNextFile (or SHGetDataFromIDList, or the FileSystem library in boost), and a typical implementation is to do this recursively:

C++
#include "strsafe.h"

bool EnumerateFolder(LPCTSTR lpcszFolder, int nLevel = 0)
{
    WIN32_FIND_DATA ffd;
    TCHAR szDir[MAX_PATH];
    HANDLE hFind = INVALID_HANDLE_VALUE;

    StringCchCopy(szDir, MAX_PATH, lpcszFolder);
    StringCchCat(szDir, MAX_PATH, TEXT("\\*"));
    
    // Find the first file in the directory.
    
    hFind = FindFirstFile(szDir, &ffd);
    
    if (INVALID_HANDLE_VALUE == hFind) 
    {
        return false;
    } 
    
    // List all the files in the directory with some info about them.

    TCHAR szOutLine[MAX_PATH] = {0};
    for (int ii = 0; ii < nLevel; ++ii)
        szOutLine[ii] = _T('\t');
    
    do
    {
        if (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        {
            if ( _T('.') != ffd.cFileName[0] )
            {
                _tprintf(TEXT("%s%s   <DIR>\n"), 
                              szOutLine, ffd.cFileName);
                StringCchCopy(szDir+_tcslen(lpcszFolder)+1, 
                              MAX_PATH, ffd.cFileName);

                EnumerateFolder(szDir, nLevel+1);    // Here's the recursive call!
            }
        }
        else
        {
            _tprintf(TEXT("%s%s\n"), szOutLine, ffd.cFileName);
        }
    }
    while (FindNextFile(hFind, &ffd) != 0);

    FindClose(hFind);
    return true;
}

The Issues of Recursion

The readability of recursive implementation is usually good, but I started to think about the potential Stack Overflow problem.

In Windows, each thread is granted a default stack size of 1 MB, and that might be big enough in most cases.

In the above function EnumerateFolder(), it will take around 580 bytes for the local variables (ignore the szOutLine). That means, in theory, we can enumerate up to 1808 levels of subdirectories.

However, since the buffer size (MAX_PATH) of the path is limited to 260 characters, it is not possible to run into that amount of levels.

In fact, Microsoft does provide the mechanism to break that limitation.

In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 widecharacters, call the Unicode version of the function and prepend "\\?\" to the path.

Although some people said that 260 characters is good enough, and the stack might not that easily overflow, I guess avoiding recursion when possible can't be a bad idea; am I correct? This implementation is more like a practice for beginners like me, so let's just have fun coding it.

Iterative Implementation

In general, there are two ways to do a searching iteratively:

  1. Depth-first searching (DFS)
  2. Breadth-first searching (BFS)

Since Raymond Chen explained that breadth-first searching is better for file system tree walking, I decided to post the BFS implementation snippet here.

For those who still want to know how to implement this using DFS, you can dig into the demo code in this page and open the source file FileEnumerator.h, then you will see that I have prepared some #ifdef switch there, you can simply try comment out/uncomment this macros:

C++
//#define FILEENUMERATOR_RECURSION
//#define FILEENUMERATOR_DOCOUNTING

#ifndef FILEENUMERATOR_RECURSION
	#define FILEENUMERATOR_BFS
#endif

Now let's write a BFS implementation. First, we need a class that encapsulates the basic operations:

C++
class CFileFinder
{
public:
	CFileFinder(LPCTSTR lpcszInitDir, FindFileData& fileInfo)
		: m_fileInfo(fileInfo)
	{
		Init(lpcszInitDir);
	}

public:
	inline bool FindFirst()
	{
		return EnumCurDirFirst();
	}

	inline bool FindCurDirNext()
	{
		bool bRet = ::FindNextFile(m_hFind, &m_fileInfo) != FALSE;
		if ( bRet )
		{
			m_szPathBuffer.resize(m_nFolderLen);
			m_szPathBuffer += m_fileInfo.cFileName;
		}
		else
		{
			::FindClose(m_hFind);
			m_hFind = INVALID_HANDLE_VALUE;
		}
		return bRet;
	}

	virtual bool Finish() const					
		{ return INVALID_HANDLE_VALUE == m_hFind; }

	inline LPCTSTR GetPath() const					
		{return STRBUFFER(m_szPathBuffer) + EXTEND_FILE_PATH_PREFIX_LEN;}

	inline const FindFileData& GetFileInfo() const	{ return m_fileInfo; }

	inline bool IsDirectory() const				
		{ return (m_fileInfo.dwFileAttributes & 
			FILE_ATTRIBUTE_DIRECTORY) == FILE_ATTRIBUTE_DIRECTORY; }
	inline bool IsDot() const						
		{ return (m_fileInfo.cFileName[0] == '.') && 
			((m_fileInfo.cFileName[1] == '.') || 
			(m_fileInfo.cFileName[1] == '\0')); }
protected:
	virtual bool EnumCurDirFirst()
	{
		m_szPathBuffer.resize(m_nFolderLen+2);
		m_szPathBuffer[m_nFolderLen++]	= _T('\\');
		m_szPathBuffer[m_nFolderLen]	= _T('*');

		HANDLE hFind = ::FindFirstFile(STRBUFFER(m_szPathBuffer), &m_fileInfo);
		bool bRet = hFind != INVALID_HANDLE_VALUE;
		if ( bRet )
		{
			m_hFind			= hFind;
			m_szPathBuffer.resize(m_nFolderLen);
			m_szPathBuffer += m_fileInfo.cFileName;
		}
		return bRet;
	}

	void Init(LPCTSTR lpcszInitDir)
	{
		m_nFolderLen = _tcslen(lpcszInitDir);
		m_szPathBuffer = lpcszInitDir;
		
		if ( m_szPathBuffer[m_nFolderLen-1] == _T('\\') )
		{
			m_szPathBuffer.erase(m_nFolderLen-1);
			--m_nFolderLen;
		}
	}

protected:
	FindFileData&	m_fileInfo;
	tstring			m_szPathBuffer;
	UINT			m_nFolderLen;
	HANDLE			m_hFind;
};

Then we need to put the pending sub-directories into a queue:

C++
#include <boost/shared_ptr.hpp>
typedef boost::shared_ptr<CFileFinder>	FileFindPtr;

typedef std::queue<FileFindPtr>				FileFindQueue;

bool CFileEnumeratorBase::EnumerateBFS
    ( LPCTSTR lpcszInitDir, FindFileData& findFileData, HANDLE hStopEvent /*= NULL*/ )
{
	// Breadth-first searching, BFS:
	FileFindPtr finder = NULL;
	try
    {
		finder = new CFileFinder(lpcszInitDir, findFileData);
	}
	catch (bad_alloc&)
	{
		CFE_ASSERT(0);
		return false;
	}

	bool bRet = true;
	FileFindQueue finderQueue;
	
    if ( !finder->FindFirst() )
	{
		m_dwLastError = ::GetLastError();
		OnError(finder->GetPath());
		return false;
	}
	else
	{
		while( !finder->Finish() && !IsStopEventSignaled() )
        {
			const FindFileData& fileInfo = finder->GetFileInfo();
			
			if( finder->IsDirectory() )
			{
				if ( !finder->IsDot() )
				{
					if ( CheckUseDir
						(finder->GetPath(), fileInfo) )
					{
						HandleDir(finder->GetPath(), 
							fileInfo);
						
						FileFindPtr newFinder = NULL;
						try
						{
							newFinder = 
							    new CFileFinder
							    (finder->GetPath(), 
							    findFileData);
							finderQueue.push
							    (newFinder);
						}
						catch (bad_alloc&)
						{
							CFE_ASSERT(0);
						}
					}
				}
			}
			else
			{
				if ( CheckUseFile(finder->GetPath(), fileInfo) )
				{
					HandleFile(finder->GetPath(), fileInfo);
				}
			}
            if ( !finder->FindCurDirNext() )
			{
				FinishedDir( finder->GetPath() );
				if ( finderQueue.empty() )
					break;
				else
				{
					while ( !IsStopEventSignaled() )
					{
						FileFindPtr nextFinder = 
							finderQueue.front();
						finderQueue.pop();

						finder = nextFinder;

						if ( !finder->FindFirst() )
						{
							m_dwLastError = 
								::GetLastError();
							if ( !OnError
							    (finder->GetPath()) )
							{
								return false;
							}
						}
						else
							break;
					}
				}
			}
        }
    }
    return bRet;
}

Using the Code

The initial design of classes was inspired by Andreas Saurwein Franci Gonçalves' article, please read his article first! Then you need to implement your own class derived from either CFileEnumeratorBase or CFilteredFileEnumerator.

CFileEnumeratorBase is provided for basic file enumeration, and CFilteredFileEnumerator is provided for pattern filtering enumeration.

Following is the example class derived from CFilteredFileEnumerator:

C++
class CFileInfoEnumerator : public CFilteredFileEnumerator
{
public:
    CFileInfoEnumerator();
    virtual ~CFileInfoEnumerator();
public:
    enum FileInfoType
    {
        FIT_INVALID        = -1,
        FIT_FIRST,
        FIT_FILENAME    = FIT_FIRST,
        FIT_FILEPATH,
        FIT_FILEEXT,
        FIT_MODIFIED,
        
        FIT_SIZE,
    };

    struct FileInfo
    {
        std::string sFileInfo[FIT_SIZE];
    };
public:
    int                GetFileCount() const    { return m_arrFileInfo.size(); }

    const FileInfo*    GetFileInfo(int nIndex);
protected:
    virtual void    HandleFile(LPCTSTR lpcszPath, const FindFileData& ffd);
protected:
    typedef std::vector<fileinfo>    FileInfoArray;
    FileInfoArray    m_arrFileInfo;
};

#define DEFAULT_SORT_ASCENDING true

CFileInfoEnumerator::CFileInfoEnumerator()
    : m_nSortInfoType(FIT_INVALID)
    , m_bSortAscending(DEFAULT_SORT_ASCENDING)
{
    
}

CFileInfoEnumerator::~CFileInfoEnumerator()
{
    
}

void CFileInfoEnumerator::HandleFile( LPCTSTR lpcszPath, const FindFileData& ffd )
{
    FileInfo fileInfo;
    fileInfo.sFileInfo[FIT_FILENAME] = ffd.cFileName;

    fileInfo.sFileInfo[FIT_FILEPATH] = lpcszPath;

    std::string::size_type nDotPos = fileInfo.sFileInfo[FIT_FILENAME].rfind(_T('.'));
    if ( nDotPos >= 0 )
        fileInfo.sFileInfo[FIT_FILEEXT] = 
                 fileInfo.sFileInfo[FIT_FILENAME].c_str()+nDotPos+1;

    SYSTEMTIME st;
#define FTIME_BUFFER_SIZE    255

    TCHAR szModified[FTIME_BUFFER_SIZE+FTIME_BUFFER_SIZE], 
          szLocalDate[FTIME_BUFFER_SIZE], szLocalTime[FTIME_BUFFER_SIZE];
    FILETIME ft = ffd.ftLastWriteTime;
    
    FileTimeToLocalFileTime( &ft, &ft );
    FileTimeToSystemTime( &ft, &st );
    GetDateFormat( LOCALE_USER_DEFAULT, DATE_SHORTDATE, 
                   &st, NULL, szLocalDate, FTIME_BUFFER_SIZE );
    GetTimeFormat( LOCALE_USER_DEFAULT, TIME_NOSECONDS, 
                   &st, NULL, szLocalTime, FTIME_BUFFER_SIZE );

    _sntprintf(szModified, FTIME_BUFFER_SIZE+FTIME_BUFFER_SIZE, 
               _T("%s %s"), szLocalDate, szLocalTime);
    fileInfo.sFileInfo[FIT_MODIFIED] = szModified;

    m_arrFileInfo.push_back(fileInfo);
}

const CFileInfoEnumerator::FileInfo* CFileInfoEnumerator::GetFileInfo( int nIndex )
{
    if ( 0 <= nIndex && nIndex < (int)m_arrFileInfo.size() )
    {
        return &m_arrFileInfo.at(nIndex);
    }
    return NULL;
}

And to use it:

C++
CFileInfoEnumerator enumerator;
TCHAR chDir[MAX_PATH];
GetCurrentDirectory(MAX_PATH, chDir);

enumerator.SetFilterPatterns(_T("*.exe;*.dll;*.h;*.c;*.cpp;"));
enumerator.Enumerate(chDir);

About the Demo

I wrote a console application to demonstrate the use of the source code.

Other Projects in CodeProject that Inspired Me

  1. CEnum - File Enumeration and File Globbing Class
  2. File and Directory Enumeration
  3. XEditPrompt - CEdit-derived control with web-like prompt

History

  • 2010-7-10: Initial release
  • 2010-11-25: Provided the DFS implementation
  • 2010-12-29: Skip the junction (reparse point) to avoid infinite loop

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)