Searching on Text Files

15 Jul 2008
This program searches for words in a set of text files.

Introduction

This is one of my projects: a program that searches the contents of text files. Suppose you have a set of text files stored somewhere on your hard disk and you want to find some of them, but you don't remember their file names. You do remember what they contain, so you have a few keywords to search for. The program works like the content search in Windows.

Background

The program has to meet these requirements:

  1. Create the FileList: Create a text file named FileList that stores the paths of all the text files, one path per line. A line's position in the file is the ID of that path; IDs start at 0.
  2. Indexing: Scan all the text files and store each word in a binary search tree (BST) so that searches are fast. Every node in the tree holds a word, a list of the IDs of the files that contain it, and left and right child pointers.
  3. Display: Output only a short portion of each text file that contains the keywords, together with the file's ID so the user knows which file matched.
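The requirements leave open what counts as a "word". As a minimal sketch, assuming a word is a maximal run of ASCII letters or digits and that matching should ignore case, the indexing step could split each line like this (`Tokenize` is a hypothetical name, not a function from the original program):

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: split one line of text into lowercase words.
// Assumption: a word is a maximal run of ASCII letters/digits;
// every other character separates words.
std::vector<std::string> Tokenize(const std::string& line)
{
    std::vector<std::string> words;
    std::string current;
    for (char ch : line)
    {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c))
        {
            current += static_cast<char>(std::tolower(c)); // normalize case
        }
        else if (!current.empty())
        {
            words.push_back(current); // a separator ends the current word
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current); // flush the last word on the line
    return words;
}
```

Each returned word would then be inserted into the tree together with the ID of the file being scanned. With this rule, "C++" indexes as the single word "c", which is one of the trade-offs of a simple letters-and-digits tokenizer.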

Using the Code

To create the FileList, I use MFC's CFileFind class to enumerate the .txt files and the CStdioFile class to write their paths:

C++
// Create FileList.txt; line N (counting from 0) holds the path of the file with ID N
CStdioFile file;
if (file.Open("FileList.txt", CFile::modeCreate | CFile::modeReadWrite))
{
    CFileFind Finder; // enumerates the files in m_PATH

    BOOL bWorking = Finder.FindFile(m_PATH + "\\*.txt"); // only text files
    while (bWorking)
    {
        bWorking = Finder.FindNextFile();
        if (!Finder.IsDirectory())
        {
            file.WriteString(Finder.GetFilePath()); // write the full path
            file.WriteString("\n");
        }
    }
    Finder.Close();
    file.Close();
}
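CFileFind and CStdioFile exist only in MFC. For readers without MFC, here is a hedged sketch of the same step using C++17 `std::filesystem`; `WriteFileList` and its parameters are hypothetical names chosen for illustration. It sorts the paths before writing them, because neither `FindFile` nor `directory_iterator` promises an enumeration order, and the IDs are nothing more than line positions in the list.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical portable version of the FileList step: write the path of every
// regular ".txt" file in `dir` to `listPath`, one per line. Line N (from 0) is
// that file's ID. Returns the number of paths written, or -1 on failure.
int WriteFileList(const fs::path& dir, const fs::path& listPath)
{
    std::vector<fs::path> paths;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir))
    {
        // Skip directories and anything whose extension is not ".txt"
        // (this comparison is case-sensitive, unlike a Windows wildcard).
        if (entry.is_regular_file() && entry.path().extension() == ".txt")
            paths.push_back(entry.path());
    }
    std::sort(paths.begin(), paths.end()); // stable IDs from run to run

    std::ofstream out(listPath, std::ios::trunc);
    if (!out)
        return -1; // could not create the list file
    for (const fs::path& p : paths)
        out << p.string() << '\n';
    return static_cast<int>(paths.size());
}
```

Writing the list into a directory other than `dir` keeps the list file itself from being picked up as a document on the next run.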

For searching, I store the words in a binary search tree. First, I scan the directory that holds the text files to create the FileList. Then I open every text file listed in FileList, scan it for words, and store each word in the BST. Because a word can appear in many files, each node keeps its file IDs in a singly linked list.
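The article does not show how words get into the tree. Below is a minimal sketch of insertion, assuming the node layout described above (a word, an ID list, left and right pointers); `WordNode`, `IdNode`, `InsertWord`, and `FreeTree` are hypothetical stand-ins for the project's own `tree` and `ListID` types. It orders words the same way the search routine does (smaller words to the left), and it assumes files are scanned in increasing ID order, so appending at the tail keeps every ID list sorted and a word repeated inside one file is recorded only once.

```cpp
#include <cassert>
#include <string>

// Hypothetical linked-list node holding one file ID.
struct IdNode
{
    int id;
    IdNode* next;
};

// Hypothetical BST node: a word, its ID list, and child pointers.
struct WordNode
{
    std::string word;
    IdNode* ids = nullptr;    // head of the ID list
    IdNode* last = nullptr;   // tail, so appending is O(1)
    WordNode* left = nullptr;
    WordNode* right = nullptr;
};

// Record that `word` occurs in the file with ID `id`.
// Assumes files are indexed in increasing ID order.
void InsertWord(WordNode*& root, const std::string& word, int id)
{
    WordNode** link = &root;
    while (*link)
    {
        int cmp = word.compare((*link)->word);
        if (cmp == 0)
            break;                                      // word already in the tree
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }
    if (!*link)
        *link = new WordNode{word};                     // new word: add a leaf

    WordNode* node = *link;
    if (node->last && node->last->id == id)
        return;                                         // already recorded for this file
    IdNode* item = new IdNode{id, nullptr};
    if (node->last)
        node->last->next = item;
    else
        node->ids = item;
    node->last = item;
}

// Release every node and its ID list.
void FreeTree(WordNode* node)
{
    if (!node)
        return;
    FreeTree(node->left);
    FreeTree(node->right);
    for (IdNode* p = node->ids; p;)
    {
        IdNode* next = p->next;
        delete p;
        p = next;
    }
    delete node;
}
```

A plain BST like this can degrade into a linked list if words arrive in sorted order; a balanced tree such as `std::map<std::string, std::vector<int>>` avoids that, at the cost of hiding the data structure the article sets out to practice.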

C++
// Search the BST for a word; return its list of IDs, or NULL if it is not indexed
ListID* CTinyGoogleDlg::SearchWord(string key)
{
    if (!head)
    {
        MessageBox("The index is empty.");
        return NULL;
    }

    tree* current = head;
    while (current)
    {
        // Compare once per node (word is assumed to be a C string, as strcmp implies)
        int cmp = strcmp(current->word, key.c_str());
        if (cmp == 0)
            return current->IDs;      // found: return its ID list
        else if (cmp < 0)
            current = current->right; // node's word is smaller: go right
        else
            current = current->left;  // node's word is larger: go left
    }
    return NULL; // word not found
}
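SearchWord handles one keyword at a time, and the article does not say how several keywords are combined. If the goal is to list only files that contain every keyword (an assumption), the ID lists can be intersected. Assuming each list is sorted in ascending ID order, one linear merge-style pass per pair of lists is enough. The sketch uses `std::vector<int>` in place of the linked `ListID` list; `IntersectIds` and `MatchAll` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: IDs present in both sorted lists, in ascending order.
// Runs in O(|a| + |b|) by advancing whichever side holds the smaller ID.
std::vector<int> IntersectIds(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> result;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] < b[j])
            ++i;
        else if (b[j] < a[i])
            ++j;
        else
        {
            result.push_back(a[i]); // ID appears in both lists
            ++i;
            ++j;
        }
    }
    return result;
}

// Hypothetical AND query: keep only IDs found in every keyword's list.
// No keywords (an empty `lists`) yields no matches.
std::vector<int> MatchAll(const std::vector<std::vector<int>>& lists)
{
    if (lists.empty())
        return {};
    std::vector<int> result = lists[0];
    for (std::size_t k = 1; k < lists.size() && !result.empty(); ++k)
        result = IntersectIds(result, lists[k]);
    return result;
}
```

The standard `std::set_intersection` algorithm in `<algorithm>` performs the same pairwise merge and could replace `IntersectIds`.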

Next, the program asks the user for keywords and searches the BST for each of them. If a keyword is found, each ID in its list is used to look up the file path in FileList; the program then opens that text file and shows its first 16 lines in the result box.

C++
// Display results: show the ID and the first 16 lines of every file in the list
int CTinyGoogleDlg::Display(ListID *curr)
{
    if (!curr)
    {
        MessageBox("NOT FOUND!");
        return 0;
    }

    CStdioFile file;
    if (!file.Open("FileList.txt", CFile::modeRead))
    {
        MessageBox("Cannot open FileList.txt.");
        return 0;
    }

    CString sText;
    CString path;
    int count = -1; // index of the last line read from FileList
    m_RESULT = "";
    while (curr)
    {
        // Read forward until the line number equals the ID. A single forward
        // pass assumes the ID list is in ascending order.
        while (count < curr->ID)
        {
            if (!file.ReadString(path))
                break;
            count += 1;
        }
        if (count < curr->ID)
            break; // FileList has fewer lines than this ID

        CString DocID;
        DocID.Format("%d", curr->ID);
        m_RESULT += "\r\nDocID:" + DocID + "\r\n";

        // Open the file and append up to 16 lines as a preview
        CStdioFile read;
        if (read.Open(path, CFile::modeRead))
        {
            for (short nLineCount = 0; nLineCount < 16 && read.ReadString(sText); nLineCount++)
                m_RESULT += sText + "\r\n";
            read.Close();
        }
        curr = curr->next;
    }
    file.Close();

    // Put all previews into the result edit box at once
    GetDlgItem(IDC_EDIT_RESULT)->SetWindowText(m_RESULT);
    return 0;
}
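Display does two things for each ID: it finds the matching line in FileList, then previews the first lines of that file. Here is a hedged sketch of both steps in standard C++17, without MFC; `PathForId` and `Preview` are hypothetical names, and lines end with a plain `\n` rather than the `\r\n` an edit control needs.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: return line `id` (0-based) of the file list,
// or std::nullopt if the list has fewer lines than that.
std::optional<std::string> PathForId(const std::string& listPath, int id)
{
    std::ifstream list(listPath);
    std::string line;
    for (int n = 0; std::getline(list, line); ++n)
    {
        if (n == id)
            return line;
    }
    return std::nullopt;
}

// Hypothetical helper: "DocID:<id>" followed by up to `maxLines` lines of the
// file, mirroring the format and the 16-line default of the MFC version.
std::string Preview(const std::string& path, int id, int maxLines = 16)
{
    std::string result = "DocID:" + std::to_string(id) + "\n";
    std::ifstream in(path);
    std::string line;
    for (int n = 0; n < maxLines && std::getline(in, line); ++n)
        result += line + "\n";
    return result;
}
```

Re-reading the list for every ID is simple but rescans the file each time; the MFC version instead makes one forward pass over FileList, which is why it relies on the IDs arriving in ascending order.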

Points of Interest

At first, I had some trouble figuring out how to find the file paths. It isn't very difficult, but at my level it wasn't easy either. I found some approaches on the Internet, and CodeProject helped me a lot. Now I am sharing my little program with others.

History

The first version of this program was written as a Win32 console app. This version is an MFC app.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)