A permuted index (keyword-in-context) generator

3 Oct 2006
Generating and displaying a permuted index (keyword-in-context index) from text entries.

PermutedIndex screen shot

Introduction

A permuted index, also called a keyword-in-context index, is an alphabetical list of keywords displayed with their surrounding text (context). The screenshot above shows an example. The display format makes it easy to scan for keywords of interest, the context helps to identify the particular instance when a keyword appears multiple times, and the keywords themselves can serve as links for direct access to the text or objects to which they refer.

Background

The best-known keyword index is the alphabetical index at the back of a book, which lists each keyword with the numbers of the pages on which it appears. A reader using such an index must examine each listed page to find the particular reference sought. The keyword-in-context index makes the desired reference easier to find by including the surrounding text along with each keyword. For readability, it is formatted so that the keywords align in a vertical column. This format may be familiar from its use in printed documentation for the Unix operating system.

This indexing approach is especially applicable to a group of entities that have one-line descriptions: the description line itself serves as the context. Descriptions of functions in a library, descriptions of photos in a collection, short quotations, or proverbs are all suitable examples.

Using the code

The code contains the indexer itself, CPermutedIndex, and a dialog-based, AppWizard-generated sample application that accepts user-entered strings, builds an index from those strings, and displays the index in a list box and an alphabetical "thumb-index" window. The sample application can also write the index to a file as HTML, or as C source code for use in another project.

Building the index

To build the index, construct a CPermutedIndex object, and call its BuildIndex method with an array of the strings to be indexed.

m_pPermutedIndex = new CPermutedIndex;
m_pPermutedIndex->BuildIndex(m_pInputLinePtrs, m_nInputLineCount);

Note that the index object does not copy the strings, so the array of strings must persist as long as the index is needed.
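To make the idea concrete, here is a minimal standalone sketch of how an index of this kind could be built. It is not the CPermutedIndex implementation: KwicEntry, keywordLess, and buildKwicIndex are hypothetical names, and the choices of alphanumeric runs as words and case-insensitive sorting are assumptions. Like the real indexer, the sketch stores pointers into the caller's strings rather than copies, which is why those strings must outlive the index.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <vector>

// Hypothetical entry type for this sketch (not the article's IndexEntry).
struct KwicEntry
{
    int line;          // index of the source string in the input array
    const char *text;  // points at the caller's string; nothing is copied
    int offset;        // offset of the keyword within the string
    int length;        // length of the keyword
};

// Orders two entries by keyword, ignoring case (an assumption of this sketch).
static bool keywordLess(const KwicEntry &a, const KwicEntry &b)
{
    int n = std::min(a.length, b.length);
    for (int i = 0; i < n; ++i)
    {
        int ca = std::tolower((unsigned char)a.text[a.offset + i]);
        int cb = std::tolower((unsigned char)b.text[b.offset + i]);
        if (ca != cb)
            return ca < cb;
    }
    return a.length < b.length;
}

// Creates one entry per word of every line, then sorts the entries by keyword.
std::vector<KwicEntry> buildKwicIndex(const char *const *lines, int count)
{
    assert(lines != nullptr || count == 0);
    std::vector<KwicEntry> entries;
    for (int i = 0; i < count; ++i)
    {
        const char *s = lines[i];
        int pos = 0;
        while (s[pos] != '\0')
        {
            // skip separators, then measure the run of word characters
            while (s[pos] != '\0' && !std::isalnum((unsigned char)s[pos]))
                ++pos;
            int start = pos;
            while (s[pos] != '\0' && std::isalnum((unsigned char)s[pos]))
                ++pos;
            if (pos > start)
                entries.push_back({i, s, start, pos - start});
        }
    }
    // stable sort keeps entries with equal keywords in source order
    std::stable_sort(entries.begin(), entries.end(), keywordLess);
    return entries;
}
```

Each entry is only a line number, a pointer, an offset, and a length, so the index stays small even when the source strings are long.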

Excluding words from the index

Some words add only clutter to the index: articles and connectives such as the, and, of, and context-specific filler words such as avenue, street, gram, kilometer, volt, second. To exclude your own set of words, call SetExcludedWordList with an array of strings before building the index.

// an array of pointers to strings (not an array of char)
static const char *szExcludedWords[] =
{
    "the", "and", "of", "avenue", "street", "gram", 
    "kilometer", "volt", "second"
};
...
    m_pPermutedIndex = new CPermutedIndex;
    m_pPermutedIndex->SetExcludedWordList(szExcludedWords, 
                      sizeof(szExcludedWords)/sizeof(char *));
    m_pPermutedIndex->BuildIndex(m_pInputLinePtrs, m_nInputLineCount);
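As an illustration of what the exclusion test might look like inside an indexer, the following sketch checks a candidate keyword against an excluded-word array. The helper name isExcludedWord is hypothetical, and the case-insensitive, whole-word match is an assumption; the article does not say how CPermutedIndex compares words.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns true if the len characters starting at word
// match one of the excluded words exactly, ignoring case (assumed behavior).
bool isExcludedWord(const char *word, std::size_t len,
                    const char *const *excluded, std::size_t count)
{
    assert(word != nullptr);
    for (std::size_t i = 0; i < count; ++i)
    {
        const char *ex = excluded[i];
        if (std::strlen(ex) != len)
            continue;   // a different length can never be a whole-word match
        std::size_t k = 0;
        while (k < len &&
               std::tolower((unsigned char)word[k]) ==
               std::tolower((unsigned char)ex[k]))
            ++k;
        if (k == len)
            return true;
    }
    return false;
}
```

Taking the keyword as a pointer and a length lets the test run directly against the source string, without first copying the word out of it. The element count for a static array can be computed as sizeof(array)/sizeof(array[0]), the same idiom used in the call above.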

Using the index

To use the index with a Windows list box, create the list box with the LBS_OWNERDRAWFIXED style. Derive a class from CListBox, and override its DrawItem method to call DrawListBoxItem in your CPermutedIndex object. Then, call FillIndexListBox in your CPermutedIndex object after building the index.

// in the header file
class CIndexListBox : public CListBox
{
public:
    // set this to point to the dialog that hosts this control
    CPermutedIndexDemoDlg *m_pParentDlg;
protected:
    // DrawItem is a virtual function of CListBox, not a message-map handler
    virtual void DrawItem(LPDRAWITEMSTRUCT lpDrawItemStruct);
};

// and in the implementation file
void CIndexListBox::DrawItem(DRAWITEMSTRUCT *pDiS)
{
    m_pParentDlg->m_pPermutedIndex->DrawListBoxItem(pDiS);
}

To respond when the user selects an entry in the list box, add a handler for the LBN_SELCHANGE notification (an ON_LBN_SELCHANGE entry in the dialog's message map), and call GetIndexTableEntry in your CPermutedIndex object.

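void CPermutedIndexDemoDlg::OnSelchangeIndexList()
{
    int nCurSel = m_wndIndexList.GetCurSel();
    if (nCurSel == LB_ERR)      // nothing is selected
        return;
    // the item data holds the index table entry number
    int nEntry = (int)m_wndIndexList.GetItemData(nCurSel);
    const IndexEntry *pEntry = m_pPermutedIndex->GetIndexTableEntry(nEntry);
    // nItemIndex identifies the source string this entry refers to
    int nSourceLine = pEntry->nItemIndex;
}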

At this point, you need to know what an IndexEntry struct looks like:

typedef struct _IndexEntry
{
    int nItemIndex;       // the index of the string in the source list
    const char *pszText;  // a pointer to the start of the string
    short nKeywordOffset; // the offset of the keyword in the source string
    short nKeywordLength; // the length of the keyword in the source string
} IndexEntry;

All the information needed to display the index and to relate an entry to its source string can be extracted from this struct.
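As a small illustration of using these fields, the sketch below extracts the keyword from an entry and renders the entry as a plain-text line with the keyword starting at a fixed column. The struct repeats the fields shown above so that the example is self-contained; keywordOf and formatEntry are hypothetical helpers, not part of CPermutedIndex.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same fields as the article's IndexEntry, repeated to keep the sketch standalone.
struct IndexEntry
{
    int nItemIndex;        // the index of the string in the source list
    const char *pszText;   // a pointer to the start of the string
    short nKeywordOffset;  // the offset of the keyword in the source string
    short nKeywordLength;  // the length of the keyword in the source string
};

// Hypothetical helper: copies the keyword out of the source string.
std::string keywordOf(const IndexEntry &e)
{
    assert(e.pszText != nullptr);
    return std::string(e.pszText + e.nKeywordOffset, e.nKeywordLength);
}

// Hypothetical helper: returns the line padded (or clipped on the left)
// so that the keyword begins at character column keywordColumn.
std::string formatEntry(const IndexEntry &e, std::size_t keywordColumn)
{
    assert(e.pszText != nullptr);
    std::string text(e.pszText);
    assert((std::size_t)e.nKeywordOffset + e.nKeywordLength <= text.size());
    std::string left = text.substr(0, e.nKeywordOffset);  // context before the keyword
    std::string rest = text.substr(e.nKeywordOffset);     // keyword and what follows
    if (left.size() > keywordColumn)                      // keep only the nearest context
        left = left.substr(left.size() - keywordColumn);
    return std::string(keywordColumn - left.size(), ' ') + left + rest;
}
```

Printing formatEntry for every entry in index order, using a fixed-width font, gives the familiar layout with the keywords aligned in one column.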

Thumb-index tab window

The "thumb-index tab" window displays the letters of the alphabet. When the user clicks a letter, the index object reports the number of the first index entry that begins with that letter; the application can then scroll the list box to that entry by setting the list box selection to it.

To implement the thumb-index tab window, place a white rectangle static control on your dialog, and give it the SS_NOTIFY style. Derive a class from CStatic, and attach an instance of this class to the control. Implement the derived class as follows:

// in the header file
class CIndexTabWin : public CStatic
{
public:
    // set this to point to the dialog that hosts this control
    CPermutedIndexDemoDlg *m_pParentDlg;
protected:
    afx_msg void OnPaint();
    afx_msg void OnLButtonDown(UINT nType, CPoint point);
    DECLARE_MESSAGE_MAP()
};

// and in the implementation file
BEGIN_MESSAGE_MAP(CIndexTabWin, CStatic)
    ON_WM_PAINT()
    ON_WM_LBUTTONDOWN()
END_MESSAGE_MAP()

// let the code in CPermutedIndex do the drawing
//
void CIndexTabWin::OnPaint()
{
    CPaintDC dc(this);  // calls BeginPaint and EndPaint for us
    m_pParentDlg->m_pPermutedIndex->DrawIndexTabWindow(this, &dc);
}

// forward the mouse click in the index tab window to
// the code in the parent dialog
//
void CIndexTabWin::OnLButtonDown(UINT nType, CPoint point)
{
    m_pParentDlg->ProcessTabWindowClick(point.x);
}

To link the tab window to the list box, you'll need code like this:

void CPermutedIndexDemoDlg::ProcessTabWindowClick(int nXPos)
{
    // get the entry number for the start of that letter's index
    // and set the list box selection to that one
    int nEntry = m_pPermutedIndex->IndexOffsetFromTabXCoord(nXPos);
    if (nEntry >= 0)        // "empty" tabs return -1
        m_wndIndexList.SetCurSel(nEntry);
}
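For readers curious how a lookup such as IndexOffsetFromTabXCoord might work, the following is a rough sketch under stated assumptions: the tab window is divided into 26 equal-width slots, one per letter, and a table records the first entry for each letter. The names buildLetterTable and entryFromTabX are hypothetical, and the real class may lay out its tabs differently.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: for keywords already in index order, records the number of
// the first entry beginning with each letter, or -1 if there is none.
std::vector<int> buildLetterTable(const std::vector<std::string> &sortedKeywords)
{
    std::vector<int> first(26, -1);
    for (std::size_t i = 0; i < sortedKeywords.size(); ++i)
    {
        if (sortedKeywords[i].empty())
            continue;
        int c = std::toupper((unsigned char)sortedKeywords[i][0]);
        if (c >= 'A' && c <= 'Z' && first[c - 'A'] < 0)
            first[c - 'A'] = (int)i;
    }
    return first;
}

// Hypothetical: maps an x coordinate in a window of the given width to a
// letter slot (assuming 26 equal slots) and returns that letter's first
// entry, or -1 for an empty tab or a click outside the window.
int entryFromTabX(const std::vector<int> &first, int x, int windowWidth)
{
    assert(first.size() == 26);
    if (windowWidth <= 0 || x < 0 || x >= windowWidth)
        return -1;
    int slot = x * 26 / windowWidth;
    return first[slot];
}
```

Returning -1 for letters with no entries matches the "empty tabs return -1" convention checked in ProcessTabWindowClick.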

Exporting the index for other uses

The information needed to draw the index and the tab window and to link them appropriately is all available from the CPermutedIndex object. The demo application contains code to export the index as linked HTML or as C-language arrays and structs. Examining this application should provide insight into how to use the index.

Points of interest

This indexer was originally written as part of a viewer for a collection of historical photographs, each described by a line of text. That application could also export the index as HTML with links to the image files. Experience with the index suggested that the indexing capability might be useful elsewhere.

Displaying the permuted index in a Windows list box gives an education in the use of owner-drawn list boxes. This implementation draws the text up to the keyword, using the TA_RIGHT text alignment to the horizontal center of the drawing rectangle, then draws the keyword and any text to its right using the TA_LEFT alignment at the center. The keyword is drawn in a different color and, optionally, in a different font; the demo application uses a slightly bold font to make the keyword stand out.

Displaying the index in HTML uses a similar approach. The index is created as a two-column table with invisible borders. The left column shows text up to the keyword and is right-justified; the right column shows the keyword and its following text, left-justified. The keyword is given a link to the appropriate target. The result appears as a single line of text, with the keywords aligned vertically.
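The two-column layout can be sketched as a function that emits one table row per index entry. This is an illustration only, not the demo application's export code: htmlEscape and htmlIndexRow are hypothetical names, and the exact markup the demo writes is not shown in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces the characters that are special in HTML text and attribute values.
std::string htmlEscape(const std::string &s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        switch (s[i])
        {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        default:   out += s[i];     break;
        }
    }
    return out;
}

// Hypothetical: builds one row. The left cell holds the text before the
// keyword, right-aligned; the right cell holds the linked keyword and the
// text after it, left-aligned.
std::string htmlIndexRow(const std::string &line, std::size_t kwOffset,
                         std::size_t kwLength, const std::string &href)
{
    assert(kwOffset + kwLength <= line.size());
    std::string left    = htmlEscape(line.substr(0, kwOffset));
    std::string keyword = htmlEscape(line.substr(kwOffset, kwLength));
    std::string right   = htmlEscape(line.substr(kwOffset + kwLength));
    return "<tr><td align=\"right\">" + left + "</td>"
           "<td align=\"left\"><a href=\"" + htmlEscape(href) + "\">" +
           keyword + "</a>" + right + "</td></tr>";
}
```

Browsers may collapse the trailing space in the left cell, so an exporter would probably need some cell padding or a non-breaking space there to keep the words from running together.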

The HTML thumb-index tabs are created as a series of links (for active letters) or strings (for inactive letters). Active entries link to the corresponding entry in the index, so those targets are created when the index is written, using the GetEntrySectionNumber method to get the tag, if any, associated with each index entry.
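A minimal sketch of the tab strip described above follows. The anchor naming scheme ("sec" followed by the letter) and the function names are assumptions made for illustration; the demo application derives its targets from GetEntrySectionNumber, whose output format is not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: emits the letters A to Z separated by spaces. Active letters
// become links to an in-page anchor; inactive letters are plain text.
std::string htmlThumbTabs(const std::vector<bool> &active)
{
    assert(active.size() == 26);
    std::string out;
    for (int i = 0; i < 26; ++i)
    {
        char letter = (char)('A' + i);
        if (i > 0)
            out += ' ';
        if (active[i])
        {
            out += "<a href=\"#sec";
            out += letter;
            out += "\">";
            out += letter;
            out += "</a>";
        }
        else
        {
            out += letter;
        }
    }
    return out;
}

// Hypothetical: the matching target, written just before the first index
// row whose keyword begins with the given letter.
std::string htmlSectionAnchor(char letter)
{
    return std::string("<a name=\"sec") + letter + "\"></a>";
}
```

Writing the anchors while the index rows are emitted, rather than in a separate pass, keeps each tab link and its target in step with the index order.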

History

  • September 15, 2006: created.
  • September 20, 2006: fixed to compile with VC7.
  • October 2, 2006: added text description of SetExcludedWordList.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).