Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Data Archive

0.00/5 (No votes)
30 Jun 2004 2  
This article describes creating indexed search systems using relational databases

Introduction

Before a time ago my CD collection was a little, but it quickly grew and became unmanaged. I have been spending much time to find some document. Therefore I decided to index the CD collection. First I tried several existing programs for organizing CD catalogs, but they haven't good search functionality and intuitive user interface. That is why I decided to create my own search program. This article describes an approach of creating indexing search systems.

Background

I recommend you to preview MS SQL Server 2000 Books Online for "FullText search overview" and "CONTAINSTABLE" articles.

Project overview

The application functionality can be divided on two groups: scanning the data source and document searching. The first one can be easily implemented using recursive directories' browsing. The second one requires some indexing mechanism, i.e. the search data should be indexed. It is possible to create an indexing mechanism manually [2], but I have decided use more less way, using integrated Microsoft SQL Server Full Text Search functionality.

Using the code

Classes

RegistryHelper The RegistryHelper class manages a Windows Registry key(s)/value(s) and provides a convenient interface to set/get application settings.
SqlHelper Implements the data access methods.
FullTextQuery The FullTextQuery class transforms a user search query to Full-Text query.
DirectoryScaner Scans the file system hierarchy and gets directories and files descriptions.
ScanForm Displays the scan form.
SqlConnStringForm Parses and forms the Microsoft SQL Server connection string.
SearchForm Provides the GUI for searching archive documents.
MainForm Displays a main application form.
Stored procedures

[sp_BuildIndexes] Starts incremental indexes build.
[sp_CreateDirectory] Creates a directory record.
[sp_CreateFile] Creates the file record.
[sp_CreateRootDirectory] Creates a directory record.
[sp_DeleteDirectory] Deletes the directory record.
[sp_DeleteFile] Deletes the file record.
[sp_GetDirectory] Gets the directory information.
[sp_GetDirectoryFiles] Gets all files of specified directory.
[sp_GetDirectoryPath] Get all sub-directories of the directory.
[sp_GetFile] Gets the file information.
[sp_GetFilePath] Get the full path to the file.
[sp_GetRootDirectories] Gets all root directories.
[sp_GetSubDirectories] Get all sub-directories of the directory.
[sp_Search] Search for a file or directory.
[sp_UpdateDirectory] Updates the directory record.
[sp_UpdateFile] Updates the file record.
User defined functions

[fn_GetDirectoryPath] Constructs and gets the full directory path.
[fn_GetFilePath] Constructs and gets the full file path.

[sp_Search]

-- Search for a file or directory.

CREATE PROCEDURE [dbo].[sp_Search]
  @Text       NVARCHAR(256),
  @Sort       INT  = 1, -- Rank

  @Created     DATETIME = NULL,
  @Modified     DATETIME = NULL
AS
-- 

DECLARE @DirectoryCategory  INT
SET @DirectoryCategory = 1

DECLARE @FileCategory  INT
SET @FileCategory = 2
--

DECLARE @results TABLE 
(
  CategoryID    INT NOT NULL,
  ResultID     INT NOT NULL,
  Result       NVARCHAR(260) NOT NULL,
  Path      NVARCHAR(2048) NOT NULL,
  Rank      INT NOT NULL DEFAULT 0
)
-- Search in directories.

INSERT
  @results
SELECT 
  @DirectoryCategory    AS CategoryID,
  DataTbl.DirectoryID    AS ResultID,  
  DataTbl.Name       AS Result,
  [dbo].[fn_GetDirectoryPath] (DataTbl.DirectoryID)  AS Path,
  KeyTbl.[RANK]      AS Rank
FROM
  [dbo].[Directories] AS DataTbl
JOIN
  CONTAINSTABLE([dbo].[Directories], *, @Text) AS KeyTbl
ON
  DataTbl.DirectoryID = KeyTbl.[KEY]

-- Search in files.

INSERT
  @results
SELECT 
  @FileCategory      AS CategoryID,
  DataTbl.FileID      AS ResultID,  
  DataTbl.Name       AS Result,
  [dbo].[fn_GetFilePath] (DataTbl.FileID)  AS Path,
  KeyTbl.[RANK]      AS Rank
FROM
  [dbo].[Files] AS DataTbl
JOIN
  CONTAINSTABLE([dbo].[Files], *, @Text) AS KeyTbl
ON
  DataTbl.FileID = KeyTbl.[KEY]

IF (@Sort = 1) -- Relevance

  BEGIN
    SELECT 
      CategoryID, ResultID, Result, Path, Rank
    FROM
      @results
    ORDER BY
      Rank DESC
  END
ELSE IF (@Sort = 2) -- Category

  BEGIN
    SELECT 
      CategoryID, ResultID, Result, Path, Rank
    FROM
      @results
    ORDER BY
      CategoryID ASC
  END
ELSE IF (@Sort = 3) -- Name

  BEGIN
    SELECT 
      CategoryID, ResultID, Result, Path, Rank
    FROM
      @results
    ORDER BY
      Result ASC
  END
ELSE -- undefined sort parameter - does not sort results

  BEGIN
    SELECT 
      CategoryID, ResultID, Result, Path, Rank
    FROM
      @results
  END
GO

I think that this stored procedure should be explained more detailed. The [dbo].[sp_Search] stored procedure performs full-text search in directories ([dbo].[Directories]) and files ([dbo].[Files]) and combines found results. The found results could be sorted by relevance, category and name as well. A found result consists of CategoryID, ResultID, Result, Path and Rank values. The CategoryID is used to determine type of found result: directory or file. The ResultID represents the founded result identifier. The Result is exactly the founded string. The Path is the recursively built the founded resource path.

Note: The @Text input parameter data should be in full-text search format. I created a special class FullTextQuery , which transforms a user query to full-text search query.

How to use the application

Download and install the project. Download and execute the DataArchive.sql script. Start the application from the Start menu or Desktop shortcut. If the application could not connect to the database, it will prompt the database connection dialog. Configure the database connection.

History

  • [07.05.2004] - Data Archive 1.0

Reference

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here