Introduction
Before a time ago my CD collection was a little, but it quickly grew and
became unmanaged. I have been spending much time to find some document.
Therefore I decided to index the CD collection. First I tried several existing
programs for organizing CD catalogs, but they haven't good search functionality
and intuitive user interface. That is why I decided to create my own search
program. This article describes an approach of creating indexing search
systems.
Background
I recommend you to preview MS SQL Server 2000 Books Online for "FullText search overview"
and "CONTAINSTABLE" articles.
Project overview
The application functionality can be divided on two groups: scanning the data
source and document searching. The first one can be easily implemented using
recursive directories' browsing. The second one requires some indexing
mechanism, i.e. the search data should be indexed. It is possible to create an
indexing mechanism manually [2], but I have decided use more less way, using
integrated Microsoft SQL Server Full Text Search functionality.
Using the code
Classes
RegistryHelper |
The RegistryHelper class manages a Windows Registry key(s)/value(s)
and provides a convenient interface to set/get application settings. |
SqlHelper |
Implements the data access methods. |
FullTextQuery |
The FullTextQuery class transforms a user search query to Full-Text
query. |
DirectoryScaner |
Scans the file system hierarchy and gets directories and files
descriptions. |
ScanForm |
Displays the scan form. |
SqlConnStringForm |
Parses and forms the Microsoft SQL Server connection string. |
SearchForm |
Provides the GUI for searching archive documents. |
MainForm |
Displays a main application form. |
Stored procedures
[sp_BuildIndexes] |
Starts incremental indexes build. |
[sp_CreateDirectory] |
Creates a directory record. |
[sp_CreateFile] |
Creates the file record. |
[sp_CreateRootDirectory] |
Creates a directory record. |
[sp_DeleteDirectory] |
Deletes the directory record. |
[sp_DeleteFile] |
Deletes the file record. |
[sp_GetDirectory] |
Gets the directory information. |
[sp_GetDirectoryFiles] |
Gets all files of specified directory. |
[sp_GetDirectoryPath] |
Get all sub-directories of the directory. |
[sp_GetFile] |
Gets the file information. |
[sp_GetFilePath] |
Get the full path to the file. |
[sp_GetRootDirectories] |
Gets all root directories. |
[sp_GetSubDirectories] |
Get all sub-directories of the directory. |
[sp_Search] |
Search for a file or directory. |
[sp_UpdateDirectory] |
Updates the directory record. |
[sp_UpdateFile] |
Updates the file record. |
User defined functions
[fn_GetDirectoryPath] |
Constructs and gets the full directory path. |
[fn_GetFilePath] |
Constructs and gets the full file path. |
[sp_Search]
CREATE PROCEDURE [dbo].[sp_Search]
@Text NVARCHAR(256),
@Sort INT = 1,
@Created DATETIME = NULL,
@Modified DATETIME = NULL
AS
DECLARE @DirectoryCategory INT
SET @DirectoryCategory = 1
DECLARE @FileCategory INT
SET @FileCategory = 2
DECLARE @results TABLE
(
CategoryID INT NOT NULL,
ResultID INT NOT NULL,
Result NVARCHAR(260) NOT NULL,
Path NVARCHAR(2048) NOT NULL,
Rank INT NOT NULL DEFAULT 0
)
INSERT
@results
SELECT
@DirectoryCategory AS CategoryID,
DataTbl.DirectoryID AS ResultID,
DataTbl.Name AS Result,
[dbo].[fn_GetDirectoryPath] (DataTbl.DirectoryID) AS Path,
KeyTbl.[RANK] AS Rank
FROM
[dbo].[Directories] AS DataTbl
JOIN
CONTAINSTABLE([dbo].[Directories], *, @Text) AS KeyTbl
ON
DataTbl.DirectoryID = KeyTbl.[KEY]
INSERT
@results
SELECT
@FileCategory AS CategoryID,
DataTbl.FileID AS ResultID,
DataTbl.Name AS Result,
[dbo].[fn_GetFilePath] (DataTbl.FileID) AS Path,
KeyTbl.[RANK] AS Rank
FROM
[dbo].[Files] AS DataTbl
JOIN
CONTAINSTABLE([dbo].[Files], *, @Text) AS KeyTbl
ON
DataTbl.FileID = KeyTbl.[KEY]
IF (@Sort = 1)
BEGIN
SELECT
CategoryID, ResultID, Result, Path, Rank
FROM
@results
ORDER BY
Rank DESC
END
ELSE IF (@Sort = 2)
BEGIN
SELECT
CategoryID, ResultID, Result, Path, Rank
FROM
@results
ORDER BY
CategoryID ASC
END
ELSE IF (@Sort = 3)
BEGIN
SELECT
CategoryID, ResultID, Result, Path, Rank
FROM
@results
ORDER BY
Result ASC
END
ELSE
BEGIN
SELECT
CategoryID, ResultID, Result, Path, Rank
FROM
@results
END
GO
I think that this stored procedure should be explained more detailed. The [dbo].[sp_Search]
stored procedure performs full-text search in directories ([dbo].[Directories]
)
and files ([dbo].[Files]
) and combines found results. The found
results could be sorted by relevance, category and name as well. A found result
consists of CategoryID
, ResultID
, Result
, Path
and
Rank
values. The CategoryID
is used to determine type of found result: directory or file. The ResultID
represents the
founded result identifier. The Result
is exactly the founded string. The Path
is the recursively built the founded resource path.
Note: The @Text input parameter data should be in full-text search
format. I created a special class FullTextQuery
, which transforms
a user query to full-text search query.
How to use the application
Download and install the project. Download and execute the DataArchive.sql script. Start the
application from the Start menu or Desktop shortcut. If the application could not connect to the database, it
will prompt the database connection dialog. Configure the database connection.
History
- [07.05.2004] - Data Archive 1.0
Reference