Introduction
Microsoft Indexing Service and OLEDB when put together is really something. The Microsoft Indexing Service provides property filtering and searching as well as full-text indexing and searching of file data.
Background
The Indexing Service provides filters for several popular file formats including Microsoft Word, Microsoft PowerPoint, Microsoft Excel, and HTML. Filters are also available for plain-text. Filters can be written by customers and third-party vendors for other formats as well.
One purpose of a filter is to provide support for non-plain-text documents. The other purpose is to capture property values both from the file content and about the files. Assuming that each file is a document, examples of properties include each document's title, the number of pages with notes in each PowerPoint document, the number of paragraphs in each document, the last date and time each file was accessed, and the physical path to each file.
Full-text indexes for file system searches are created by scanning the content of files. The process consists of keeping track of the significant words that are used and where they are located. For example, a full-text index may indicate that the word Canada is found at word number 227, word 473, and word number 1017 in a given file. This index structure supports an efficient search for all items containing indexed words, and advanced search operations such as phrase searches and proximity searches.
Using the code
The Indexing Service stores indexes and property values in a text-search catalog. By default, a text-search catalog named Web is created when Indexing Service is installed. A given text-search catalog references one or more IIS virtual directories (also known as virtual roots). A virtual directory references one or more physical directories and, optionally, other virtual directories. After a real file is linked to the text catalog through a virtual directory, the Indexing Service is notified of the new files that must be indexed, and begins the filtering and indexing of the properties and content associated with these files. The Indexing Service is also notified of any subsequent changes to these files and will re-filter and re-index the updated files.
To create a new catalog, go to the Indexing Service Console by running ciadv.msc.
- Create the catalog and specify a location where the catalog will place its index files.
- Add directories to the catalog. The files in these directories will be indexed.
You will notice that after choosing one or more directory, the Indexing Service will start indexing the files. Now, you can execute queries on the Indexing Service using either T-SQL or the .NET OLEDB Connection object.
To execute queries using T-SQL, you should first add a linked server to the Indexing Service Catalog that you created.
You can use either the Enterprise Manager or the sp_addlinkedserver
Stored Procedure to create this linked server. The provider should be set to “Microsoft OLE DB Provider for Indexing Service” or MSIDXS, and the data source should be set to the name of the catalog that you just created.
Now, you can simply run such queries:
SELECT * FROM OPENQUERY(linkedserver,
‘SELECT Directory, FileName FROM SCOPE() WHERE FileName LIKE ‘’%.doc’’’)
This query will select all the files indexed by the catalog set in the linked server.
This following query will use the full-text search CONTAINS
phrase to search for specific words:
SELECT * FROM OPENQUERY(linkedserver, ‘SELECT Directory, FileName FROM SCOPE() WHERE
FileName LIKE ‘’%.doc’’ AND CONTAINS(‘’ "white" AND "elephant" ’’) ’)
These queries can also be done using the OLEDB Connection object in .NET without using SQL Server. Simply create a new instance of the OleDbConnection
object and set its connection string property to:
Provider=MSIDXS;DataSource=CatalogName;Locale Identifier=N;
The locale identifier depends on the language used in the files. I faced problems when indexing Arabic Word files, but after setting the correct Locale Identifier, there were no problems in indexing. (I used 1043 as the locale identifier for Arabic.)
After setting the connection string, we can create an OleDbCommand
object and simply set the following query in the command text property:
SELECT Directory, FileName FROM SCOPE() WHERE FileName LIKE ’%.doc’
Here, we can use the OleDbAdapter
and a DataTable
or the OleDbReader
to read the output of the query.