Introduction
I'll start by noting that the basic idea and code were originally provided by Microsoft on their 'Virtual Labs' site. It's a great service; not all of the labs work, but the ones that do are generally very well done. It's a life-saver when your last employer 'accidentally' formats your computer along with all of your projects and your only copy of Visual Studio.
You connect remotely to a virtual computer, and different labs let you play with different versions of Windows, SQL Server, and Visual Studio, which makes the service incredibly useful. It's not the "normal" kind of lab that forces you through every step being taught; if you want to, you can play around and test your own code, or try different methods than the ones the lab suggests.
The only drawbacks I found are that you can't save or copy the code you wrote to your own computer, or email yourself a copy of the solution. I could go on and on, but you should give it a look.
This is a simple app that gives you a pretty cool way to search through MS Word 2007 .docx files. Below is a quick requirement list for the UI if you really want to dive right into the code and avoid following every minute detail of this article.
Streamlined start for the UI: here are the basic requirements:
BackgroundWorker
FolderBrowserDialog
LinkLabel (that will be used to open the folder dialog)
Button to invoke the search
TextBox to hold the query string
Label to display how many occurrences of the search item were found
RichTextBox to hold the results
Take a quick look at the paragraph under "Quick Background"; it explains how and what we are searching. Also take note of the using statements we will need for this project.
Quick Background
With the release of Office 2007, we have a new file format for Word, Excel, and PowerPoint documents, called OpenXML.
To start, we need to find a Word 2007 document that has a .docx extension with a decent amount of text within it. Then, we can make a copy and re-name it with the .zip extension.
Here it starts to get pretty cool. Double-click the zip file and you should notice a few folders and files other than the document that you just saved. You will see a folder named word; within that folder, there is an XML file called document.xml: this is what we will be searching through later. You can double-click the document.xml file to view the XML in Internet Explorer.
Now, while you are looking at the document.xml file, you will notice that there are <w:t> elements. The t stands for text, and the w prefix maps to the WordprocessingML namespace of the document. They should look something like: <w:t>Your text here bla bla bla...</w:t>. These elements are what we will parse and search through.
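To make the element structure concrete, here is a minimal sketch that pulls the text out of the <w:t> elements with LINQ to XML. The names WordTextSketch, ExtractTexts, and SampleXml are made up for illustration, and the sample XML is a heavily trimmed stand-in; a real document.xml carries far more markup.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WordTextSketch
{
    // The WordprocessingML namespace that the "w" prefix maps to.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, heavily trimmed stand-in for word/document.xml.
    public const string SampleXml =
        "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:body><w:p>" +
        "<w:r><w:t>Hello</w:t></w:r>" +
        "<w:r><w:t xml:space=\"preserve\"> world</w:t></w:r>" +
        "</w:p></w:body></w:document>";

    // Returns the text of every <w:t> element, in document order.
    public static List<string> ExtractTexts(string documentXml)
    {
        XElement root = XElement.Parse(documentXml);
        return root.Descendants(W + "t").Select(t => t.Value).ToList();
    }
}
```

Calling WordTextSketch.ExtractTexts(WordTextSketch.SampleXml) gives back one string per <w:t> element. Note that the sentence arrives in two pieces, because Word stores it in two separate runs.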
Getting Started - Creating the UI
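Create a new C# Windows Forms project. System.IO.Packaging lives in the WindowsBase assembly, so add a reference to it, and make sure to include the following:
using System.IO;
using System.IO.Packaging;
using System.Xml;
using System.Xml.Linq;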
We should put together the UI before diving into the code. First, we find and add a FolderBrowserDialog from the Dialogs section of the toolbox. Next, we will add a BackgroundWorker to the form (it normally sits in the Components section of the toolbox).
At the top of the form, add a LinkLabel control, set its Text property to "Click to Select Folder", and its Name property to linkFolderSelect; this will open the folder browser.
Below the link label, we can add a Button control; set its Text property to "Search" and its Name property to btnSearch. We also want to make sure that its Enabled property is set to false in the property window. We 'disable' the button because we don't want the user to click Search before they have selected a folder to search, which also helps prevent unexpected exceptions.
Next, we can add a Label to the left of the form and set its Text to something like "Query String". Adjacent to the Label, we will add a TextBox and assign its Name property to tbSearchParam. This will be where the user types the string that they want to search for within the documents in the selected folder.
Alright!! Almost done. Let's place a Label below the Label and TextBox we just put up. We can set its Text property to something like "Results" and its Name property to lblResults. Finally, we will add a RichTextBox control below the "Results" label and stretch it out to the bottom of the form; this should be a decent size since it will be displaying the results from our search. Let's set the Name property of the RichTextBox to tbResults.
Now, Let's Begin to Code
Add two private strings that will be accessible throughout the form. The first string will hold the search parameter from the textbox we named tbSearchParam. The second string will hold the name of the selected folder. Finally, add a private List<string> that will contain the results of our query. You can view this code below:
namespace officeQuery
{
    public partial class Form1 : Form
    {
        private string _searchParam;
        private string _selectedFolder;
        private List<string> _results;
Within the Form constructor, we will initialize the FolderBrowserDialog to the folder we want it to open by default when it is first clicked. I simply assigned it to the "C:\" directory, but you can just as easily assign it "C:\Documents", etc.
        public Form1()
        {
            InitializeComponent();
            folderBrowserDialog1.SelectedPath = @"C:\";
        }
Now, we can go back to the designer and double-click the link label to create an empty handler for when it's clicked. Here is where you add the code to open the folder browser. This is pretty general code for opening a folder browser, and can be ported to other applications. We open and show the FolderBrowserDialog by calling ShowDialog(this) and assigning its result to a DialogResult variable. When the user selects a folder and clicks OK, we assign its path to our private string variable _selectedFolder. Now that we have a folder to query, we enable the Search button, so the user can search the documents in it.
        private void linkFolderSelect_LinkClicked(object sender,
            LinkLabelLinkClickedEventArgs e)
        {
            DialogResult res = folderBrowserDialog1.ShowDialog(this);
            if (res == DialogResult.OK)
            {
                _selectedFolder = folderBrowserDialog1.SelectedPath;
                // The selection is a folder, so DirectoryInfo is the natural fit.
                DirectoryInfo dInfo = new DirectoryInfo(_selectedFolder);
                linkFolderSelect.Text = dInfo.Name;
                btnSearch.Enabled = true;
            }
        }
Go back to the designer and double-click the search button that we named btnSearch; this will bring us to the button's click event.
This event does a bit of work. First, we will initialize the BackgroundWorker and subscribe our method QueryComplete to its RunWorkerCompleted event, wrapping it in a RunWorkerCompletedEventHandler delegate. The QueryComplete method will handle the formatting and display of our results when we are done with the query. Next, we subscribe our method Query to the worker's DoWork event through a DoWorkEventHandler delegate. Query handles the bulk of our processing, as we will see later.
Here is the code so far for our button's Click event.
        private void btnSearch_Click(object sender, EventArgs e)
        {
            backgroundWorker1 = new BackgroundWorker();
            backgroundWorker1.RunWorkerCompleted +=
                new RunWorkerCompletedEventHandler(QueryComplete);
            backgroundWorker1.DoWork += new DoWorkEventHandler(Query);
Let's finish up our button click event. Once we take care of the BackgroundWorker, we should make sure to clear the RichTextBox that will hold our results (to make sure there aren't any old search results). We will set the Enabled property of both the Search button and the LinkLabel to false. Then we retrieve the search parameter and assign it to our variable _searchParam. Lastly, we call RunWorkerAsync() on the BackgroundWorker to begin the work.
            tbResults.Clear();
            lblResults.Text = "Searching...";
            btnSearch.Enabled = false;
            linkFolderSelect.Enabled = false;
            _searchParam = tbSearchParam.Text.Trim();
            backgroundWorker1.RunWorkerAsync();
        }
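The click handler above shares data with the worker through private fields. As an alternative, here is a minimal sketch of handing the input to RunWorkerAsync and reading the answer back from e.Result. WorkerSketch and CountCharsInBackground are made-up names, and the character count only stands in for the real search. It is written console-style so it can run on its own, which is why it waits on a ManualResetEvent at the end; don't block like that on the UI thread of a form, let the completed handler update the controls instead.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class WorkerSketch
{
    // Runs a trivial job on a BackgroundWorker and waits for it to finish.
    // Assumption: called from a thread without a UI message loop.
    public static int CountCharsInBackground(string input)
    {
        int result = -1;
        using (var done = new ManualResetEvent(false))
        using (var worker = new BackgroundWorker())
        {
            worker.DoWork += (sender, e) =>
            {
                // The value passed to RunWorkerAsync arrives as e.Argument.
                string text = (string)e.Argument;
                e.Result = text.Length;
            };
            worker.RunWorkerCompleted += (sender, e) =>
            {
                // Reading e.Result rethrows a DoWork exception, so check e.Error first.
                if (e.Error == null)
                {
                    result = (int)e.Result;
                }
                done.Set();
            };
            worker.RunWorkerAsync(input);
            done.WaitOne();
        }
        return result;
    }
}
```

The same pattern would let Query receive the folder and search string as its argument and return the list of matches, rather than writing to _results from the worker thread.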
I'll quickly describe the QueryComplete method, because it's fairly simple and it does exactly what the name suggests: it deals with the completed query and displays any of the found items. This is the method we subscribed to the BackgroundWorker object's RunWorkerCompleted event.
First, we will show in our lblResults label how many occurrences of the search item were found. Then, we loop through our _results variable using a foreach construct. As we loop through, we will find and highlight the parts of each result that correspond to the user's search parameter.
Below is the complete QueryComplete method:
        void QueryComplete(object sender, RunWorkerCompletedEventArgs e)
        {
            lblResults.Text = string.Format(
                "Results [{0} result(s) found]", _results.Count);
            foreach (string s in _results)
            {
                // Each entry has the form "matched text|file name".
                string[] result = s.Split('|');
                string t = result[0];
                int i = t.IndexOf(_searchParam);
                // i >= 0 so that a match at the very start is highlighted too;
                // the length check avoids looping forever on an empty search.
                while (i >= 0 && _searchParam.Length > 0)
                {
                    tbResults.AppendText(t.Substring(0, i));
                    tbResults.SelectionColor = Color.Red;
                    tbResults.AppendText(_searchParam);
                    tbResults.SelectionColor = Color.Black;
                    t = t.Substring(i + _searchParam.Length);
                    i = t.IndexOf(_searchParam);
                }
                tbResults.AppendText(t);
                tbResults.SelectionColor = Color.DarkGreen;
                tbResults.AppendText(string.Format(" [{0}] ", result[1]));
                tbResults.AppendText(Environment.NewLine);
                tbResults.SelectionColor = Color.Black;
            }
            btnSearch.Enabled = true;
            linkFolderSelect.Enabled = true;
        }
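The highlighting loop in QueryComplete mixes string slicing with RichTextBox calls, which makes it hard to check on its own. Here is a minimal sketch of the same idea pulled out into a helper that only does the slicing. HighlightSketch and Split are hypothetical names, and the ordinal, case-sensitive comparison is my assumption about the behaviour you want.

```csharp
using System;
using System.Collections.Generic;

public static class HighlightSketch
{
    // Splits text into (segment, isMatch) pairs so a caller can colour the matches.
    public static List<KeyValuePair<string, bool>> Split(string text, string term)
    {
        var parts = new List<KeyValuePair<string, bool>>();
        if (string.IsNullOrEmpty(term))
        {
            // Nothing to highlight; return the text as one plain segment.
            parts.Add(new KeyValuePair<string, bool>(text, false));
            return parts;
        }

        int start = 0;
        int hit = text.IndexOf(term, start, StringComparison.Ordinal);
        while (hit >= 0)
        {
            if (hit > start)
            {
                // Plain text between the previous match and this one.
                parts.Add(new KeyValuePair<string, bool>(
                    text.Substring(start, hit - start), false));
            }
            parts.Add(new KeyValuePair<string, bool>(term, true));
            start = hit + term.Length;
            hit = text.IndexOf(term, start, StringComparison.Ordinal);
        }
        if (start < text.Length)
        {
            // Trailing plain text after the last match.
            parts.Add(new KeyValuePair<string, bool>(text.Substring(start), false));
        }
        return parts;
    }
}
```

In the form, you would loop over the returned pairs, set tbResults.SelectionColor to Color.Red when the pair's Value is true and Color.Black otherwise, and append the pair's Key.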
The Query method is what we subscribed to the BackgroundWorker's DoWork event through the DoWorkEventHandler delegate.
This method creates a new List<string> object to hold our results. Then, we create a DirectoryInfo object for the folder that we want to search and loop through its .docx files. Finally, we send each .docx file into the WordDocumentQuery method, which is where some of the LINQ magic happens.
        void Query(object sender, DoWorkEventArgs e)
        {
            _results = new List<string>();
            DirectoryInfo dir = new DirectoryInfo(_selectedFolder);
            foreach (FileInfo f in dir.GetFiles("*.docx"))
            {
                WordDocumentQuery(f);
            }
        }
We finally get to the last method of this project: WordDocumentQuery. This method accepts a FileInfo object. First, we add an XNamespace object for the Word namespace. Next, we open an instance of the Package class, which gives us access to the entire contents of the file. We create a Uri object that points to the XML file we want to search, which is the document.xml file located in the zip file. The PackagePart object represents the content at that URI; as the name "docPart" suggests, it's just one part of the overall package. Next, we create an XmlReader object based on the PackagePart that we are interested in.
So, now we have created a way to read through the XML contents of the document.xml part of the Word document zip file.
        void WordDocumentQuery(FileInfo wordDocPath)
        {
            XNamespace wordNamespace =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            // Open read-only; we only search the document and never modify it.
            Package package = Package.Open(wordDocPath.FullName,
                FileMode.Open, FileAccess.Read);
            Uri uri = new Uri("/word/document.xml", UriKind.Relative);
            PackagePart docPart = package.GetPart(uri);
            XmlReader reader = XmlReader.Create(
                docPart.GetStream(FileMode.Open, FileAccess.Read));
Here, we will be using LINQ to XML. First, we need to create an XElement object by calling its static Load method with the XmlReader that we created earlier. This is where it changes from the traditional System.Xml to the LINQ API.
Alright, we got to the LINQ query, which is pretty simple, but the syntax is a little different from a traditional SQL query. In this sample, we are looking for any of those <w:t> elements (the ones containing text). We also want to filter them to only select the ones that contain our search parameter.
After creating the query, we run it in a foreach statement, calling ToArray() so that the results are evaluated up front. Within the foreach, we keep appending the results to our variable _results. And finally, let's not forget to close the Package object.
            XElement wordDoc = XElement.Load(reader);
            var query =
                from c in wordDoc.Descendants(wordNamespace + "t")
                where c.Value.Contains(_searchParam)
                select c.Value;
            foreach (string s in query.ToArray())
            {
                // Store "matched text|file name"; QueryComplete splits on '|'.
                string res = string.Format("{0}|{1}",
                    s, wordDocPath.Name);
                _results.Add(res);
            }
            package.Close();
        }
    }
}
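One thing to keep in mind: Word often splits a single sentence across several <w:t> elements, so a search term that spans two of them will not be found by the per-element query above. Here is a minimal sketch of one way around that, joining the text of each <w:p> paragraph before searching. ParagraphSearchSketch and Find are hypothetical names, and this is a variation on the approach in the article rather than part of the original lab code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphSearchSketch
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Joins the <w:t> text of each <w:p> paragraph and returns the
    // joined text of every paragraph that contains the search term.
    public static List<string> Find(XElement wordDoc, string term)
    {
        return wordDoc.Descendants(W + "p")
            .Select(p => string.Join("",
                p.Descendants(W + "t").Select(t => t.Value).ToArray()))
            .Where(text => text.Contains(term))
            .ToList();
    }
}
```

Inside WordDocumentQuery, you could pass the loaded wordDoc element and _searchParam to Find and add each returned string to _results in the same "text|file name" format.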
Using the Code
I am hoping this is a straightforward article that anyone can pick up and follow. I have added a .cs file, but was unable to debug it, so if anyone finds errors, please let me know and I'll correct them.
Points of Interest
Again, the basic idea for this program came from Microsoft's Virtual Labs, and I would strongly suggest that you do a Google search and give them a try.
History
This is the first version, but if anyone comes across errors, I'll gladly keep this article up to date and correct.