Introduction
This article shows how to retrieve the document properties of an MS Word document file. Refer to my other articles for background on MS Office Automation and for how to set up your development environment, if needed. I created this as a utility for a much larger document management project at our company. The same week I was asked to create it, I also received a lot of requests from readers on CodeProject to show them this exact feature! So I hope this demonstrates yet another way to automate Word for your needs.
Background
No special background is necessary. Some hands-on experience with C# is all you need.
Using the code
The following listing shows the code that retrieves the document properties. Refer to Automating MS Word Using Visual Studio .NET to get started with a new project. The source code I have provided demonstrates the task in a very simple fashion; the listing below is the main section of the program, which does the actual work.
Note: The following code was written for MS Word 2003.
...
private void butSourceDocument_Click(object sender, System.EventArgs e)
{
    openFileDialog.Multiselect = true;
    if( openFileDialog.ShowDialog() == DialogResult.OK )
    {
        // Boxed arguments: the Word interop methods take their parameters by ref.
        object vk_read_only = false;
        object vk_visible = true;
        object vk_false = false;
        object vk_true = true;
        object vk_dynamic = 2;
        object vk_missing = System.Reflection.Missing.Value;

        // Names of the built-in document properties to extract.
        string [] properties = { "Title", "Subject", "Author",
                                 "Keywords", "Revision Number",
                                 "Creation Date", "Last Save Time" };

        using (StreamWriter sw = new StreamWriter("FileProperties.txt"))
        {
            // Write the property names as a comma-separated header line.
            string strHeader = null;
            foreach( string header in properties )
            {
                strHeader = strHeader + header + ", ";
            }
            sw.WriteLine(strHeader.Substring( 0, strHeader.Length-2 ));

            // Outer loop: one output line per selected file.
            foreach( string file in openFileDialog.FileNames )
            {
                object fileName = @file;
                vk_word_app.Visible = false;
                Word.Document vk_my_doc =
                    vk_word_app.Documents.Open( ref fileName,
                        ref vk_missing, ref vk_read_only,
                        ref vk_missing, ref vk_missing,
                        ref vk_missing, ref vk_missing,
                        ref vk_missing, ref vk_missing,
                        ref vk_missing, ref vk_missing,
                        ref vk_visible );

                // The property collection is a late-bound COM object,
                // so it is queried through reflection.
                object vk_document_prop = vk_my_doc.BuiltInDocumentProperties;
                Type propertyType = vk_document_prop.GetType( );
                string strProValues = null;

                // Inner loop: one value per requested property.
                foreach( string prop in properties )
                {
                    // Equivalent to vk_document_prop.Item( prop )
                    object property = propertyType.InvokeMember( "Item",
                        System.Reflection.BindingFlags.Default |
                        System.Reflection.BindingFlags.GetProperty,
                        null,
                        vk_document_prop,
                        new object[ ] { prop } );
                    Type validatedType = property.GetType( );
                    // Equivalent to property.Value
                    string propValue = validatedType.InvokeMember( "Value",
                        System.Reflection.BindingFlags.Default |
                        System.Reflection.BindingFlags.GetProperty,
                        null,
                        property,
                        new object[] {} ).ToString( );
                    strProValues = strProValues + propValue + ", ";
                }
                sw.WriteLine(strProValues.Substring(0,strProValues.Length-2));

                // Close the document without saving changes.
                vk_my_doc.Close( ref vk_false, ref vk_missing, ref vk_missing );
            }
        }
        vk_word_app.Quit( ref vk_false, ref vk_missing, ref vk_missing );
        MessageBox.Show( "Done!" );
    }
}
...
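The two InvokeMember calls in the listing follow the same pattern, so they could be folded into a small helper. The following is only a minimal sketch: the class name LateBinding and the method GetProperty are hypothetical and not part of the original source, and a Dictionary stands in for the Word property collection so that the example runs without Word installed. It also passes BindingFlags.Public | BindingFlags.Instance explicitly, which I am assuming is harmless for COM objects; I have not verified that against Word itself.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class LateBinding
{
    // Hypothetical helper: reads a property by name through reflection.
    // Pass index arguments for an indexed property such as "Item".
    public static object GetProperty(object target, string name, params object[] index)
    {
        return target.GetType().InvokeMember(
            name,
            BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,
            null,
            target,
            index);
    }

    public static void Main()
    {
        // Stand-in for the COM collection: a dictionary also exposes
        // an indexed "Item" property.
        var props = new Dictionary<string, string> { { "Title", "Quarterly Report" } };

        Console.WriteLine(GetProperty(props, "Item", "Title")); // indexed property
        Console.WriteLine(GetProperty("Word", "Length"));       // plain property
    }
}
```

With a helper like this, the inner loop of the listing would reduce to two calls: one for "Item" with the property name as the index, and one for "Value" on the result.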
In summary, the program defines the set of properties it wants to extract, in this case string [] properties = { "Title", "Subject", "Author", "Keywords", "Revision Number", "Creation Date", "Last Save Time" };. It writes these names as a header line, then loops through the selected file(s), extracts the values, and stores them in a text file (FileProperties.txt) for further processing. Apart from the short loop that builds the header, there are two nested foreach loops in the code shown above: the outer one iterates over the list of files to process, and the inner one iterates over the list of properties to extract from each file.
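One thing to watch for when the text file is processed further: a value such as a title can itself contain a comma, which would shift the columns. The following is a rough sketch of one way to build each line with quoting. The names CsvLine, Escape and Join are hypothetical and not part of the original source, and the sample values are made up. Unlike the listing, it separates fields with a bare comma, because a space in front of an opening quote confuses some CSV readers, and it writes an empty field for a null value rather than failing on ToString().

```csharp
using System;
using System.Collections.Generic;

public static class CsvLine
{
    // Quotes a field that contains a comma, a quote or a line break,
    // doubling any embedded quotes. A null value becomes an empty field.
    public static string Escape(object value)
    {
        string text = value == null ? "" : value.ToString();
        if (text.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return text;
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields into one comma-separated line.
    public static string Join(IEnumerable<object> values)
    {
        var fields = new List<string>();
        foreach (object value in values)
            fields.Add(Escape(value));
        return string.Join(",", fields);
    }

    public static void Main()
    {
        // Made-up sample values: a title with a comma, a missing subject, an author.
        Console.WriteLine(Join(new object[] { "Budget, 2004", null, "J. Smith" }));
        // prints: "Budget, 2004",,J. Smith
    }
}
```

In the listing, the string concatenation followed by Substring could then be replaced by collecting the values in a list and writing the result of Join for each file.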
Points of Interest
The new version of Office, Office 2003, is going to make things a little easier for Office developers. So if you are an Office developer, you should start looking into the features that Office 2003 has to offer. One feature I particularly like is the ability to export documents to XML format.