Introduction
Many of us have worked with data manipulation and data presentation. More than once, your manager or customer has probably requested that data reports or documents be Office compatible. They are used to working with Office, they depend on it, and you cannot change this fact. They could use the export features provided by reporting tools, but I guess they prefer to create their own templates and see their data there. Rather than offer an unmanageable alternative to Office documents, you had better "dress" their data in Office clothes. They will appreciate it.
WordML Templates Editor utility
The utility application, WordML Templates Editor, allows users to configure an Office 2003 XML template and fill it with data. The template is written in WordML (the markup language used to define an MS Word document). This "template" is a simple XML file with some predefined constructs, not an MS Word ".DOT" template.
The idea comes from the MS Word feature that lets users work with external data sources (MailMerge) and manage metadata (merge fields). Unfortunately, MailMerge runs into a problem when you want to display many records in the same document. When you use MailMerge directly from the MS Word application, it is already difficult to display all the records of one data source: you must configure the main document to be of the "Directory" type, and the whole document content is then repeated for each record. If you wish to show two or more recordsets in the same document, the situation becomes even more complicated.
Suppose we want to extract from the Northwind database information about an order and its details, and the products list. Because the output will be an official document, we want to display information about the Northwind company, from a configuration file. So, we have four recordsets: two with only one record (the order and the company information), and two with many records (order details and products list).
The template can be built from an MS Word application, as well as from the utility application WordML Templates Editor, which hosts the MS Word application and has full internal access to its active document.
If you want to set up the template directly from the MS Word application, you must link it to an external data source which contains the fields required by MailMerge. The easiest solution is to use, as the data source, a text file which contains only the field definitions and a dummy row as a fake record. The disadvantage is that if you want to move the template to another machine, you have to move the data source too. To keep field names unique and to distinguish between fields belonging to different tables, each field name must be the concatenation of the table name and the name of the field in that table. You can avoid using an external data source by adding objects of type MergeField to the template. The disadvantage is that you must know the field names (built, of course, by concatenation with their table name).
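The naming convention and the "definitions plus dummy row" text file described above can be sketched as follows. This is only an illustration, not code from the utility: the class MergeFieldSource, its methods, the tab delimiter and the "x" dummy value are all assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical helper, not part of the article's source code.
public static class MergeFieldSource
{
    // Merge field name = table name + column name (concatenated).
    public static string BuildFieldName(string tableName, string columnName)
    {
        return tableName + columnName;
    }

    // Returns the two lines of a header-only data source:
    // line 0 holds the field names, line 1 holds one dummy record.
    // The tab delimiter and the "x" dummy value are assumptions.
    public static string[] BuildHeaderSource(DataSet ds)
    {
        var names = new List<string>();
        var dummies = new List<string>();
        foreach (DataTable dt in ds.Tables)
        {
            foreach (DataColumn col in dt.Columns)
            {
                names.Add(BuildFieldName(dt.TableName, col.ColumnName));
                dummies.Add("x");
            }
        }
        return new[] { string.Join("\t", names), string.Join("\t", dummies) };
    }
}
```

Writing the two returned lines to a text file (for example with File.WriteAllLines) would give a data source that carries only the structure of the dataset, not its data.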
If the template is built using the utility application, you don't need an external data source, but you need to use the dataset (the structure, not the data) you want to display. The utility displays the dataset tables list, and the fields corresponding to each table. You can insert any field in any text location of the active document.
Two of the four tables have only one record. In the template, you can distinguish between elements which will be shown only once and those which will be repeated (once per record). For the former, the display is done by a simple insertion at the specified location. For the latter, the display is a little more elaborate. Because an MS Word document doesn't permit accessing elements by an identifier, I have chosen to insert bookmarks. The bookmarks used to duplicate elements have a standard name: data table name + "WMLRepeat" token + index, where the index (>= 1) distinguishes the occurrences when the data table appears several times in the document. When the template is parsed and such a bookmark is found, the analyzer tries to identify a table object. If it succeeds, the table row which contains at least one merge field is duplicated for each record in the data table, and the fields are replaced with data.
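As a minimal sketch of the bookmark naming rule above, the helper below builds a "repeat" bookmark name and splits one back into its parts. The class and method names are hypothetical; only the "WMLRepeat" token and the rule that the index is at least 1 come from the article.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the article's source code.
public static class RepeatBookmark
{
    public const string Token = "WMLRepeat";

    // Builds "<tableName>WMLRepeat<index>", with index >= 1.
    public static string Build(string tableName, int index)
    {
        if (index < 1) throw new ArgumentOutOfRangeException(nameof(index));
        return tableName + Token + index.ToString(CultureInfo.InvariantCulture);
    }

    // Splits a bookmark name into table name and index.
    // Returns false when the name does not follow the convention.
    public static bool TryParse(string bookmarkName, out string tableName, out int index)
    {
        tableName = null;
        index = 0;
        if (bookmarkName == null) return false;

        int pos = bookmarkName.LastIndexOf(Token, StringComparison.Ordinal);
        if (pos <= 0) return false; // token missing, or no table name before it

        string suffix = bookmarkName.Substring(pos + Token.Length);
        int parsed;
        // NumberStyles.None accepts digits only (no sign, no whitespace).
        if (!int.TryParse(suffix, NumberStyles.None, CultureInfo.InvariantCulture, out parsed) || parsed < 1)
            return false;

        tableName = bookmarkName.Substring(0, pos);
        index = parsed;
        return true;
    }
}
```

For instance, RepeatBookmark.Build("OrderDetails", 1) yields "OrderDetailsWMLRepeat1", and parsing that name gives back the table name and the index.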
When fields are replaced with data, you can perform some date and numeric formatting. The formatting information is stored in internal template variables.
So far, only table row objects can be used as repeated elements. If you want to repeat other objects, you can use a hidden table object and place text boxes, lists, formatted text, and so on inside it.
Template creation
The template creation steps are as follows:
- create or provide datasets accessible to your users: this could be a customizable list of entries which internally provides data sources for a template editor;
- open an MS Word document, or create a new one (the editor can handle either standard DOC MS Word documents or XML MS Word documents, but only XML documents can be used as templates for data visualization);
- modify the "template" with the visual elements you need;
- choose what data tables and fields you want to insert in the "template";
- for "repeating" elements (records that appear more than once in the document), select the specified location (the row used as template row in a MS Word table object) in the active document, and insert a special bookmark (the new bookmark will be added in the "WMLRepeat" bookmarks list);
- optionally, add the language name, date-time format, numeric group separator, decimal separator, and number of decimal digits;
- save the template as XML.
The main editor form (frmWordControl) hosts the MS Word application in a control, as explained in the two articles mentioned in the "Thanks" section. The required constructs are added to the template using the MS Word DOM. The form takes its data sources from a dataset object or from a text file (built as described in the previous section). The form is instantiated like this:
dsData = new DataSet();
dsData.Tables.Add(dt1);
dsData.Tables.Add(dt2);
…
frmWordControl frm = new frmWordControl(dsData);
frm.Show();
After metadata is shown in the form, you can:
- insert merge fields in a specified location:
Word.Range rng = wordCtl.Application.Selection.Range;
item = listFields.Items[fieldIndex].ToString();
if(item.Trim() != string.Empty)
{
    wordCtl.Application.ActiveDocument.MailMerge.Fields.Add(rng, item);
}
- insert special "repeat" bookmarks in a specified location:
Word.Range rng = wordCtl.Application.Selection.Range;
object oRng = (object)rng;
wordCtl.Application.ActiveDocument.Bookmarks.Add(
    tableName + "WMLRepeat" + index.ToString(), ref oRng);
- specify formatting information (language, date format, decimal separator, number group separator, decimal digits number – stored in internal document variables):
object oVariableValue;
object oVariableName;
Word.Variable var;
oVariableValue = (object)languageName;
oVariableName = (object)"LanguageName";
var = wordCtl.Application.ActiveDocument.Variables.get_Item(
    ref oVariableName);
if(var != null)
    var.Value = languageName;
else
    wordCtl.Application.ActiveDocument.Variables.Add("LanguageName",
        ref oVariableValue);
You can create new documents, open them from the disk, and save them in standard DOC or XML format.
Template visualization and the CExWordMLFiller class
The template visualization can be performed either from a Windows application or from a web application. In both cases, the client machine should have Office 2003, or at least Word Viewer 2003, installed.
The CExWordMLFiller class is responsible for inspecting a WordML template and filling it with the data provided by the dataset. Assume that you have a dataset dsData and an XML document xmlTemplateDoc, which contains the template. The code below is the sequence required to obtain a new XML document which combines the dataset data with the visual features provided by the template:
CExWordMLFiller filler = new CExWordMLFiller(dsData, xmlTemplateDoc.OuterXml);
// Run the transformation only if the constructor loaded the template.
if(!filler.OperationFailed)
{
    filler.Transform();
}
// Report errors from either step and stop before saving an incomplete result.
if(filler.OperationFailed)
{
    foreach(string err in filler.ErrorList)
    {
        MessageBox.Show(err);
    }
    return;
}
string copyFileName = Path.GetTempFileName() + ".xml";
filler.WordMLDocument.Save(copyFileName);
The WordMLDocument property of the filler object contains the XML document object obtained by the transformation. You can save the result in a file, or use it as a document, stream, or string.
The constructor needs two parameters: the dataset to display and the template content as a string. The content must be a valid WordML string, and it is loaded into an XML document object. The dataset is also loaded, with its schema, into an XML document, which is used to remove unfilled fields at the end of the transformation. Some fields may not be filled with data, for example when the corresponding data table is missing from the dataset. These unfilled fields must be "removed"; otherwise the resulting file could contain "ugly" merge field definitions like "«OrderOrderID»".
The LoadTemplate method prepares the date and numeric formatting for dataset values. The formatting information is loaded from docVar nodes (document internal variables):
XmlNode varNode = xmlTemplateDoc.SelectSingleNode(
    "//w:docVar[@w:name='LanguageName']/@w:val", nsmgr);
if(varNode != null)
{
    languageName = varNode.Value;
    try
    {
        ci = new CultureInfo(languageName, false);
        numberFormat = ci.NumberFormat;
    }
    catch
    {
        languageName = null;
        numberFormat = null;
    }
}
else
{
    languageName = null;
}
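To illustrate how the separators and the number of decimal digits could be combined with the language setting, here is a minimal sketch. It is not the author's LoadTemplate code: the class NumberFormatSketch and its parameters are hypothetical, and falling back to the invariant culture for a missing or unknown language name is my assumption (the original sets numberFormat to null instead).

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the article's source code.
public static class NumberFormatSketch
{
    // Starts from the culture's number format and overrides only
    // the settings that were supplied (null means "keep the default").
    public static NumberFormatInfo Build(string languageName,
        string groupSeparator, string decimalSeparator, int? decimalDigits)
    {
        CultureInfo culture = CultureInfo.InvariantCulture;
        if (!string.IsNullOrEmpty(languageName))
        {
            try
            {
                culture = new CultureInfo(languageName, false);
            }
            catch (ArgumentException)
            {
                // Unknown culture name: keep the invariant culture (assumption).
            }
        }

        // Clone() returns a writable copy; the culture's own instance may be read-only.
        var format = (NumberFormatInfo)culture.NumberFormat.Clone();
        if (!string.IsNullOrEmpty(groupSeparator)) format.NumberGroupSeparator = groupSeparator;
        if (!string.IsNullOrEmpty(decimalSeparator)) format.NumberDecimalSeparator = decimalSeparator;
        if (decimalDigits.HasValue) format.NumberDecimalDigits = decimalDigits.Value;
        return format;
    }
}
```

A value can then be rendered with the standard "N" format string, for example 1234.5m.ToString("N", NumberFormatSketch.Build("en-US", ".", ",", 2)).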
The Transform method performs the transformation. It loops over the dataset Tables collection and calls the TransformDataTable method, which checks whether the rows of each data table are to be shown. After that, to eliminate leftover constructs, it removes the "mail merge" node (RemoveMailMergeNode) and the unfilled fields (RemoveUnfilledFields).
The TransformDataTable method identifies the locations where elements are "repeated" and those where they are shown only once. The TransformWordMLTableRepeat method identifies the MS Word tables which display the current data table rows, using the aml:annotation WordML element, which corresponds to a "repeat" bookmark. It takes the template row node (w:tr) and calls the TransformDataRow method with three parameters: data row, table node, and template row node. After all the rows are processed, the template row node is removed from the table.
string tableName = dt.TableName;
// Word tables that carry a "repeat" bookmark for this data table
// and contain at least one of its merge fields.
XmlNodeList oColl = xmlTemplateDoc.SelectNodes(
    "//w:tbl[contains(descendant::aml:annotation/@w:name, '" +
    tableName + repeatAttribute +
    "') and (contains(descendant::w:instrText, ' MERGEFIELD \"" +
    tableName + "') or contains(descendant::w:instrText, ' MERGEFIELD " +
    tableName + "'))]", nsmgr);
XmlNode templateRowNode;
if(oColl != null && oColl.Count > 0)
{
    foreach(XmlNode tableNode in oColl)
    {
        // First row of the table that holds a merge field of this data table.
        templateRowNode = tableNode.SelectSingleNode(
            "w:tr[contains(descendant::w:instrText, ' MERGEFIELD \"" +
            tableName + "') or contains(descendant::w:instrText," +
            " ' MERGEFIELD " + tableName + "')]", nsmgr);
        if(templateRowNode != null)
        {
            foreach(DataRow dr in dt.Rows)
            {
                TransformDataRow(dr, tableNode, templateRowNode);
            }
            tableNode.RemoveChild(templateRowNode);
        }
    }
}
The TransformDataRow method finds where the data row fields are displayed in the current table row node and replaces their placeholders with real data. The template row node is cloned, and the clone is inserted before the template row node in the MS Word table object.
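The clone-and-insert step can be sketched with plain System.Xml calls. This is a simplified illustration under stated assumptions, not the article's TransformDataRow: the class TemplateRowSketch and the placeholder dictionary are hypothetical, formatting is left out, and the "w" prefix is assumed to be bound to the WordprocessingML 2003 namespace.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical helper, not part of the article's source code.
public static class TemplateRowSketch
{
    // WordprocessingML 2003 namespace, bound here to the "w" prefix.
    public const string WordMLNamespace =
        "http://schemas.microsoft.com/office/word/2003/wordml";

    // Deep-copies the template row, replaces every w:t text that matches
    // a placeholder key with its value, and inserts the copy before the
    // template row so that records keep their original order.
    public static XmlNode InsertFilledRow(XmlNode tableNode, XmlNode templateRowNode,
        IDictionary<string, string> valuesByPlaceholder)
    {
        var nsmgr = new XmlNamespaceManager(tableNode.OwnerDocument.NameTable);
        nsmgr.AddNamespace("w", WordMLNamespace);

        XmlNode newRow = templateRowNode.CloneNode(true);

        // Materialize the node list before editing the text content.
        List<XmlNode> textNodes =
            newRow.SelectNodes(".//w:t", nsmgr).Cast<XmlNode>().ToList();
        foreach (XmlNode textNode in textNodes)
        {
            string value;
            if (valuesByPlaceholder.TryGetValue(textNode.InnerText, out value))
                textNode.InnerText = value;
        }

        tableNode.InsertBefore(newRow, templateRowNode);
        return newRow;
    }
}
```

Calling the helper once per data row and then removing the template row from the table leaves one filled row per record, which mirrors the sequence described above.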
The ReplaceFieldData method identifies the field locations and applies date and numeric formatting:
oColl = baseNode.SelectNodes("//w:p[w:r/w:instrText=' MERGEFIELD " +
    fieldName + " ']", nsmgr);
…
foreach(XmlNode fieldNode in oColl)
{
    dataNode = fieldNode.SelectSingleNode("//w:t[.='«" +
        fieldName + "»']", nsmgr);
    …
    if(colType == typeof(DateTime))
    {
        if(dateTimeFormat != null)
        {
            DateTime dt = DateTime.Parse(data);
            dataNode.InnerText = dt.ToString(dateTimeFormat);
        }
        else
        {
            dataNode.InnerText = data;
        }
    }
    …
}
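The formatting decision can also be made on the typed value instead of on its string form, which avoids a culture-dependent DateTime.Parse round trip. The sketch below shows that variant; it is an alternative I am suggesting for illustration, not the article's ReplaceFieldData, and the class FieldValueFormatter and its parameters are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the article's source code.
public static class FieldValueFormatter
{
    // Formats a raw DataRow value for display:
    // - null / DBNull becomes an empty string;
    // - DateTime uses the template's date-time format when one is set;
    // - decimal, double and float use the supplied number format ("N");
    // - everything else is converted with the invariant culture.
    public static string Format(object value, string dateTimeFormat,
        NumberFormatInfo numberFormat)
    {
        if (value == null || value is DBNull) return string.Empty;

        if (value is DateTime)
        {
            DateTime date = (DateTime)value;
            return dateTimeFormat != null
                ? date.ToString(dateTimeFormat, CultureInfo.InvariantCulture)
                : date.ToString(CultureInfo.InvariantCulture);
        }

        if (numberFormat != null &&
            (value is decimal || value is double || value is float))
        {
            return ((IFormattable)value).ToString("N", numberFormat);
        }

        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }
}
```

With a DataRow dr, a call such as FieldValueFormatter.Format(dr["OrderDate"], "dd.MM.yyyy", null) would return the date in the template's format, assuming the column holds a DateTime.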
A Windows application will open the result file simply by association with its type:
System.Diagnostics.Process.Start(copyFileName);
A web application needs more code to display the output in a browser:
// fi describes the generated file; only its bare name is sent to the browser.
FileInfo fi = new FileInfo(copyFileName);
Response.ClearContent();
Response.ClearHeaders();
Response.Clear();
Response.ContentType = "application/msword";
Response.Charset = "";
Response.AddHeader("Content-disposition",
    "inline; filename=\"" + fi.Name + "\"");
Response.AddHeader("Content-length", fi.Length.ToString());
Response.WriteFile(copyFileName);
Response.Flush();
Response.Close();
Template visualization and the CWordDOCFiller class
As Mr. Trevor Farley recommended, I have recently added the CWordDOCFiller class, which is able to parse DOC templates. The class performs the same operations as the CExWordMLFiller class, but uses the Word DOM. Unfortunately, performance issues appear when you access the Fields collection if the data to display is large (for the same Order template, the DOM version takes 2-3 minutes, while the XML approach is almost instantaneous). On the other hand, the DOM approach is much more readable and easier to maintain, and it can be applied to previous versions of MS Word.
The code sequence needed to use the CWordDOCFiller class is:
string templateFileName = Application.StartupPath + @"\Templates\Order.doc";
string copyFileName = Path.GetTempFileName() + ".doc";
File.Copy(templateFileName, copyFileName, true);
CWordDOCFiller filler = new CWordDOCFiller(dsData, copyFileName);
if(!filler.OperationFailed)
{
    filler.Transform();
    if(filler.OperationFailed)
    {
        foreach(string err in filler.ErrorList)
        {
            MessageBox.Show(err, "Error", MessageBoxButtons.OK,
                MessageBoxIcon.Error);
        }
        return;
    }
}
else
{
    foreach(string err in filler.ErrorList)
    {
        MessageBox.Show(err, "Error", MessageBoxButtons.OK,
            MessageBoxIcon.Error);
    }
}
The constructor receives two parameters: the dataset and the path of the template. To avoid altering the template, copy it to the %TEMP% folder first. The class uses a Word.Application object (held in the private field oApp) to load and modify the document. The formatting information is stored in the Word.Variables collection and is used when the field contents are replaced with data. The main method is Transform, which applies all the transformation operations to the data tables in the dataset:
bOperationFailed = false;
try
{
    foreach(DataTable dt in dsData.Tables)
    {
        TransformTableRepeat(dt);
    }
    ReplaceFieldDataNoRepeat();
    oApp.Visible = true;
}
catch(Exception ex)
{
    // Collect the messages of the whole exception chain.
    while(ex != null)
    {
        errorList.Add(ex.Message);
        ex = ex.InnerException;
    }
    bOperationFailed = true;
}
The TransformTableRepeat method performs the transformation for every DataRow object, taking the meta information from the template row of the Word table. The template row is copied, and the copy is filled with data (in the TransformDataRow method). At the end, the template row is deleted.
string tableName = dt.TableName;
foreach(Word.Bookmark bmk in oTemplateDoc.Bookmarks)
{
    if(bmk.Name.StartsWith(tableName + repeatAttribute))
    {
        Word.Table tbl = bmk.Range.Tables[1];
        Word.Row row = bmk.Range.Rows[1];
        for(int i = 0; i < dt.Rows.Count; i++)
        {
            TransformDataRow(dt.Rows[i], tbl, row, bmk.Name, i);
        }
        row.Delete();
    }
}
The TransformDataRow method creates new fields and adds them at the specified location, ensuring that the new field names are unique in the document. It also fills the newly created fields with data:
DataTable dt = dr.Table;
string tableName = dt.TableName;
object oTemplateRow = (object) templateRow;
Word.Row row = tbl.Rows.Add(ref oTemplateRow);
string fieldName;
Word.Fields fields;
int fieldIndex;
Word.Field field = null;
string dataFieldName = string.Empty;
Type dataFieldType = typeof(object);
foreach(Word.Cell cell in templateRow.Cells)
{
    // Build a unique field name: template field + bookmark name + row index.
    fieldName = cell.Range.Fields[1].Code.Text.Trim();
    fieldName = fieldName.Replace("MERGEFIELD ", string.Empty);
    fieldName = fieldName + bmkName + rowIndex.ToString();
    oTemplateDoc.MailMerge.Fields.Add(
        row.Cells[cell.ColumnIndex].Range, fieldName);
    fields = row.Cells[cell.ColumnIndex].Range.Fields;
    fieldIndex = fields.Count;
    field = fields[fieldIndex];
    // Recover the data column name by stripping the prefix and suffix.
    dataFieldName = field.Code.Text.Trim();
    dataFieldName = dataFieldName.Replace("MERGEFIELD " +
        dt.TableName, string.Empty);
    dataFieldName = dataFieldName.Replace(bmkName +
        rowIndex.ToString(), string.Empty);
    dataFieldType = dt.Columns[dataFieldName].DataType;
    ReplaceFieldData(field, dr[dataFieldName].ToString(),
        dataFieldType);
}
The ReplaceFieldDataNoRepeat method fills the "non-repeated" fields with data:
bOperationFailed = false;
DataRow firstRow;
foreach(Word.Field field in oTemplateDoc.Fields)
{
    if(field.Code.Text.IndexOf(repeatAttribute) == -1)
    {
        foreach(DataTable dt in dsData.Tables)
        {
            if(dt.Rows.Count > 0)
            {
                firstRow = dt.Rows[0];
                for(int j = 0; j < dt.Columns.Count; j++)
                {
                    if(field.Code.Text.Trim() == "MERGEFIELD " +
                        dt.TableName + dt.Columns[j].ColumnName)
                        ReplaceFieldData(field, firstRow[j].ToString(),
                            dt.Columns[j].DataType);
                }
            }
        }
    }
}
The ReplaceFieldData method takes a reference to a Word.Field object and sets its data, using the formatting information. The code used to open the file is no longer needed, because the Application object embedded in the CWordDOCFiller class shows the result of the transformation.
Using the application
The solution WordDataSetTemplateEditor.root contains four projects:
- WinWordControl: the user control which hosts the MS Word application;
- WordDataSetTemplateEditor: the main editor project;
- NorthwindDA: the data access component for the Northwind database;
- Test: the test application, which displays a specified order with its details and an alphabetical list of products.
The templates Order.xml and Order.doc in the Templates folder are the needed templates for the Test application. When the project is compiled, these templates are copied into a similar folder in the application startup path.
I have noticed that if you update a field, the replaced data is discarded and cannot be recovered. I think this is a good reason to remove the field definition itself. In the WordML approach, this means removing specific XML nodes (w:r/w:fldChar and w:r/w:instrText) for each field, while in the Word DOM approach the issue is solved much more elegantly by the Unlink method of the Fields collection (oTemplateDoc.Fields.Unlink();).
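For the WordML side, removing the field definitions could look roughly like the sketch below. It assumes that each field is stored as a sequence of runs (begin w:fldChar, w:instrText, separator w:fldChar, result text, end w:fldChar) and that the "w" prefix maps to the WordprocessingML 2003 namespace; the class and method names are hypothetical and this is not the code shipped with the article.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical helper, not part of the article's source code.
public static class FieldDefinitionSketch
{
    // WordprocessingML 2003 namespace, bound here to the "w" prefix.
    public const string WordMLNamespace =
        "http://schemas.microsoft.com/office/word/2003/wordml";

    // Removes every run that holds a w:fldChar or w:instrText child,
    // keeping the runs that carry the displayed text (w:t).
    // Returns the number of runs removed.
    public static int RemoveFieldDefinitions(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("w", WordMLNamespace);

        // Copy the matches to a list first, because the loop edits the tree.
        List<XmlNode> fieldRuns = doc
            .SelectNodes("//w:r[w:fldChar or w:instrText]", nsmgr)
            .Cast<XmlNode>()
            .ToList();

        foreach (XmlNode run in fieldRuns)
        {
            run.ParentNode.RemoveChild(run);
        }
        return fieldRuns.Count;
    }
}
```

After such a pass, only the plain text runs remain, so MS Word has no field left to update and the inserted data stays in place.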
Thanks
Many thanks to Matthias Hänel and Anup Shinde for their excellent articles (Word Control for .NET and Integrating Microsoft Word in your .NET application). Thanks to Trevor Farley for his recommendation to create the CWordDOCFiller class for the Word DOM approach.